Choosing HTTP status codes Series - Part 2

Hands off that resource, HTTP status code 401 vs 403 vs 404

By Arnaud Lauret, May 5, 2021

When designing APIs, choosing HTTP status codes is not always that obvious and prone to errors, I hope this post series will help you to avoid common mistakes and choose an adapted one according to the context. This second post answers the following question: given that resource with id 123 actually exists in the underlying database, what should be the response to GET /resources/123 when consumer is not allowed to access it? 401 Unauthorized, 403 Forbidden or 404 Not Found?

Choosing HTTP status codes Series

When designing APIs, choosing HTTP status codes is not always that obvious and prone to errors, I hope this post series will help you to avoid common mistakes and choose an adapted one according to the context.<div class="alert alert-info"> I never remember in which RFCs HTTP status codes are defined. To get a quick access to their documentation, I use Erik Wilde’s Web Concepts. </div>Very special thanks to all Twitter people participating to the #choosehttpstatuscode polls and discussions

1 - This is not the HTTP method you're looking for, HTTP status code 404 vs 405 vs 501
2 - Hands off that resource, HTTP status code 401 vs 403 vs 404
3 - Move along, no resource to see here (truly), HTTP status code 204 vs 403 vs 404 vs 410
4 - Empty list, HTTP status code 200 vs 204 vs 404

The context

Let’s say you’re creating an API for a mobile application that allows people to record phone calls. Once calls are recorded, users can list them and listen to each individual recording. Listing a user’s recorded calls could be done with a GET /users/{phoneNumber}/calls, for each phone call listed you get a random and unpredictable id that can be used to retrieve the actual audio recording with a GET /calls/{callId}.

Basically it means that when a user whose phone number is 123456789 uses the mobile application, the application sends a GET /users/123456789/calls API request to list available recorded calls. The API responds with a 200 OK along with the recorded calls belonging to user. If user taps on one conversation which id is Bnwgab, the application sends a GET /calls/Bnwgab and the API responds with a 200 OK along with the audio file

But what happens if some curious and maybe malicious user scan network traffic coming out of the application? This hacker will easily understand how this “not so private” API works. With very little effort, they will succeed to generate phone numbers that actually exist in the underlying system so send GET /users/{phone number of another user}/calls requests. And with more effort, enough patience and adapted tools they may even generate valid random some callId and send GET /calls/{callId that don't belong to their user account} requests.

In either case, the API should prevent accessing resources that don’t belong to the caller and signify there’s a problem with caller’s request. Note that if that sounds like a no-brainer for many people, that is actually not always the case and some APIs may return a 200 OK along with the requested data. Regularly, stories such as this one (which inspired the above use case) come out. Never forget that when creating APIs and never refrain from double check that your colleagues are also aware of that. And note also that using PII (Personnally Identifiable Information) or other sensitive data as ids can be very convenient but raises security concerns, especially if they appear in URLs as they can be logged almost everywhere. I should write a post series about API security one day (POST /writing-ideas done!).

Let’s get back to what we are talking about today: HTTP status codes. Obviously, when consumer make an API call on resource that actually exists but don’t belong to them, the API must respond with a 4xx Client Error Class, but which one could be the more accurate?

According to my Twitter poll, 54% of people would return a 403 Forbidden, while 24% would return a 404 Not Found and also 24% would return a 401 Unauthorized. Let’s see who is right and who is wrong based on what RFCs say.

Use 404 when resource is none of consumer’s business (and never will)

The 404 (Not Found) status code indicates that the origin server did not find a current representation for the target resource or is not willing to disclose that one exists.
RFC 7231 Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content, section 6.5.4

Returning a 404 Not Found means “the requested resource does not exists”. Actually, there’s a subtlety, it could mean “the requested resource actually exists but it’s none of your business, you’re not allowed to access it and never will; actually, it does not exist for you” or “the requested resource does not exist at all” but the consumer will never know what is the true reason behind this HTTP status code.

That response is the best one for the introduction’s use case, granted that users want to use this application without sharing anything with others. In that case, given that John and Emma use the application, if Emma “hacks” the API, we will never ever want her to know that /users/{John's phone number}/calls may exists. Because they are not supposed to know it exists and even though can’t do anything about it, so better tell her that it “doesn’t exist” (for her).

But if 404 Not Found is usually my first idea when a consumer tries to access to a /resources/1234 they shouldn’t (I admit I’m a little obsess with security and prone to not show what is not needed to be shown), there are cases where it could be interesting to let them know the target resource exists.

Use 403 when consumer can do something about it

The 403 (Forbidden) status code indicates that the server understood the request but refuses to authorize it. A server that wishes to make public why the request has been forbidden can describe that reason in the response payload (if any).
RFC 7231 Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content, section 6.5.3

Returning a 403 Forbidden signifies “the requested resource actually MAY exists but you cannot access it. You MAY access it by requesting adequate rights to someone, maybe an administrator for instance”.

Even if that is not very realistic, let’s say that the example application/API described in the introduction allows users to share recorded conversations with others. Given John has shared his conversations with Emma but not with Tara, Emma triggering a GET /users/{John's phone number}/calls API call will get a 200 OK while Tara will get a 403 Forbidden. Tara may request John the rights to access his conversations to fix that.

We have talk about 403 Forbidden and 404 Not Found, but what about the poll’s third option?

Never ever use 401 (don’t be fooled by its reason)

The 401 (Unauthorized) status code indicates that the request has not been applied because it lacks valid authentication credentials for the target resource.
RFC 7235 Hypertext Transfer Protocol (HTTP/1.1): Authentication, section 3.1

As 24% of respondents to my poll, when a consumers tries to access a resource they shouldn’t access, you may be tempted to return a 401 Unauthorized instead of a 403 Forbidden. Why would you do that? Maybe because its reason phrase says Unauthorized. But that would actually be an error, don’t be fooled by that reason phrase.

There are only two hard things in Computer Science: cache invalidation and naming things.
Phil Karlton

There’s a huge problem with 401 Unauthorized, its reason phrase let think that it is tied to “wrong authorization” while it is actually tied to “lack of authentication”. Actually the RFC that defines it is RFC 7235 - Hypertext Transfer Protocol (HTTP/1.1): Authentication… “Authentication” and not “Authorization”. Even the description states that this status is about “authentication credentials”.

A 401 signifies there’s a problem with your credentials which usually are provided in an Authorization header (still wrong name, but at least it’s consistent with the reason). This status is made to signify “you forgot to provide an Authorization header with some credentials” or “your credentials provided in the Authorization header are invalid/expired”. In the API world, it basically says “You can’t use the API at all, come back with a valid access token”.

It’s not meant to say “You can use the API but not access that resource”, that is the job of 403 Forbidden. And that is clearly stated in its description in RFC 7231 - Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content:

If authentication credentials were provided in the request, the server considers them insufficient to grant access. The client SHOULD NOT automatically repeat the request with the same credentials. The client MAY repeat the request with new or different credentials. However, a request might be forbidden for reasons unrelated to the credentials.

To be even more sure that 403 Forbidden is the right answer, let’s talk about Oauth 2 scopes. Indeed, dealing with resource rights access is not always, let’s say “internal business rule” driven (checking in users table that the identified user has the requested phone number for example). When consumers request an access token using the Oauth 2 framework (the token that goes into the not so well named Authorization header), they may request a token restricted to given elements thanks to scopes. For instance, when using the Github API, you may request access to public repo only or to user data only. What should happen when a consumer requests access to a resource without adapted scopes? Section 3.1 of RFC 6750 The OAuth 2.0 Authorization Framework: Bearer Token Usage is quite clear:

The request requires higher privileges than provided by the access token. The resource server SHOULD respond with the HTTP 403 (Forbidden) status code and MAY include the "scope" attribute with the scope necessary to access the protected resource.
insufficient_scope error

HTTP Status code is not enough

That means two things. First 401 Unauthorized is definitely not an option in the case we are studying today. Second, HTTP status code is not enough. Indeed, 403 Forbidden could be returned because consumer lacks some scope to GET /resources/{resourceId} in general or does not comply to some business rule and cannot GET /resource/1234 (a specific id). Providing a message and maybe some structured data to explained the why of the error and how it can be solved (request access token with scope X in first case or contact some admin in second case) is mandatory. Note that, this made me realized that 403 Forbidden does not actually disclose that a resource exists, it totally depends on what is said beyond the HTTP status code.

Don’t forget DX and context

Respecting HTTP and other RFCs is important to avoid surprising developers with behaviors that are against common practices, but most important, whatever the HTTP status code you’ll choose to return, what matters above all is providing the response the most adapted for the context that will actually help the developer (and the consumer and even the end user) to know what is actually happening and help them solve the problem if they can.

So, when consumers want to access a resource they shouldn’t, don’t return a 401 Unauthorized, you would go against the HTTP protocol. Instead, return a 404 Not Found if consumers can’t do anything about it (so from their perspective, it does not exist) and return a 403 Forbidden along with a meaningful message if they can request access.