In REST, how should a GET request to a findAll operation be handled when the Resources are paged?

Question

In a RESTful Service, Resources that cannot all be retrieved at once are paginated. For example:

GET /foo?page=1

The question is, how should I handle a getAll request such as:
GET /foo

Taking discoverability/HATEOAS into consideration, I see a few options:

- return a 405 Method Not Allowed and include a Link header to the first page: Link=<http://localhost:8080/rest/foo?page=0>; rel="first"
- return a 400 Bad Request and include the Link header (same as above)
- return a 303 See Other to the first paginated page
- return a 200 OK but actually return only the first page (and include the URI of the next page into the Link): Link=<http://localhost:8080/rest/foo?page=1>; rel="next"
  - note: I would rather not do this, having learned not to manage anything for the client by default, if they haven't explicitly asked for it.

These are of course only a few options. I'm leaning towards the first, but I'm not sure if there is a best practice on this that I am not aware of. Any feedback is appreciated. Thanks.

zzzzBov · Answer 1 · 2012-02-09T16:07:58.973

Let's start with the fact that REST is not a set-in-stone protocol like SOAP; it's simply a means of structuring a service, similar to how languages are described as being object-oriented.

That being said, I'd recommend handling this as follows.

Treat a RESTful call like a function declaration.

GET /foo
foo()

Some functions require parameters.

GET /foo?start=??&count=??
foo(start, count)

Some languages support default parameters, others don't; you get to decide for yourself how you want to handle parameters.

With default parameters, you could assume that the function was defined as

foo(start = 0, count = 10)

so that a call to GET /foo would actually be equivalent to GET /foo?start=0&count=10, whereas a call to GET /foo?start=100 would be equivalent to GET /foo?start=100&count=10.
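
As a rough illustration of the default-parameter approach, here is a minimal Python sketch. The function name `resolve_range` and the two constants are made up for this example; the defaults simply mirror `foo(start = 0, count = 10)` above. Validation of non-numeric values is left out to keep it short.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical defaults, mirroring foo(start = 0, count = 10) above.
DEFAULT_START = 0
DEFAULT_COUNT = 10

def resolve_range(url):
    """Return (start, count) for a request URL, filling in defaults."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs maps each parameter name to a list of values; take the first.
    start = int(query.get("start", [DEFAULT_START])[0])
    count = int(query.get("count", [DEFAULT_COUNT])[0])
    return start, count

print(resolve_range("/foo"))            # (0, 10)
print(resolve_range("/foo?start=100"))  # (100, 10)
```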

If you don't want default parameters, you could force the user of the API to explicitly set start and count:

foo(start, count)

so that a call to GET /foo would return a 400 Bad Request status code, but a call to GET /foo?start=0&count=10 would return a 200 OK status code along with the content contained by the specified range.
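
The strict variant could look something like the sketch below. The name `handle_strict` and the response bodies are assumptions made for illustration; the only point is that a missing (or non-numeric) parameter yields a 400 with an explanation, while a complete request yields a 200.

```python
from urllib.parse import urlsplit, parse_qs

def handle_strict(url):
    """Return (status, body); start and count are both required."""
    query = parse_qs(urlsplit(url).query)
    missing = [name for name in ("start", "count") if name not in query]
    if missing:
        # The body of a 400 can explain why the request was rejected.
        return 400, "missing required parameter(s): " + ", ".join(missing)
    try:
        start, count = int(query["start"][0]), int(query["count"][0])
    except ValueError:
        return 400, "start and count must be integers"
    return 200, f"items {start} to {start + count - 1}"

print(handle_strict("/foo"))
print(handle_strict("/foo?start=0&count=10"))
```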

In either case you'll have to decide how you'll handle errors, such as

GET /foo?start=-10&count=99999999

If parameters have maximums and minimums, you'll need to decide whether to normalize the parameters, or simply return errors. The previous example might return a 400 Bad Request status code, but it could also be constrained to turn into:

GET /foo?start=0&count=1000
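
The two choices (normalize or reject) might be sketched as follows. `MAX_COUNT` and the minimum count of 1 are assumptions picked to match the `count=1000` example above; your own limits will differ.

```python
MAX_COUNT = 1000  # assumed upper bound, matching count=1000 in the example

def normalize(start, count):
    """Clamp the requested range into bounds instead of rejecting it."""
    return max(start, 0), min(max(count, 1), MAX_COUNT)

def validate(start, count):
    """Reject out-of-range values; returns an HTTP status code."""
    if start < 0 or not 1 <= count <= MAX_COUNT:
        return 400
    return 200

print(normalize(-10, 99999999))  # (0, 1000)
print(validate(-10, 99999999))   # 400
```

Whichever you pick, applying it the same way on every endpoint is probably more important than the choice itself.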

In the end it's up to you to decide what makes the most sense in the context of your application.

From the POV of default parameters, I would rather not do that, simply because I have learned not to `manage` anything implicitly for the client, unless they explicitly ask for it. From the POV of the HTTP status code, you're introducing a 4th option - 400 Bad Request - thanks, that's an interesting option. — Eugen, Feb 10 '12 at 09:10
@Eugen, default parameters are typical of liberal languages, while strict parameter choices are typical of more conservative languages. It doesn't matter which you choose, it's more important to be consistent. As far as the headers are concerned; part of what makes REST REST is the usage of HTTP status codes to handle the basic communications. You can still pass a message body with a `400 Bad Request`, which might include details as to *why* it was a bad request. If a client requests `GET /foo/300` and it doesn't exist, you'd send back a `404 Not Found` status code. — zzzzBov, Feb 10 '12 at 13:59
@Eugen, you might find my answer to [this question](http://stackoverflow.com/questions/4573305/rest-api-why-use-put-delete-post-get) useful. Additionally, the [HTTP Status Codes](http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html) are important to know. — zzzzBov, Feb 10 '12 at 14:01

Quasdunk · Answer 2 · 2012-02-10T14:26:50.950

From a RESTful point of view, I think it is perfectly all right to handle both representations the same way. Consider a piece of software with several versions available for download, the latest being 3.8. To get the latest version, you could address it with both GET /software/version/latest.zip and GET /software/version/3.8.zip until a newer version is released. So two different links point to the same resource.

I like to think of pagination in much the same way. The first page always holds the latest articles, so if no page parameter is provided, you could simply assume it's 1.

The approach with the rel attribute goes in a slightly different direction. Google promotes it as a way to handle duplicate content (the next and prev link types themselves predate this and are defined in HTML 4), and it is primarily used to distinguish between a "main" page and pagination pages. Here's how to use it:

//first page:
<link rel="next" href="http://www.foo.com/foo?page=2" />

//second page:
<link rel="prev" href="http://www.foo.com/foo?page=1" />
<link rel="next" href="http://www.foo.com/foo?page=3" />

//third and last page:
<link rel="prev" href="http://www.foo.com/foo?page=2" />

So from an SEO point of view it's a good idea (and recommended by Google) to use those elements. They also go perfectly with the resource-oriented idea of REST and the hypermedia representation of the resources.

Choosing one of your suggestions, I think the 303 See Other is the right way to go. It was intended for this kind of purpose and is a good way to canonicalize your resources. You can make them available through many URIs, but have one "real" URI for a representation (like the software with different versions).
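
For a web service, the same prev/next relations can be carried in an HTTP Link header (as in the question) rather than in HTML link elements. Here is a minimal sketch, assuming 1-based page numbers as in the example above; the function name `pagination_links` and the `last_page` argument are hypothetical.

```python
def pagination_links(base_url, page, last_page):
    """Build a Link header value with prev/next relations for a 1-based page."""
    links = []
    if page > 1:
        links.append(f'<{base_url}?page={page - 1}>; rel="prev"')
    if page < last_page:
        links.append(f'<{base_url}?page={page + 1}>; rel="next"')
    # Multiple links in one Link header are separated by commas.
    return ", ".join(links)

print(pagination_links("http://www.foo.com/foo", 2, 3))
# <http://www.foo.com/foo?page=1>; rel="prev", <http://www.foo.com/foo?page=3>; rel="next"
```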

According to the specification, the response should look something like this:

303 See Other
Location: http://www.foo.com/foo?page=1

<a href="http://www.foo.com/foo?page=1">http://www.foo.com/foo?page=1</a>

So you provide a Location-header with the "real" representation, and the body should contain a hypertext document linking to the new URI. Note that according to the specification the client is expected to send a GET request to the value of Location, but it doesn't have to.
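
To make that concrete, here is a small framework-agnostic sketch that assembles such a response as plain data. The function name `redirect_to_first_page` and the tuple it returns are assumptions for illustration, not any particular library's API.

```python
def redirect_to_first_page(base_url):
    """Return (status_line, headers, body) for a 303 pointing at page 1."""
    target = f"{base_url}?page=1"
    headers = {"Location": target, "Content-Type": "text/html"}
    # Short hypertext note linking to the new URI, for clients that
    # do not follow the Location header automatically.
    body = f'<a href="{target}">{target}</a>'
    return "303 See Other", headers, body

status, headers, body = redirect_to_first_page("http://www.foo.com/foo")
print(status)               # 303 See Other
print(headers["Location"])  # http://www.foo.com/foo?page=1
```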

//EDIT as answer to your comment (yep, it's really bad practice to claim something without proving it :-) - my bad!):

Google presented the rel="next" and rel="prev" attributes in September 2011 on the Official Webmaster Central Blog. They can be used in addition to (or in some cases instead of) the rel="canonical" tag.

Under those links you can find the differences between them explained:

rel="next" and rel="prev" link elements are "to indicate the relationship between component URLs in a paginated series"
the rel="canonical" "allows you to publicly specify your preferred version of a URL"

So there is a slight difference between them. You can break your problem down to a canonicalization issue: there are several URLs pointing to the same resource (/foo and /foo?page=1), but you have a preferred version of the URL (/foo?page=1). So there are a few options for a RESTful approach:

- If there is no page-parameter given in the query, use a default value (e.g. 1) when processing it. I think in this specific case it is OK to use a default value even though you point it out as bad practice.
- Respond with 303 See Other providing the preferred URL in the Location-header (as described above). I think a 3xx-response is the best (and most likely RESTfully intended) way to deal with duplicate/canonical content.
- Respond with 400 Bad Request in case you want to force the client to provide a page-parameter (as explained by zzzzBov in his answer). Note that this response does not have something like a Location header (as assumed in your question), so the explanation why the request failed and/or the correct URL (if given) must go to the entity-body of the response. Also, note that according to the specification this response is commonly used when the client submits a bad/malformed representation (! not URL !) along with a PUT or POST request. So keep in mind that this also might be a little ambiguous for the client.

Personally, I don't think your suggestion to respond with 405 Method Not Allowed is a good idea. According to the specification, you must provide an Allow-header listing the allowed methods. But what methods could be allowed on this resource? I can only think of POST. But if you do not want the client to POST to it either, you could also respond with 403 Forbidden with an explanation why it is forbidden, or 404 Not Found if you do not want to tell why it is forbidden. So it might be a little ambiguous, too (in my opinion).

Using link-elements with the mentioned rel-attributes as you propose in your question is not in itself 'RESTful', because it's only hypermedia embedded in the representation of the resource. Your problem (as far as I understand it) is that you want to decide how to respond to a specific request and which representation to serve. Still, it's not absolutely pointless:

You can consider the whole SEO issue as a side effect of using rel="next/prev/canonical", but keep in mind that they also create connectedness (as the quality of having links) which is one of the characteristics of REST (see Roy Fielding's dissertation).

If you want to dive into RESTful Web Services (which is totally worth it) I recommend reading the book RESTful Web Services by Leonard Richardson and Sam Ruby.

Thanks for the response. One note is that I wasn't aware that it was google that pushed the standardization of these particular rels - do you have any links to support that? Also, I wasn't considering anything related to SEO, just good practice in a RESTful web service. Thanks again for the feedback. — Eugen, Feb 10 '12 at 09:07

rogermushroom · Answer 3 · 2012-02-27T17:08:19.267

In some cases, not implicitly managing anything for the client can lead to an overly complex interface. Examples would be where the consumer isn't technical or doesn't intend to build on top of the interface, for example in a web page. In such cases even a 200 may be appropriate.

In other cases I would agree that implicit management would be a bad idea, namely where a consumer wants to be able to predict the response correctly and where a simple specification may be required. In such cases 405, 400 or 303 may be appropriate.

It's a matter of context.
