I am bolting a REST interface on to an existing application and I'm curious about what the most appropriate solution is to deal with resources that would return an exorbitant amount of data if they were to be retrieved.

The application is an existing timesheet system and one of the resources is a set of a user's "Time Slots". An example URI for these resources is:

/users/44/timeslots/

I have read a lot of questions that relate to how to provide the filtering for this resource to retrieve a subset and I already have a solution for that.

I want to know how (or if) I should deal with the situation that issuing a GET on the URI above would return megabytes of data from tens or hundreds of thousands of rows and would take a fair amount of server resource to actually respond in the first place.

  • Is there an HTTP response that is used by convention in these situations?
    I found HTTP code 413, which relates to a Request entity that is too large, but not one that would be appropriate when the Response entity would be too large.
  • Is there an alternative convention for limiting the response or telling the client that this is a silly request?
  • Should I simply let the server comply with this massive request?

EDIT: To be clear, I have filtering and splitting of the resource implemented and have considered pagination on other large collection resources. I want to respond appropriately to requests which don't make sense (and have obviously been requested by a client constructing a URI).

Alex Taylor
  • Is it actually a problem to return that much data on such a request? Or are you merely trying to safe-guard against an unintended retrieval which results in wasted server resources and bandwidth? – Jeff Stice-Hall Mar 25 '11 at 22:31
  • If the request is silly, then don't implement a handler for that resource and return a 404 if someone mistakenly constructs that URI. ...and then beat them over the head for constructing URLs :-) – Darrel Miller Mar 26 '11 at 00:01
  • @jeff-hall There isn't a problem with returning that much data, it is merely a safe-guard against unintended retrieval as you mentioned. – Alex Taylor Mar 26 '11 at 00:32

4 Answers

You are free to design your URIs as you want encoding any concept.

So, depending on your users (humans/machines) you can use that as a split on a conceptual level based on your problem space or domain. As you mentioned you probably have something like this:

/users/44/timeslots/afternoon
/users/44/timeslots/offshift
/users/44/timeslots/hours/1
/users/44/timeslots/UTC1624

One can also limit by ideas/concepts as above. You can filter further by adding query parameters, e.g. /users/44/timeslots?day=weekdays&dow=mon

Using concepts and filters like this will naturally limit the response size, but you should try to design your API so that it does not get into that situation in the first place. If your client misbehaves, give it a 400 Bad Request. If something goes wrong on your server side, use a 5XX code.
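As a rough illustration of the 400 suggestion, the sketch below rejects a GET on the bare collection when no filter is supplied. The filter names in `ALLOWED_FILTERS` and the function name are assumptions made for the example; the thread does not list the real API's filters.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical filter names; the real API's filters are not given in the thread.
ALLOWED_FILTERS = {"date", "day", "dow"}


def check_timeslots_request(uri):
    """Return (status, message) for a GET on a timeslots collection URI."""
    query = parse_qs(urlsplit(uri).query)
    # Keep only the query parameters this sketch recognises as filters.
    filters = [name for name in query if name in ALLOWED_FILTERS]
    if not filters:
        # No recognised filter: refuse rather than stream every row.
        return 400, "Bad Request: supply at least one filter, e.g. ?date=2011-03-26"
    return 200, "OK"


print(check_timeslots_request("/users/44/timeslots/"))
print(check_timeslots_request("/users/44/timeslots?date=2011-03-26"))
```

If you prefer the 404 argued for in the comments, only the status and message in the rejection branch would change.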

Make use of one of the tools of REST: hypermedia and links (see also HATEOAS). Link to the next part of your hypermedia, and make use of "chunk-like concepts" that your domain understands (pages, time slots). There is no need to download megabytes, which is also bad for caching and in turn hurts scalability and speed.
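One possible way to express "link to the next part" is to include navigation links in each response so the client follows them instead of constructing URIs. This is only a sketch: the `page` and `size` parameter names and the `self`/`prev`/`next` keys are assumptions, not something the answer prescribes.

```python
def timeslot_page_links(base_uri, page, page_size, total_items):
    """Build self/prev/next links for one chunk of a collection."""
    # Ceiling division without importing math; at least one page always exists.
    last_page = max(1, -(-total_items // page_size))

    def link(number):
        return f"{base_uri}?page={number}&size={page_size}"

    links = {"self": link(page)}
    if page > 1:
        links["prev"] = link(page - 1)
    if page < last_page:
        links["next"] = link(page + 1)
    return links


print(timeslot_page_links("/users/44/timeslots", 1, 50, 120))
```

A client that only follows `next` never needs to know how the server chunks the data, which leaves the server free to change the chunk size later.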

Darrel Miller
Derick Schoonbee
  • HATEOS .. lol. Thanks @Darrel Miller for correcting the typo! – Derick Schoonbee Mar 25 '11 at 23:43
  • @Derick I debated actually changing it to Hypertext Constraint. The general consensus seems to be that the HATEOAS acronym is not helping anyone and "Hypertext Constraint" is the preferred term. – Darrel Miller Mar 25 '11 at 23:54
  • There are many breakdowns for the resource and constraints I apply after the `timeslot/` URI. I also have a couple of filter constraints as well. I think the piece of the concept I will take is `400 Bad Request` is the best reply for a direct request to the base resource that doesn't really make sense. – Alex Taylor Mar 26 '11 at 00:36
  • Why not just serve a bunch of links on the "base resource" with paging? What is your base resource? (/ or /timeslots or something higher up) – Derick Schoonbee Mar 26 '11 at 08:30
  • @Darrel Miller.. _Hypertext Constraint_ is not as descriptive or "cool" as _HATEOAS_ (for newbies I guess) but given the rest of the REST dissertation in that context it's more appropriate. I'll upgrade my vocabulary :) Thanks for the tip. – Derick Schoonbee Mar 26 '11 at 08:37

timeslots is a collection resource, so why not simply enable pagination on that resource?

see here: Pagination in a REST web application

Calling GET on the collection without page information simply returns the first page (with a default page size).
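A minimal sketch of that default behaviour follows, assuming a default page size of 50 and a cap of 200 (both values are made up for illustration) and an in-memory list standing in for the real data store.

```python
DEFAULT_PAGE_SIZE = 50   # assumed default, not from the original answer
MAX_PAGE_SIZE = 200      # assumed upper bound on what a client may ask for


def paginate(items, page=None, size=None):
    """Return one page of items; a request without page info gets page 1."""
    page = 1 if page is None else max(1, page)
    size = DEFAULT_PAGE_SIZE if size is None else min(max(1, size), MAX_PAGE_SIZE)
    start = (page - 1) * size
    return items[start:start + size]


rows = list(range(1000))
print(len(paginate(rows)))              # bare GET: first page, default size
print(paginate(rows, page=2, size=10))  # explicit page and size
```

Capping `size` matters as much as the default: without it a client could simply request one enormous page and recreate the original problem.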

"Should I simply let the server comply with this massive request?" I think you shouldn't, but that's up to you to decide: can the server handle big volumes, and do you consider it a valid use case?

LiorH
  • Aha! Usecase is the magic word. The question is valid for the pagination scenario too: is that a valid usecase? If not, why bother wasting time implementing it? Which goes back to the original question: why are we debating implementing an endpoint that there doesn't appear to be a usecase for and that could cause problems if someone does use it? – Darrel Miller Mar 25 '11 at 23:58
  • Correct, there is no Usecase for this resource being directly requested without a filter. I simply want to return an appropriate response to that invalid request. – Alex Taylor Mar 26 '11 at 00:37
  • @Alex 404 seems the most logical to me. Returning a 400 would indicate the client that the request was malformed somehow and that if they make the request "properly" it would return a value. – Darrel Miller Mar 26 '11 at 03:10
  • @Darrel Now comparing 404 to 400, from a data sense, the resource does exist but hasn't been requested correctly. The URI is valid and would return a resource if a valid query had been provided. ie. `/user/44/timeslots?date=2011-03-26` is valid. If we ignore that the query has a meaning and take the URI to include any query as well, you're correct, the URL requested does not exist. – Alex Taylor Mar 26 '11 at 03:18
  • @Alex `http://example.org/foo` and `http://example.org/foo?q=7` are two different resources. It is a common misconception that the resource is identified by the path segment only, but that is not the case. The query string is also part of the identifier. – Darrel Miller Mar 26 '11 at 12:38
  • @Darrel Confirmed as per RFC3986, Section 3.4: _"The query component contains non-hierarchical data that, along with data in the path component (Section 3.3), serves to identify a resource within the scope of the URI's scheme and naming authority (if any)."_. – Alex Taylor Mar 26 '11 at 20:47

This may be too weak an answer, but here is how my team has handled it. Large resources like that are required to have additional filtering information provided. If the filtering information needed to keep the size within a specific range is missing, we return an Internal Error (500) with an appropriate message stating that the RESTful API was not used properly.

Hope this helps.

James
  • HTTP 500 is not generally appropriate for an intentional error code. 500 should be used when software has unexpectedly failed or met an exception condition. Instead, it is appropriate to return HTTP 400 Bad Request. See this related StackOverflow [post](http://stackoverflow.com/questions/3290182/rest-http-status-codes/3290198#3290198). – Jeff Stice-Hall Mar 25 '11 at 22:32

You can use a custom Range header - see http://otac0n.com/blog/2012/11/21/range-header-i-choose-you.html

Or you can (as others have suggested) split your resource up into smaller resources at different URLs (representing sections, or pages, or otherwise filtered versions of the original resource).
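For the Range header option, a sketch of the server side might look like the following. It assumes a custom range unit called `items` and a cap of 100 rows per response; both are illustrative choices, and the linked post may use different names. For simplicity, a malformed header is treated the same as an unsatisfiable one here.

```python
import re

RANGE_UNIT = "items"  # hypothetical custom range unit
MAX_ITEMS = 100       # assumed cap on rows per response


def parse_item_range(header, total):
    """Parse 'items=START-END' into inclusive (start, end), or return None."""
    match = re.fullmatch(rf"{RANGE_UNIT}=(\d+)-(\d*)", header.strip())
    if not match:
        return None
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else total - 1
    # Clamp to the collection and to the per-response cap.
    end = min(end, total - 1, start + MAX_ITEMS - 1)
    return (start, end) if start <= end else None


def respond(rows, range_header=None):
    """Return (status, headers, body) for a GET on the collection."""
    total = len(rows)
    # No Range header: behave as if the client asked from item 0 onwards.
    item_range = parse_item_range(range_header or f"{RANGE_UNIT}=0-", total)
    if item_range is None:
        # In this sketch an empty collection also ends up here.
        return 416, {"Content-Range": f"{RANGE_UNIT} */{total}"}, []
    start, end = item_range
    headers = {
        "Accept-Ranges": RANGE_UNIT,
        "Content-Range": f"{RANGE_UNIT} {start}-{end}/{total}",
    }
    status = 200 if (start == 0 and end == total - 1) else 206
    return status, headers, rows[start:end + 1]


status, headers, body = respond(list(range(250)), "items=10-19")
print(status, headers["Content-Range"], len(body))
```

With this shape, a bare GET on a large collection returns a 206 Partial Content with the first chunk, and the `Content-Range` total tells the client how much more there is.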

mcintyre321