Correct API approach for long processing time

Question

I am making an HTTP web API that's mainly fed by a database. Simplified, the db contains userobjects.
These objects have a last_online (when the user was online) and last_checked (the last time I checked the userobject).

Checking the userobject can take from 3 to 30 seconds. When last_checked is less than 10 minutes old, everything's okay: the API call returns 200 and the userobject.

But I want to reprocess the userobject when the data is staler than 10 minutes. Obviously I cannot have my API call sit there and wait.

What is the right approach to HTTP APIs that (sometimes) need to return data from long running processes?

What sort of API are you talking about? A web API? A library in some specific language? If so, which language? — Jon Skeet, Mar 26 '14 at 07:08
Possible duplicate of http://stackoverflow.com/questions/9794696/best-http-status-code-in-rest-api-for-not-ready-yet-try-again-later — Raedwald, Mar 26 '14 at 20:01

score 5 · Accepted Answer · edited Oct 07 '21 at 11:04

My first proposal would be to have the server update the user object every X minutes as a background process. I don't see any reason to place the burden of keeping server data up-to-date on the client. Responses to the GET call would include an Expires header. The client could then cache the response for a fixed amount of time, saving you server hits until the data gets refreshed.
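As a rough illustration of the Expires idea, here is a minimal Python sketch. It assumes the background job runs every 10 minutes and that last_checked is stored as a timezone-aware UTC datetime; `REFRESH_INTERVAL` and `cache_headers` are hypothetical names, not part of any framework.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Assumption: the background job refreshes each user object every 10 minutes.
REFRESH_INTERVAL = timedelta(minutes=10)


def cache_headers(last_checked: datetime) -> dict[str, str]:
    """Build an Expires header that lapses when the next refresh is due."""
    expires_at = last_checked + REFRESH_INTERVAL
    # usegmt=True emits the "GMT" suffix that HTTP dates require;
    # it only accepts datetimes whose tzinfo is timezone.utc.
    return {"Expires": format_datetime(expires_at, usegmt=True)}


headers = cache_headers(datetime(2014, 3, 26, 7, 0, tzinfo=timezone.utc))
print(headers)
```

A client that honours the header would not ask again until the server has had a chance to refresh the record.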

If you must make the refresh be client-driven, you want your GET to return a 202 Accepted, which indicates a valid request that the API is working on but has not completed. The entity that gets returned from your GET request should provide a timestamp for when the client should check back to get the updated data. Once the data has been refreshed, the GET will return a 200 OK with the refreshed data. This is the approach I recommend.

GET /userObject
<- 202 Accepted
{ "checkAt": <timestamp> }

GET /userObject
<- 200 OK
{ "userName": "Bob", ... }
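On the server side, the exchange above could be sketched roughly like this in Python. This is only an outline under stated assumptions: `get_user_object` and `start_refresh` are hypothetical names, the 30-second estimate is the worst case mentioned in the question, and `start_refresh` is expected to enqueue the work rather than perform it.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(minutes=10)          # staleness threshold from the question
ESTIMATED_CHECK = timedelta(seconds=30)  # assumed worst-case check duration


def get_user_object(user: dict, now: datetime, start_refresh) -> tuple[int, dict]:
    """Return (status, body) for GET /userObject."""
    if now - user["last_checked"] <= MAX_AGE:
        return 200, {"userName": user["userName"]}
    # Stale: hand the check to a background worker and answer immediately.
    start_refresh(user)
    check_at = now + ESTIMATED_CHECK
    return 202, {"checkAt": check_at.isoformat()}


now = datetime(2014, 3, 26, 8, 0, tzinfo=timezone.utc)
queued = []  # stands in for a real job queue
fresh = {"userName": "Bob", "last_checked": now - timedelta(minutes=2)}
stale = {"userName": "Bob", "last_checked": now - timedelta(minutes=15)}

fresh_response = get_user_object(fresh, now, queued.append)
stale_response = get_user_object(stale, now, queued.append)
```

A real service would also need to avoid queuing the same user twice while a check is already running; that bookkeeping is left out here.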

You could also consider using the Retry-After header in your response, but the HTTP specification only defines its meaning for 503 Service Unavailable and the various 3xx (Redirection) responses (429 Too Many Requests was added to that list later). You definitely aren't describing a 503, and it doesn't sound like redirection is correct either.

If you do want to go the redirection route, you'd return a 302 Found, specifying the temporary URI in the Location header and the delay time in the Retry-After header.

A fourth approach would be to use a POST and the Post-Redirect-Get pattern. You could POST to your userObject URI and have it return the 302 Found with the Retry-After header.

I really don't think that options three or four buy you anything that the second option doesn't, and I think the second is the clearest. Three implies that your resource currently lives in a different location when it doesn't. Four transforms what is fundamentally a GET request (give me the user object) into a POST (refresh the user object, but only if you need to).

If you do decide to follow @JonSkeet's suggestion of a continuation token, you probably want two separate resources, something like /userObjects and /userObjectRequests. The client would always POST to /userObjectRequests. If the userObject is still valid on the back end, that POST returns a 302 to /userObjects. If it isn't valid, the POST returns an entity with an id and an estimated completion time. The client can then call GET on /userObjectRequests/{id}, and it will get either a 302 to the userObject (if it's ready) or a 200 with the id and a new estimated completion time.
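The two-resource flow might look something like the following Python sketch, which keeps pending requests in an in-memory dict. Everything not named in the answer is an assumption made for illustration: the function names, the `estimatedSeconds` field, the `/userObjects/{user_id}` location, and the choice of 202 for the initial POST (the answer does not say which status that response carries).

```python
import itertools

_request_ids = itertools.count(1)
pending = {}  # request id -> user id; a real service would persist this


def post_request(user_id: str, is_fresh: bool) -> tuple[int, dict, dict]:
    """POST /userObjectRequests -> (status, headers, body)."""
    if is_fresh:
        # Data is still valid: send the client straight to the object.
        return 302, {"Location": f"/userObjects/{user_id}"}, {}
    request_id = next(_request_ids)
    pending[request_id] = user_id
    return 202, {}, {"id": request_id, "estimatedSeconds": 30}


def get_request(request_id: int, is_done: bool) -> tuple[int, dict, dict]:
    """GET /userObjectRequests/{id} -> (status, headers, body)."""
    user_id = pending[request_id]
    if is_done:
        del pending[request_id]
        return 302, {"Location": f"/userObjects/{user_id}"}, {}
    # Not ready yet: repeat the id with a new estimate.
    return 200, {}, {"id": request_id, "estimatedSeconds": 30}


redirect = post_request("bob", is_fresh=True)
accepted = post_request("bob", is_fresh=False)
still_running = get_request(accepted[2]["id"], is_done=False)
finished = get_request(accepted[2]["id"], is_done=True)
```

The `is_fresh` and `is_done` flags stand in for whatever staleness and completion checks the back end really performs.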

score 4 · Answer 2 · answered Mar 26 '14 at 08:49

One fairly "old-school" way of handling this would be to return a continuation token - basically a job ID saying, "Check this periodically; sooner or later it'll come back with a result." Given that even 30 seconds is quite a long time, you might want to give back a continuation token even in the normal "checking" situation.
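From the client's side, "check this periodically" could be sketched in Python as below. The names `poll` and `fetch` are hypothetical, and the sketch assumes `fetch(token)` returns None while the job is still running and the finished result afterwards; the transport (an HTTP GET with the token, for instance) is left to the caller.

```python
import time


def poll(token, fetch, interval=2.0, timeout=60.0, sleep=time.sleep):
    """Call fetch(token) until it yields a result or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = fetch(token)
        if result is not None:
            return result
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"job {token} did not finish within {timeout} seconds")


# Stand-in for the server: not ready twice, then the finished object.
answers = iter([None, None, {"userName": "Bob"}])
result = poll("job-1", lambda token: next(answers), sleep=lambda seconds: None)
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays.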

More modern alternatives would be web sockets or a hanging GET (long polling)... it really depends on what your client use cases are.

Jon Skeet

What would be the HTTP code for returning a token and acknowledging the request? – Gerben Jacobs Mar 26 '14 at 08:52
@GerbenJacobs: Well I'd assume you'd be returning the data as JSON - I'd just use 200, as it *is* a successful response in itself. It's possible that there's a more dedicated response code, but I'm not aware of it. – Jon Skeet Mar 26 '14 at 08:57
I'd return the job-id token in a `202` code ("Accepted") – Ron Klein Apr 29 '15 at 11:06
