Best practice for implementing long-running searches with REST

Question

As part of a REST Service, I need to implement a search call. The basic idea is that the user can POST a new search, and GET the results:

POST http://localhost/api/search
GET http://localhost/api/search?id=123

However, my search may run a few minutes, and return partial results until it is done. That is, the GET-Request would return something like:

status: running
results: a, b, c.

while the next GET-Request might return

status: completed
results: a, b, c, d, e.

This contradicts the semantics of a RESTful GET request. The request should always return the same result when called several times. For example, when the user uses a caching proxy, the full results might never be delivered to the user.

Question: Is there any way to provide a truly RESTful implementation for long running searches with partial results?

In case you need to know: I use `Jersey`[1] to implement the REST interface, but I think the question is independent of the programming language and framework. [1]: http://jersey.java.net/ — fstab, Oct 11 '11 at 18:00
"The request should always return the same result when called several times." → This is a misinterpretation of REST. Resources can, of course, change periodically and are not expected to be immutable. If resources can change frequently, that should be communicated using caching headers. You're thinking of idempotence, I think, which has to do with how clients manipulate the state of a resource. — Mark E. Haase, Sep 02 '15 at 18:15

Rob Hruska · Accepted Answer · 2011-10-11T19:16:48.420

While the search is executing, you could set the appropriate response headers (e.g. Expires or Cache-Control: max-age) to indicate that the response should not be cached (HTTP/1.1, sections 14.9.3 and 13.4).

Once the search result is complete, you could then send a more appropriate Expires / max-age header to allow or extend the cacheability of the result.
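As a rough, framework-independent sketch of the two paragraphs above: the helper below picks a Cache-Control value from the search status. The function name `cache_headers`, the status strings, and the one-hour default are assumptions made for illustration, not part of any library.

```python
def cache_headers(status, complete_max_age=3600):
    """Pick caching headers from the search status (hypothetical helper)."""
    if status == "completed":
        # Final result: let clients and proxies cache it for a while.
        return {"Cache-Control": f"max-age={complete_max_age}"}
    # Still running: tell caches to revalidate before reusing the response.
    return {"Cache-Control": "no-cache, max-age=0"}


print(cache_headers("running"))
print(cache_headers("completed", complete_max_age=600))
```

In Jersey or any other framework, the returned pairs would simply be copied onto the response for GET /api/search?id=123.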

The burden would be on the client to re-query the resource until the search status is complete. The client could use the value of the Expires header to decide when to re-query for updated results.
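A minimal sketch of such a polling client, assuming the response body carries the `status` and `results` fields shown in the question. The `fetch` callable, the one-second minimum delay, and the attempt limit are hypothetical; `fetch` stands in for whatever HTTP client performs the GET and returns a (body, headers) pair.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def delay_from_expires(headers, now=None, minimum=1.0):
    """Seconds to wait before re-querying, derived from an Expires header."""
    value = headers.get("Expires")
    if value is None:
        return minimum
    try:
        expires = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return minimum  # unparseable date: fall back to the minimum delay
    if expires.tzinfo is None:
        expires = expires.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(minimum, (expires - now).total_seconds())


def poll_until_complete(fetch, sleep=time.sleep, max_attempts=100):
    """Call fetch() until the search reports completion, then return results."""
    for _ in range(max_attempts):
        body, headers = fetch()
        if body["status"] == "completed":
            return body["results"]
        sleep(delay_from_expires(headers))
    raise TimeoutError("search did not complete in time")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.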

Alongside this, you could also use a custom 2XX status code to indicate that the result is not yet complete, perhaps an HTTP/1.1 299 In Progress, or whatever makes sense. The spec indicates that HTTP status codes are extensible.

For the record, your statement:

This contradicts the semantics of a RESTful GET request. The request should always return the same result when called several times.

is not true for GET requests - resources can change. That GET requests are idempotent only means that "...the side-effects of N > 0 identical requests is the same as for a single request". ^[spec]

Dammit - you wrote exactly what I would write :-) That is a good solution to the question. And, yes - of course representations can change. E.g. GET /currentTemperature, GET http://stackoverflow.com/questions , etc... +1 from me — Jan Algermissen, Oct 11 '11 at 18:30

score 10 · Answer 2 · edited Jun 20 '20 at 09:12

A few days ago I happened to stumble upon a blog post on reddit that deals with your problem. You might want to check it out: Bill Higgins's RESTy long-ops.

Happy reading.

answered Oct 11 '11 at 17:53

aefxx

Can you improve your answer by quoting some of the relevant material? http://meta.stackexchange.com/questions/8231/are-answers-that-just-contain-links-elsewhere-really-good-answers – Mark E. Haase Sep 02 '15 at 18:17

score 3 · Answer 3 · answered Feb 11 '14 at 17:25

It's not a problem if the first GET request returns partial results, and the second GET request returns the full results. That's because the first GET request doesn't cause the result of the second request to change: that request would have returned the full results even if the first GET hadn't been issued. "Idempotent" doesn't mean identical results. It means that the first GET doesn't affect the second GET.

It would be a problem if the first GET request returned partial results, and the second GET would return the remaining results (first GET returns A, B, C; second GET returns D, E, F). Here the first GET changes the second result, so it's not RESTful.
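To make the distinction concrete, here is a small in-memory sketch. The class and method names are invented for illustration. `get_snapshot` only reads state, so issuing it any number of times never changes what a later GET returns; `get_and_drain` advances a cursor, so each call alters the next response.

```python
class SearchResource:
    """Toy stand-in for a running search (hypothetical, in-memory only)."""

    def __init__(self):
        self._found = []
        self._cursor = 0

    def add_result(self, item):
        # Called by the search job itself, not by any GET.
        self._found.append(item)

    def get_snapshot(self):
        # Safe GET: returns everything found so far, changes nothing.
        return list(self._found)

    def get_and_drain(self):
        # Unsafe GET: returns only unseen results and moves the cursor,
        # so every call changes what the following call will return.
        new_items = self._found[self._cursor:]
        self._cursor = len(self._found)
        return new_items
```

With the snapshot style, the response changes only because the search itself progressed, which is exactly the partial-then-full case described above.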

score 0 · Answer 4 · answered Oct 11 '11 at 17:52

Maybe not the most elegant answer, but it will get around caching proxies: just don't send the same query twice. Add a timestamp to the query (&time=1318355458). This way each request is unique (you could also add milliseconds to the time if you're requesting more than once per second).
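A minimal sketch of building such a URL with the standard library. The helper name `with_cache_buster` is made up; the `time` parameter name comes from the example above.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_cache_buster(url, now=None):
    """Return url with a time=<unix seconds> query parameter appended."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    stamp = int(time.time()) if now is None else int(now)
    query.append(("time", str(stamp)))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_cache_buster("http://localhost/api/search?id=123", now=1318355458))
```

The server would have to ignore the extra parameter when looking up search 123.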

As for following the doctrine of "The request should always return the same result when called several times", it seems logically contradictory to the goal of returning partial results at different times for the same query.

score 0 · Answer 5 · answered Oct 11 '11 at 22:29

Could you do a wait instead of a poll if you just want the full results?

Why can't you provide a resource as part of your POST that will get the results PUT to it? You're providing a callback REST interface, so rather than polling, the client process waits for a PUT to the provided resource. You can then either GET the results, or the results could be included in the PUT itself.
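A rough sketch of the server side of that idea, assuming the client supplied a callback URL in its original POST. The URL used here and the JSON body shape are hypothetical. The function only builds the PUT request; sending it with `urllib.request.urlopen` is left out so the example runs without a network.

```python
import json
import urllib.request


def build_callback_request(callback_url, results):
    """Build the PUT that delivers finished results to the client's resource."""
    body = json.dumps({"status": "completed", "results": results}).encode("utf-8")
    return urllib.request.Request(
        callback_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_callback_request("http://localhost/client/results/123", ["a", "b", "c"])
print(req.get_method(), req.full_url)
```

This only works when the client is itself reachable over HTTP, which is often not the case for browsers or clients behind a firewall.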
