I am working on a service that scrapes specific links from blogs. The service calls different sites, pulls in the data, and stores it.

I'm having trouble choosing the URL for updating the data on the server. I currently use the verb "update" in the path to pull in the latest links.

I currently use the following endpoints:

GET /user/{ID}/links - gets all previously scraped links (a few milliseconds)
GET /user/{ID}/links/update - starts scraping and returns the scraped data (a few seconds)

What would be a good option for the second URL? Here are some examples I came up with myself:

GET /user/{ID}/links?collection=(all|cached|latest)
GET /user/{ID}/links?update=1
GET /user/{ID}/links/latest
GET /user/{ID}/links/new
Boedy
  • What's wrong with an UPDATE request to `/user//links`? – Tim May 02 '14 at 08:23
  • Just a tip: to make it more restful, your url should say `users` instead of `user` – Tim May 02 '14 at 08:23
  • Be careful not to describe what you want with the word "update", or you will get answers that don't fit your problem. You *really* want to GET the user links. It's just that you want to choose between up-to-date and cached links. This is very similar to the [`stale` option used in CouchDB](http://docs.couchdb.org/en/latest/maintenance/performance.html?highlight=stale#views-generation) (a restful datastore). So the answer could be for example `GET /user/{ID}/links` and `GET /user/{ID}/links?cached=true`. – Aurélien Bénel May 02 '14 at 09:53
  • @TimCastelijns To pick a nit, REST doesn't say anything about what URLs should look like. Using a plural is a convention, and a good one, but it's unrelated to RESTfulness. – Eric Stein May 02 '14 at 12:45
  • @EricStein that's very true. I should have picked a better word ;-) – Tim May 02 '14 at 12:48

1 Answer

Using GET to start a process isn't very RESTful. You aren't really GETting information; you're asking the server to process information. You probably want to POST against /user/{ID}/links (a quick Google for PUT vs POST will give you endless reading if you're curious about the finer points there). You'd then have two options:

POST with background process: If using a background process (or queue), you can return a 202 Accepted, indicating that the service has accepted the request and is about to do something. 202 generally indicates that the client shouldn't wait around, which makes sense when performing time-consuming actions like scraping. The client can then issue GET requests on the first link to retrieve updates.

Creative use of Last-Modified headers can tell the client when new updates are available. If you want to be super fancy, you can implement HEAD /user/{ID}/links that will return a Last-Modified header without a response body (saving both bandwidth and processing).
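
As a rough illustration of the background option, here is a minimal, framework-free Python sketch. The handlers are plain functions that return a (status, headers, body) tuple rather than real HTTP endpoints, and `STORE`, `JOBS`, and `scrape_links` are hypothetical names standing in for your own storage, job queue, and scraper. A thread is used only to keep the example short; a real service would more likely use a proper queue.

```python
import threading
import time
from email.utils import formatdate

# Hypothetical in-memory store: user id -> (links, last-modified timestamp).
STORE = {}
# Hypothetical registry of running background jobs, keyed by user id.
JOBS = {}
LOCK = threading.Lock()


def scrape_links(user_id):
    """Placeholder for the real (slow) scraper; returns a fixed list here."""
    return [f"https://example.com/{user_id}/post-1"]


def _run_scrape(user_id):
    # Runs in the background thread and records when the data changed.
    links = scrape_links(user_id)
    with LOCK:
        STORE[user_id] = (links, time.time())


def post_links(user_id):
    """POST /user/{ID}/links: start scraping in the background, reply 202."""
    worker = threading.Thread(target=_run_scrape, args=(user_id,))
    JOBS[user_id] = worker
    worker.start()
    # 202 Accepted: the request was accepted, the work may not be done yet.
    return 202, {}, None


def head_links(user_id):
    """HEAD /user/{ID}/links: Last-Modified header only, no body."""
    with LOCK:
        entry = STORE.get(user_id)
    if entry is None:
        return 404, {}, None
    _, modified = entry
    return 200, {"Last-Modified": formatdate(modified, usegmt=True)}, None


def get_links(user_id):
    """GET /user/{ID}/links: return whatever was scraped most recently."""
    with LOCK:
        entry = STORE.get(user_id)
    if entry is None:
        return 404, {}, None
    links, modified = entry
    return 200, {"Last-Modified": formatdate(modified, usegmt=True)}, links
```

A client would call `post_links`, then poll `head_links` until the Last-Modified value changes, and only then call `get_links` for the full body.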

POST with direct processing: If you're doing the processing during the request (not a great plan in the grand scheme of things), you can return a 200 OK with a response body containing the updated links.

Subsequent GETs would perform as normal.
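
For comparison, a minimal Python sketch of the direct-processing option might look like the following. Again, these are plain functions rather than real endpoints, and `CACHE` and `scrape_links` are hypothetical placeholders for your storage and scraper.

```python
# Hypothetical in-memory cache: user id -> list of scraped links.
CACHE = {}


def scrape_links(user_id):
    """Placeholder for the real scraper, which would take a few seconds."""
    return [f"https://example.com/{user_id}/post-1"]


def post_links(user_id):
    """POST /user/{ID}/links: scrape during the request, reply 200 with the result."""
    links = scrape_links(user_id)
    CACHE[user_id] = links
    return 200, links


def get_links(user_id):
    """GET /user/{ID}/links: cheap read of the previously scraped links."""
    return 200, CACHE.get(user_id, [])
```

The GET stays fast and free of side effects, while the slow, state-changing work sits behind POST.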

jtv4k
  • Thanks for your explanation. Using a background process sounds good. What if I would like to add links manually? – Boedy May 02 '14 at 14:48
  • Would POST /user/{id}/links?action=update do? – Boedy May 03 '14 at 12:10
  • POST /user/{id}/links?action=update isn't entirely restful either. I would suggest creating an endpoint specific to the update action, e.g. /user/{id}/links/update. That way you always know what you're going to get (versus POST /user/{id}/links?action=blarg, which is undefined). – jtv4k May 06 '14 at 17:58