Bulk Collection Manipulation through a REST (RESTful) API

Question

I'd like some advice on designing a REST API which will allow clients to add and remove large numbers of objects to and from a collection efficiently.

Via the API, clients need to be able to add items to the collection and remove items from it, as well as manipulate existing items. In many cases the client will want to make bulk updates to the collection, e.g. adding 1000 items and deleting 500 different items. It feels like the client should be able to do this in a single transaction with the server, rather than requiring 1000 separate POST requests and 500 DELETEs.

Does anyone have any info on the best practices or conventions for achieving this?

My current thinking is that one should be able to PUT an object representing the change to the collection URI, but this seems at odds with the HTTP/1.1 RFC, which seems to suggest that the data sent in a PUT request should be interpreted independently of the data already present at the URI. This implies that the client would have to send a complete description of the new state of the collection in one go, which may well be much larger than the change, or may even include more than the client knows when it makes the request.

Obviously, I'd be happy to deviate from the RFC if necessary but would prefer to do this in a conventional way if such a convention exists.

Do you control the clients using your API or do you need to support existing client products? In other words: Are you free to define the semantics of request entities? — mkoeller, Nov 21 '08 at 14:41

score 64 · Answer 1 · answered Aug 12 '09 at 23:10

You might want to think of the change task as a resource in itself. So you're really PUT-ing a single object, which is a Bulk Data Update object. Maybe it's got a name, an owner, and a big blob of CSV, XML, etc. that needs to be parsed and executed. In the case of CSV you might also want to identify what type of objects are represented in the CSV data.

List jobs, add a job, view the status of a job, update a job (probably in order to start/stop it), delete a job (stopping it if it's running) etc. Those operations map easily onto a REST API design.
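The job lifecycle described above can be sketched as a small in-memory store. This is a minimal illustration, not a real framework: the `BulkUpdateJob` and `JobStore` names, the `/jobs` URLs and the status values are all hypothetical, and each method is annotated with the HTTP request it would plausibly back.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical job states; the answer only says jobs can be started/stopped.
PENDING, RUNNING, STOPPED, DONE = "pending", "running", "stopped", "done"


@dataclass
class BulkUpdateJob:
    name: str
    owner: str
    payload: str      # e.g. a CSV or XML blob, parsed when the job runs
    item_type: str    # what kind of objects the payload describes
    status: str = PENDING


@dataclass
class JobStore:
    jobs: dict = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def list_jobs(self):                      # GET    /jobs
        return list(self.jobs.items())

    def add_job(self, job):                   # POST   /jobs -> 201 + Location
        job_id = next(self._ids)
        self.jobs[job_id] = job
        return f"/jobs/{job_id}"

    def view_job(self, job_id):               # GET    /jobs/{id}
        return self.jobs[job_id]

    def update_status(self, job_id, status):  # PUT or PATCH /jobs/{id}
        if status not in (RUNNING, STOPPED):
            raise ValueError("clients may only start or stop a job")
        self.jobs[job_id].status = status

    def delete_job(self, job_id):             # DELETE /jobs/{id}
        job = self.jobs.pop(job_id)
        if job.status == RUNNING:
            job.status = STOPPED              # stop it before discarding it
        return job
```

In a real service each method would sit behind a route in your web framework, and `add_job` would return `201 Created` with the job's URL in the `Location` header.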

Once you have this in place, you can easily add different data types that your bulk data updater can handle, maybe even mixed together in the same task. In other words, there's no need to duplicate this same API all over your app for each type of thing you want to import.

This also lends itself very easily to a background-task implementation. In that case you probably want to add fields to the individual task objects that allow the API client to specify how they want to be notified (a URL they want you to GET when it's done, or send them an e-mail, etc.).
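A rough sketch of the background-task variant, assuming a hypothetical `notify_url` field on the task and a simple `(op, key, value)` operation format. The `notify` callback is injected so the sketch stays testable; in practice it might issue the HTTP GET the client asked for, or send an e-mail.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Task:
    operations: list                  # e.g. [("add", 3, "c"), ("delete", 1, None)]
    notify_url: Optional[str] = None  # hypothetical: URL the client wants fetched when done
    status: str = "pending"


def run_task(task: Task, collection: dict, notify: Callable[[str], None]) -> None:
    """Apply a task's operations, then tell the client it has finished.

    `notify` stands in for whatever actually contacts the client
    (an HTTP client, a message queue, an e-mail sender, ...).
    """
    task.status = "running"
    for op, key, value in task.operations:
        if op == "add":
            collection[key] = value
        elif op == "delete":
            collection.pop(key, None)  # deleting a missing item is a no-op
        else:
            raise ValueError(f"unknown operation {op!r}")
    task.status = "done"
    if task.notify_url:
        notify(task.notify_url)
```

To run it in the background you could hand `run_task` to a thread or a job queue, e.g. `threading.Thread(target=run_task, args=(task, collection, notify)).start()`, and let clients poll the task resource for its `status` in the meantime.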

Good answer. If you want to apply an operation to a large set of resources and you are selecting them using filtering conditions, you might pass to the change task an object with the filter conditions. — Ameba Spugnosa, Feb 27 '13 at 15:31
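A sketch of a change task that selects its targets with filter conditions instead of an explicit list of IDs. The filter format here (exact equality on each given field) and the function names are assumptions made for illustration.

```python
from typing import Any, Dict, List

# Hypothetical filter format: every key/value pair must equal the item's field.
Filter = Dict[str, Any]


def matches(item: Dict[str, Any], conditions: Filter) -> bool:
    return all(item.get(key) == value for key, value in conditions.items())


def apply_filtered_delete(collection: Dict[int, Dict[str, Any]], conditions: Filter) -> List[int]:
    """Delete every item selected by `conditions`; return the deleted IDs."""
    # Collect first, then delete, so the dict is not mutated while iterating.
    doomed = [item_id for item_id, item in collection.items() if matches(item, conditions)]
    for item_id in doomed:
        del collection[item_id]
    return doomed
```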
Although it is reasonable to avoid "chatty" DELETEs, this falls back into an RPC mentality... Perhaps it is feasible to GET the collection, remove all the things that you want deleted, then PUT the collection (using a 409 response status to detect race conditions with concurrent modifiers). This becomes less feasible if the GET is paged, in which case PATCH is probably the right alternative. — delitescere, Aug 07 '13 at 01:56
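The GET-modify-PUT approach from the comment above can be sketched with an entity tag: the client sends back the tag it received with its GET, and the server refuses the PUT if the collection has changed since. The `CollectionResource` class and its SHA-256 ETag are hypothetical; the 409 status follows the comment.

```python
import hashlib
import json
from typing import Optional, Tuple


class CollectionResource:
    """Server-side collection guarded by an ETag (a hash of its representation)."""

    def __init__(self, items: list):
        self.items = list(items)

    def etag(self) -> str:
        body = json.dumps(self.items, sort_keys=True).encode()
        return hashlib.sha256(body).hexdigest()

    def get(self) -> Tuple[list, str]:
        # Return a copy so the client's edits don't touch server state.
        return list(self.items), self.etag()

    def put(self, new_items: list, if_match: Optional[str]) -> int:
        """Replace the whole collection; return an HTTP status code."""
        if if_match != self.etag():
            return 409  # changed since the client's GET (or no tag was sent)
        self.items = list(new_items)
        return 200
```

Note that HTTP's own conditional-request rules answer a failed `If-Match` precondition with `412 Precondition Failed`; either way the client should re-GET, reapply its edits and retry.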

score 9 · Answer 2 · answered Aug 13 '09 at 07:52

Yes, PUT creates/overwrites, but does not partially update.

If you need partial update semantics, use PATCH. See http://greenbytes.de/tech/webdav/draft-dusseault-http-patch-14.html (the draft was later published as RFC 5789).
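A minimal sketch of server-side PATCH handling, assuming the client sends a subset of JSON Patch (RFC 6902, which postdates this answer) with only "add" and "remove" operations on paths of the form `/<id>`. The function name and the collection layout are hypothetical, and JSON Pointer escaping is ignored. The change is applied to a copy so that a failing operation leaves the collection untouched, in line with RFC 5789's requirement that a PATCH be applied atomically.

```python
import copy
from typing import Any, Dict, List


def apply_patch(collection: Dict[str, Any], patch: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Apply JSON-Patch-style "add"/"remove" ops to a collection keyed by ID.

    Works on a deep copy, so the change is all-or-nothing: if any op fails,
    the caller's collection is left as it was.
    """
    result = copy.deepcopy(collection)
    for op in patch:
        key = op["path"].lstrip("/")  # "/42" -> "42" (no JSON Pointer escaping)
        if op["op"] == "add":
            result[key] = op["value"]
        elif op["op"] == "remove":
            if key not in result:
                raise KeyError(f"cannot remove missing item {key!r}")
            del result[key]
        else:
            raise ValueError(f"unsupported op {op['op']!r}")
    return result
```

Such a body would typically be sent as `PATCH /collection` (a hypothetical URL) with `Content-Type: application/json-patch+json`.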

Julian Reschke

This makes the most sense to me. However, after searching the web for a few hours, this doesn't seem to be a commonly held viewpoint. Most developers seem to think PATCH has no meaning on a collection. – Mark Rucker Jan 18 '18 at 17:38

score 2 · Answer 3 · answered Nov 20 '08 at 18:21

You should use AtomPub. It is specifically designed for managing collections via HTTP. There might even be an implementation for your language of choice.

dowski

Actually, AtomPub only defines adding single items to a collection, and removing them. So I wouldn't call that "specifically designed for managing collections". – Julian Reschke Aug 13 '09 at 07:53

score 2 · Answer 4 · answered Nov 20 '08 at 20:09

For the POSTs, at least, it seems like you should be able to POST to a list URL and have the body of the request contain a list of new resources instead of a single new resource.
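A sketch of a POST handler that accepts a JSON array of new resources, assuming a hypothetical `/widgets` collection held in memory. It validates the whole batch before creating anything, so a bad element fails the request without leaving half the batch behind.

```python
import itertools
from typing import Dict, List

_next_id = itertools.count(1)
widgets: Dict[int, dict] = {}  # hypothetical collection behind /widgets


def post_many(new_items: List[dict]) -> List[str]:
    """Handle POST /widgets whose body is an array instead of a single object.

    Returns the URI of each created resource, in request order.
    """
    # Validate everything first; a failure here would map to 400 Bad Request.
    if not all(isinstance(item, dict) for item in new_items):
        raise ValueError("every element must be an object")
    created = []
    for item in new_items:
        item_id = next(_next_id)
        widgets[item_id] = item
        created.append(f"/widgets/{item_id}")
    return created
```

Because a single `Location` header can name only one resource, the created URIs are returned for use in the response body.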

Hank Gay

score 1 · Answer 5 · answered Dec 04 '08 at 10:23

As far as I understand it, REST means REpresentational State Transfer, so you should transfer the state from client to server.

If that means too much data going back and forth, perhaps you need to change your representation. A collectionChange structure would work, with a series of deletions (by id) and additions (with embedded full XML representations), POSTed to a handling interface URL. The interface implementation can choose its own method for deletions and additions server-side.
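A sketch of applying such a collectionChange document on the server, using Python's standard `xml.etree.ElementTree`. The element and attribute names (`collectionChange`, `delete`, `add`, `item`, `id`, `name`) are invented for the example; the answer only specifies deletions by id and additions carrying full representations. Applying deletions before additions is one possible choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical change document in the spirit of the answer.
CHANGE = """
<collectionChange>
  <delete id="2"/>
  <delete id="5"/>
  <add><item id="7"><name>seven</name></item></add>
</collectionChange>
"""


def apply_collection_change(collection: dict, xml_text: str) -> None:
    """Apply deletions (by id), then additions (full representations)."""
    root = ET.fromstring(xml_text.strip())
    for deletion in root.findall("delete"):
        collection.pop(deletion.get("id"), None)  # unknown ids are ignored
    for addition in root.findall("add"):
        for item in addition.findall("item"):
            collection[item.get("id")] = {"name": item.findtext("name")}
```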

The purest version would probably be to define the items by URL and have the collection contain a series of URLs. The new collection can be PUT after changes by the client, followed by a series of PUTs of the items being added, and perhaps a series of deletions if you want to actually remove the items from the server rather than just remove them from that list.
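A client-side sketch of this "purest" approach: given the old and new lists of item URLs, work out the requests to send. The collection URL and item URLs are hypothetical, and the requests follow the order the answer gives.

```python
from typing import List, Tuple


def plan_requests(old_urls: List[str], new_urls: List[str],
                  collection_url: str = "/collection") -> List[Tuple[str, str]]:
    """Return the (method, url) requests that move the server from old to new.

    The collection itself is PUT as a list of item URLs; items that are new to
    the list are PUT individually, and items dropped from it are DELETEd
    (only needed if they should leave the server entirely).
    """
    old, new = set(old_urls), set(new_urls)
    requests = [("PUT", collection_url)]
    requests += [("PUT", url) for url in new_urls if url not in old]
    requests += [("DELETE", url) for url in old_urls if url not in new]
    return requests
```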

score 0 · Answer 6 · answered Dec 04 '08 at 10:32

You could introduce a meta-representation of existing collection elements that don't need their entire state transferred, so in some abstract code your update could look like this:

{existing elements 1-100}
{new element foo with values "bar", "baz"}
{existing element 105}
{new element foobar with values "bar", "foo"}
{existing elements 110-200}

Adding (and modifying) elements is done by defining their values, deleting elements is done by not mentioning them in the new collection, and reordering elements is done by specifying the new order (if order is stored at all).

This way you can easily represent the entire new collection without having to retransmit the entire content. Using an If-Unmodified-Since header makes sure that your idea of the content indeed matches the server's idea (so that you don't accidentally remove elements that you simply didn't know about when the request was submitted).
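A sketch of how a server might check the If-Unmodified-Since precondition and expand the compact update above into the full new collection. The tuple encoding (`("keep", first, last)` with 1-based inclusive indices, `("new", value)`) is an assumption that mirrors the abstract notation, and the function names are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import List, Tuple, Union

# One entry of the update body: ("keep", first, last) reuses existing elements
# first..last (1-based, inclusive); ("new", value) adds a new element.
Entry = Union[Tuple[str, int, int], Tuple[str, object]]


def expand_update(current: List[object], update: List[Entry]) -> List[object]:
    """Build the full new collection from a compact update description."""
    result: List[object] = []
    for entry in update:
        if entry[0] == "keep":
            _, first, last = entry
            result.extend(current[first - 1:last])
        elif entry[0] == "new":
            result.append(entry[1])
        else:
            raise ValueError(f"unknown entry {entry!r}")
    return result


def check_unmodified(last_modified: datetime, if_unmodified_since: str) -> bool:
    """True if the collection has not changed since the client's HTTP date."""
    since = parsedate_to_datetime(if_unmodified_since)
    # Treat naive datetimes as UTC so the comparison is always valid.
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    if last_modified.tzinfo is None:
        last_modified = last_modified.replace(tzinfo=timezone.utc)
    return last_modified <= since
```

If `check_unmodified` returns `False`, the server would reject the PUT with `412 Precondition Failed` instead of applying it.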

If I understand your proposed update format correctly, this has the disadvantage of not being idempotent, since you refer to collection elements in a relative way (by using indices). If the update is `PUT` to the server twice (perhaps by accident), you might end up with a corrupted collection; e.g. element no. 105 will be a different element the second time around than the first time, and different elements 101-104 will get deleted. — stakx - no longer contributing, Dec 27 '10 at 20:32
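A small demonstration of the idempotency problem described in this comment, using a self-contained, hypothetical `apply_index_update` that interprets index ranges the same way as the abstract notation. Replaying the same update produces a different collection the second time.

```python
from typing import List, Tuple


def apply_index_update(current: List[str], update: List[Tuple]) -> List[str]:
    """Rebuild a collection from ("keep", first, last) ranges and ("new", value)."""
    result: List[str] = []
    for entry in update:
        if entry[0] == "keep":
            result.extend(current[entry[1] - 1:entry[2]])  # 1-based, inclusive
        else:
            result.append(entry[1])
    return result


original = ["a", "b", "c", "d", "e"]
# Keep elements 1-2, add "x", keep element 5 (i.e. delete "c" and "d").
update = [("keep", 1, 2), ("new", "x"), ("keep", 5, 5)]

once = apply_index_update(original, update)
twice = apply_index_update(once, update)  # the same PUT replayed by accident
```

Here `once` is `["a", "b", "x", "e"]`, but the replay drops `"e"`, because position 5 no longer exists. Referring to every element, new ones included, by a stable ID rather than by position would make a replayed PUT produce the same result.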

score 0 · Answer 7 · answered Jan 30 '20 at 05:02

The best way is:

1. Pass only an array of the IDs of the objects to delete from the front-end application to the Web API.
2. Then you have two options:
   2.1. Web API way: find all collections/entities using the ID array and delete them in the API, but you also need to take care of dependent entities, such as data in foreign-key related tables.
   2.2. Database way: pass the IDs to the database side, find all records in the foreign-key and primary-key tables, and delete them in that order, i.e. the foreign-key table records first, then the primary-key table records.
