180

What proven design patterns exist for batch operations on resources within a REST style web service?

I'm trying to strike a balance between ideals and reality in terms of performance and stability. We've got an API right now where all operations either retrieve from a list resource (i.e., GET /user) or operate on a single instance (PUT /user/1, DELETE /user/22, etc.).

There are some cases where you want to update a single field of a whole set of objects. It seems very wasteful to send the entire representation for each object back and forth to update the one field.

In an RPC style API, you could have a method:

/mail.do?method=markAsRead&messageIds=1,2,3,4... etc. 

What's the REST equivalent here? Or is it OK to compromise now and then? Does it ruin the design to add in a few specific operations where it really improves performance? The client in all cases right now is a web browser (a JavaScript application on the client side).

Mark Renouf

8 Answers

82

A simple RESTful pattern for batches is to make use of a collection resource. For example, to delete several messages at once.

DELETE /mail?id=0&id=1&id=2

It's a little more complicated to batch update partial resources, or resource attributes. That is, to update each message's markAsRead attribute. Basically, instead of treating the attribute as part of each resource, you treat it as a bucket into which to put resources. One example was already posted. I adjusted it a little.

POST /mail?markAsRead=true
POSTDATA: ids=[0,1,2]

Basically, you are updating the list of mail marked as read.

You can also use this for assigning several items to the same category.

POST /mail?category=junk
POSTDATA: ids=[0,1,2]
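To make the bucket idea concrete on the client side, here's a minimal sketch of assembling such a request in JavaScript. The helper name and the exact body encoding are my own invention, not a standard; actually dispatching the request (via XHR or similar) is left to the caller.

```javascript
// Hypothetical helper: describes a bucket-style batch request, i.e.
// "put these resources into the <attribute>=<value> bucket".
// Returns a plain object; sending it is left to the caller.
function buildBucketRequest(collection, attribute, value, ids) {
  return {
    method: 'POST',
    url: '/' + collection + '?' + encodeURIComponent(attribute) +
         '=' + encodeURIComponent(String(value)),
    body: 'ids=' + encodeURIComponent(JSON.stringify(ids)),
  };
}

// Example: mark messages 0, 1 and 2 as read.
var req = buildBucketRequest('mail', 'markAsRead', true, [0, 1, 2]);
// req.url  → "/mail?markAsRead=true"
// req.body → "ids=%5B0%2C1%2C2%5D"  (URL-encoded "[0,1,2]")
```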

It's obviously much more complicated to do iTunes-style batch partial updates (e.g., artist+albumTitle but not trackTitle). The bucket analogy starts to break down.

POST /mail?markAsRead=true&category=junk
POSTDATA: ids=[0,1,2]

In the long run, it's much easier to update a single partial resource, or resource attributes. Just make use of a subresource.

POST /mail/0/markAsRead
POSTDATA: true

Alternatively, you could use parameterized resources. This is less common in REST patterns, but is allowed in the URI and HTTP specs. A semicolon divides horizontally related parameters within a resource.

Update several attributes, several resources:

POST /mail/0;1;2/markAsRead;category
POSTDATA: markAsRead=true,category=junk

Update several resources, just one attribute:

POST /mail/0;1;2/markAsRead
POSTDATA: true

Update several attributes, just one resource:

POST /mail/0/markAsRead;category
POSTDATA: markAsRead=true,category=junk
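These matrix-style paths are simple enough to build with a small helper. The function below is illustrative only; the one idea taken from the URI spec is the semicolon as a divider for horizontally related parameters.

```javascript
// Hypothetical helper: builds a semicolon-delimited path such as
// "/mail/0;1;2/markAsRead;category" from a collection name,
// a list of resource ids, and a list of attribute names.
function buildMatrixPath(collection, ids, attributes) {
  return '/' + collection +
         '/' + ids.join(';') +
         '/' + attributes.join(';');
}

// Several resources, several attributes:
buildMatrixPath('mail', [0, 1, 2], ['markAsRead', 'category']);
// → "/mail/0;1;2/markAsRead;category"

// One resource, one attribute (degenerates to the subresource form):
buildMatrixPath('mail', [0], ['markAsRead']);
// → "/mail/0/markAsRead"
```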

The RESTful creativity abounds.

Alex
  • One could argue your delete should actually be a POST since it isn't actually destroying that resource. – Chris Nicola Dec 13 '11 at 00:28
  • Idempotent methods: GET is to read, POST is to create/append, PUT is to update/replace, DELETE is to delete/destroy. – Alex May 15 '12 at 01:11
  • I'm confused: are you saying POST is idempotent? It's not. Also, your mapping of HTTP verbs to CRUD isn't 100% true; it's just common, not required. – Chris Nicola May 15 '12 at 02:29
  • What I am saying is that it is actually destroying that resource. "DELETE /mail?id=0&id=1&id=2" will cause "/mail?id=0&id=1&id=2" not to exist. – Alex May 15 '12 at 02:36
  • I mention idempotency because issuing a POST to delete resources is not idempotent; you can't repeat it. In fact, if you issue a "POST /mail?id=0&id=1&id=2", I should think you want to append whatever you send in the body to that resource. No? – Alex May 15 '12 at 02:42
  • It isn't necessary. POST is a factory-pattern method; it is less explicit and obvious than PUT/DELETE/GET. The only expectation is that the server will decide what to do as a result of the POST. POST is exactly what it always was: I submit form data and the server does something (hopefully expected) and gives me some indication as to the result. We are not required to create resources with POST, we just often choose to. I can easily create a resource with PUT; I just have to define the resource URL as the sender (not often ideal). – Chris Nicola May 15 '12 at 18:36
  • This design works when the resource attribute to update has the same value. Any ideas when the attribute has a different value for each resource? Thanks. – Nishant Nagwani Apr 24 '13 at 19:59
  • @nishant, in this case, you probably don't need to reference multiple resources in the URI, but merely pass tuples with the references/values in the body of the request. E.g., POST /mail/markAsRead, BODY: i_0_id=0&i_0_value=true&i_1_id=1&i_1_value=false&i_2_id=2&i_2_value=true – Alex Apr 25 '13 at 14:31
  • The semicolon is reserved for this purpose. – Alex May 23 '13 at 14:34
  • What if the query string is too long, say the batch sends 100,000 IDs? Do you require them to make separate calls to split up that list? – PositiveGuy Mar 04 '14 at 18:40
  • In a real world example, I have a system that can batch-assign items in a work queue. The user can either choose all items on the page, a subset of items on the page (selecting which items manually), or all the items in the current search filter. For all items on the page and a subset of items on the page, the specific IDs are sent. For all items in the current search filter, only the search filter parameters are sent with another parameter indicating that all items in the search filter should be affected. – Alex Mar 04 '14 at 22:31
  • Surprised that no one pointed out that updating several attributes on a single resource is nicely covered by `PATCH` - no need for creativity in this case. – LB2 Dec 02 '16 at 16:17
  • This is about updating multiple resources, not just about specific attributes. It does involve a little creativity. – Alex Dec 04 '16 at 20:41
25

Not at all -- I think the REST equivalent is (or at least one solution is) almost exactly that -- a specialized interface designed to accommodate an operation required by the client.

I'm reminded of a pattern mentioned in Crane and Pascarello's book Ajax in Action (an excellent book, by the way -- highly recommended) in which they illustrate implementing a CommandQueue sort of object whose job it is to queue up requests into batches and then post them to the server periodically.

The object, if I remember correctly, essentially just held an array of "commands" -- e.g., to extend your example, each one a record containing a "markAsRead" command, a "messageId" and maybe a reference to a callback/handler function -- and then according to some schedule, or on some user action, the command object would be serialized and posted to the server, and the client would handle the consequent post-processing.

I don't happen to have the details handy, but it sounds like a command queue of this sort would be one way to handle your problem; it'd reduce the overall chattiness substantially, and it'd abstract the server-side interface in a way you might find more flexible down the road.


Update: Aha! I've found a snip from that very book online, complete with code samples (although I still suggest picking up the actual book!). Have a look here, beginning with section 5.5.3:

This is easy to code but can result in a lot of very small bits of traffic to the server, which is inefficient and potentially confusing. If we want to control our traffic, we can capture these updates and queue them locally and then send them to the server in batches at our leisure. A simple update queue implemented in JavaScript is shown in listing 5.13. [...]

The queue maintains two arrays. queued is a numerically indexed array, to which new updates are appended. sent is an associative array, containing those updates that have been sent to the server but that are awaiting a reply.

Here are two pertinent functions -- one responsible for adding commands to the queue (addCommand), and one responsible for serializing and then sending them to the server (fireRequest):

CommandQueue.prototype.addCommand = function(command)
{ 
    if (this.isCommand(command))
    {
        // append the new command to the queued array
        this.queued.push(command);
    }
};

CommandQueue.prototype.fireRequest = function()
{
    if (this.queued.length == 0)
    { 
        return; 
    }

    var data="data=";

    for (var i = 0; i < this.queued.length; i++)
    { 
        var cmd = this.queued[i]; 
        if (this.isCommand(cmd))
        {
            data += cmd.toRequestString(); 
            this.sent[cmd.id] = cmd;

            // ... and then send the contents of data in a POST request
        }
    }
}

That ought to get you going. Good luck!

Christian Nunciato
  • Thanks. That's very similar to my ideas on how I would go forward if we kept the batch operations on the client. The issue is the round-trip time for performing an operation on a large number of objects. – Mark Renouf Feb 04 '09 at 15:46
  • Hm, ok -- I thought you wanted to perform the operation on a large number of objects (on the server) by way of a lightweight request. Did I misunderstand? – Christian Nunciato Feb 04 '09 at 16:37
  • Yes, but I don't see how that code sample would perform the operation any more efficiently. It batches up requests but still sends them to the server one at a time. Am I misinterpreting? – Mark Renouf Feb 04 '09 at 19:58
  • Actually it batches them up and then sends them all at once: that for loop in fireRequest() essentially gathers all outstanding commands, serializes them as a string (with .toRequestString(), e.g., "method=markAsRead&messageIds=1,2,3,4"), assigns that string to "data", and POSTs data to the server. – Christian Nunciato Feb 04 '09 at 20:19
20

While I think @Alex is on the right track, conceptually I think it should be the reverse of what is suggested.

The URL is, in effect, "the resources we are targeting"; hence:

    [GET] mail/1

means get the record from mail with id 1 and

    [PATCH] mail/1 data: mail[markAsRead]=true

means patch the mail record with id 1. The querystring is a "filter", filtering the data returned from the URL.

    [GET] mail?markAsRead=true

So here we are requesting all the mail already marked as read. To [PATCH] this path would therefore be saying "patch the records already marked as read"... which isn't what we are trying to achieve.

So a batch method, following this thinking should be:

    [PATCH] mail/?id=1,2,3 <the records we are targeting> data: mail[markAsRead]=true

Of course, I'm not saying this is true REST (which doesn't permit batch record manipulation); rather, it follows the logic already existing and in use by REST.
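As an illustrative sketch of that shape (the helper and the `mail[attr]=value` body encoding are my own, mirroring the example above, not a standard):

```javascript
// Hypothetical helper: builds a batch PATCH where the query string
// selects the target records and the body carries the change to apply.
function buildBatchPatch(collection, ids, changes) {
  var body = Object.keys(changes).map(function (key) {
    return collection + '[' + key + ']=' + encodeURIComponent(changes[key]);
  }).join('&');
  return {
    method: 'PATCH',
    url: '/' + collection + '?id=' + ids.join(','),
    body: body,
  };
}

var req = buildBatchPatch('mail', [1, 2, 3], { markAsRead: true });
// req.url  → "/mail?id=1,2,3"
// req.body → "mail[markAsRead]=true"
```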

fezfox
  • Interesting answer! For your last example, wouldn't it be more consistent with the `[GET]` format to do `[PATCH] mail?markAsRead=true data: [{"id": 1}, {"id": 2}, {"id": 3}]` (or even just `data: {"ids": [1,2,3]}`)? Another benefit to this alternate approach is that you won't run up against "414 Request URI too long" errors if you're updating hundreds/thousands of resources in the collection. – rinogo Aug 22 '16 at 23:40
  • @rinogo - actually no. This is the point I was making. The querystring is a filter for the records we want to act upon (eg. [GET] mail/1 gets the mail record with an id of 1, whereas [GET] mail?markasRead=true returns mail where markAsRead is already true). It makes no sense to patch to that same URL (ie. "patch the records where markAsRead=true") when in fact we want to patch particular records with ids 1,2,3, REGARDLESS of the current status of the field markAsRead. Hence the method I described. Agree there is a problem with updating many records. I'd build a less tightly coupled endpoint. – fezfox Aug 24 '16 at 00:23
  • Unfortunately, this breaks down as you approach URL string max length, considering resource IDs are typically 20+ character UIDs. Updating for example a flag or status on a large number of records is a common requirement. – Half_Duplex Mar 22 '22 at 19:34
13

Your language, "It seems very wasteful...", indicates to me an attempt at premature optimization. Unless it can be shown that sending the entire representation of objects is a major performance hit (we're talking unacceptable to users, i.e., > 150 ms), then there's no point in attempting to create new, non-standard API behaviour. Remember, the simpler the API, the easier it is to use.

For deletes, send the following, as the server doesn't need to know anything about the state of the object before the delete occurs:

DELETE /emails
POSTDATA: [{id:1},{id:2}]

The next thought is that if an application is running into performance issues with bulk updates of objects, then consider breaking each object up into multiple objects. That way the JSON payload is a fraction of the size.

As an example when sending a response to update the "read" and "archived" statuses of two separate emails you would have to send the following:

PUT /emails
POSTDATA: [
            {
              id:1,
              to:"someone@bratwurst.com",
              from:"someguy@frommyville.com",
              subject:"Try this recipe!",
              text:"1LB Pork Sausage, 1 Onion, 1T Black Pepper, 1t Salt, 1t Mustard Powder",
              read:true,
              archived:true,
              importance:2,
              labels:["Someone","Mustard"]
            },
            {
              id:2,
              to:"someone@bratwurst.com",
              from:"someguy@frommyville.com",
              subject:"Try this recipe (With Fix)",
              text:"1LB Pork Sausage, 1 Onion, 1T Black Pepper, 1t Salt, 1T Mustard Powder, 1t Garlic Powder",
              read:true,
              archived:false,
              importance:1,
              labels:["Someone","Mustard"]
            }
            ]

I would split out the mutable components of the email (read, archived, importance, labels) into a separate object as the others (to, from, subject, text) would never be updated.

PUT /email-statuses
POSTDATA: [
            {id:15,read:true,archived:true,importance:2,labels:["Someone","Mustard"]},
            {id:27,read:true,archived:false,importance:1,labels:["Someone","Mustard"]}
          ]

Another approach is to leverage PATCH, to explicitly indicate which properties you intend to update and that all others should be ignored:

PATCH /emails
POSTDATA: [
            {
              id:1,
              read:true,
              archived:true
            },
            {
              id:2,
              read:true,
              archived:false
            }
          ]

People state that PATCH should be implemented by providing an array of changes containing: action (CRUD), path (URL), and value change. This may be considered a standard implementation, but if you look at the entirety of a REST API, it is a non-intuitive one-off. Also, the implementation above is how GitHub has implemented PATCH.
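For comparison, here's a hedged sketch of that array-of-changes style, in the spirit of JSON Patch (RFC 6902). The converter below is purely illustrative, not GitHub's or anyone's actual implementation; it turns the GitHub-style partial objects shown above into one change entry per field.

```javascript
// Convert GitHub-style partial updates ([{id, field: value, ...}]) into
// an array-of-changes document: one {op, path, value} entry per field.
function toChangeList(collection, partials) {
  var ops = [];
  partials.forEach(function (partial) {
    Object.keys(partial).forEach(function (key) {
      if (key === 'id') return; // the id selects the resource; it is not a change
      ops.push({
        op: 'replace',
        path: '/' + collection + '/' + partial.id + '/' + key,
        value: partial[key],
      });
    });
  });
  return ops;
}

toChangeList('emails', [{ id: 1, read: true, archived: true }]);
// → [ { op: 'replace', path: '/emails/1/read',     value: true },
//     { op: 'replace', path: '/emails/1/archived', value: true } ]
```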

To sum it up, it is possible to adhere to RESTful principles with batch actions and still have acceptable performance.

justin.hughey
  • I agree that PATCH makes the most sense; the issue is that if you have other state-transition code that needs to run when those properties change, it becomes more difficult to implement as a simple PATCH. I don't think REST really accommodates any sort of state transitioning; given it's supposed to be stateless, it doesn't care what it's transitioning from and to, only what its current state is. – BeniRose Oct 03 '17 at 14:38
  • Hey BeniRose, thanks for adding a comment, I often wonder if people see some of these posts. It makes me happy to see that people do. Resources regarding the "stateless" nature of REST define it as a concern with the server not having to maintain state across requests. As such, it isn't clear to me what issue you were describing, can you elaborate with an example? – justin.hughey Oct 04 '17 at 15:23
  • Performance issues don't just stem from payload size. Consider a presentation layer that contains only a facade of a business object. Performing an update in this scenario will eventually require fetching the full business object, or passing the full business object to begin with. – Half_Duplex Mar 22 '22 at 19:28
10

The Google Drive API has a really interesting system for solving this problem (see here).

What they do is basically group different requests into one Content-Type: multipart/mixed request, with each individual complete request separated by a defined delimiter. Headers and query parameters of the batch request are inherited by the individual requests (e.g., Authorization: Bearer some_token) unless they are overridden in the individual request.


Example: (taken from their docs)

Request:

POST https://www.googleapis.com/batch

Accept-Encoding: gzip
User-Agent: Google-HTTP-Java-Client/1.20.0 (gzip)
Content-Type: multipart/mixed; boundary=END_OF_PART
Content-Length: 963

--END_OF_PART
Content-Length: 337
Content-Type: application/http
content-id: 1
content-transfer-encoding: binary


POST https://www.googleapis.com/drive/v3/files/fileId/permissions?fields=id
Authorization: Bearer authorization_token
Content-Length: 70
Content-Type: application/json; charset=UTF-8


{
  "emailAddress":"example@appsrocks.com",
  "role":"writer",
  "type":"user"
}
--END_OF_PART
Content-Length: 353
Content-Type: application/http
content-id: 2
content-transfer-encoding: binary


POST https://www.googleapis.com/drive/v3/files/fileId/permissions?fields=id&sendNotificationEmail=false
Authorization: Bearer authorization_token
Content-Length: 58
Content-Type: application/json; charset=UTF-8


{
  "domain":"appsrocks.com",
   "role":"reader",
   "type":"domain"
}
--END_OF_PART--

Response:

HTTP/1.1 200 OK
Alt-Svc: quic=":443"; p="1"; ma=604800
Server: GSE
Alternate-Protocol: 443:quic,p=1
X-Frame-Options: SAMEORIGIN
Content-Encoding: gzip
X-XSS-Protection: 1; mode=block
Content-Type: multipart/mixed; boundary=batch_6VIxXCQbJoQ_AATxy_GgFUk
Transfer-Encoding: chunked
X-Content-Type-Options: nosniff
Date: Fri, 13 Nov 2015 19:28:59 GMT
Cache-Control: private, max-age=0
Vary: X-Origin
Vary: Origin
Expires: Fri, 13 Nov 2015 19:28:59 GMT

--batch_6VIxXCQbJoQ_AATxy_GgFUk
Content-Type: application/http
Content-ID: response-1


HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Date: Fri, 13 Nov 2015 19:28:59 GMT
Expires: Fri, 13 Nov 2015 19:28:59 GMT
Cache-Control: private, max-age=0
Content-Length: 35


{
 "id": "12218244892818058021i"
}


--batch_6VIxXCQbJoQ_AATxy_GgFUk
Content-Type: application/http
Content-ID: response-2


HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Date: Fri, 13 Nov 2015 19:28:59 GMT
Expires: Fri, 13 Nov 2015 19:28:59 GMT
Cache-Control: private, max-age=0
Content-Length: 35


{
 "id": "04109509152946699072k"
}


--batch_6VIxXCQbJoQ_AATxy_GgFUk--
Aides
2

From my point of view I think Facebook has the best implementation.

A single HTTP request is made with a batch parameter and a token parameter.

In batch, a JSON document is sent which contains a collection of "requests". Each request has a method property (get/post/put/delete/etc.) and a relative_url property (the URI of the endpoint); additionally, the post and put methods allow a "body" property where the fields to be updated are sent.

more info at: Facebook batch API
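As an illustrative sketch of assembling such a payload (the field names batch, method, relative_url and body follow Facebook's documented convention; the helper itself is hypothetical):

```javascript
// Hypothetical helper: builds a Facebook-style batch payload, i.e. one
// outer POST parameter carrying a JSON array of sub-requests.
function buildBatchPayload(requests) {
  return {
    batch: JSON.stringify(requests.map(function (r) {
      var entry = { method: r.method, relative_url: r.relativeUrl };
      if (r.body) entry.body = r.body; // only post/put sub-requests carry a body
      return entry;
    })),
  };
}

var payload = buildBatchPayload([
  { method: 'GET', relativeUrl: 'me/friends?limit=5' },
  { method: 'POST', relativeUrl: 'me/feed', body: 'message=Hello' },
]);
// payload.batch is the JSON string sent as the batch parameter.
```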

1

Great post. I've been searching for a solution for a few days. I came up with a solution of passing a query string with a bunch of IDs separated by commas, like:

DELETE /my/uri/to/delete?id=1,2,3,4,5

...then passing that to a WHERE IN clause in my SQL. It works great, but I wonder what others think of this approach.

Roberto
  • I don't really like it because it kind of introduces a new type: the string that you use as a list in the WHERE IN. I'd rather parse it into a language-specific type instead, and then I can use the same method in the same way in multiple different parts of the system. – softarn Sep 01 '14 at 14:26
  • A reminder to be cautious of SQL injection attacks and always cleanse your data and use bind parameters when taking this approach. – justin.hughey Oct 22 '14 at 20:13
  • Depends on the desired behavior of `DELETE /books/delete?id=1,2,3` when book #3 doesn't exist -- the `WHERE IN` will silently ignore records, whereas I would usually expect `DELETE /books/delete?id=3` to 404 if 3 doesn't exist. – chbrown Nov 16 '14 at 01:10
  • A different problem you may run into using this solution is the limit on characters allowed in a URL string. If someone decides to bulk delete 5,000 records, the browser may reject the URL, or the HTTP server (Apache, for instance) may reject it. The general rule (which hopefully is changing with better servers and software) has been a maximum of 2 KB for the URL, whereas with the body of a POST you can go up to 10 MB. http://stackoverflow.com/questions/2364840/what-is-the-size-limit-of-a-post-request – justin.hughey Aug 28 '15 at 12:40
  • @chbrown typically `DELETE` operations in a REST API are intended to be idempotent, so you should not be returning a 404, but either a 204 or a 200. – David Keaveny Feb 09 '23 at 00:53
1

In an operation like the one in your example, I would be tempted to write a range parser.

It's not a lot of bother to make a parser that can read "messageIds=1-3,7-9,11,12-15". It would certainly increase efficiency for blanket operations covering all messages and is more scalable.
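A minimal sketch of such a parser (my own illustrative code, assuming well-formed ascending ranges):

```javascript
// Hypothetical range parser for strings like "1-3,7-9,11,12-15",
// expanding them into an explicit list of message ids.
function parseRanges(spec) {
  var ids = [];
  spec.split(',').forEach(function (part) {
    var bounds = part.split('-').map(Number);
    if (bounds.length === 1) {
      ids.push(bounds[0]); // single id, e.g. "11"
    } else {
      for (var i = bounds[0]; i <= bounds[1]; i++) {
        ids.push(i); // inclusive range, e.g. "1-3" → 1, 2, 3
      }
    }
  });
  return ids;
}

parseRanges('1-3,7-9,11,12-15');
// → [1, 2, 3, 7, 8, 9, 11, 12, 13, 14, 15]
```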

  • Good observation and a good optimization, but the question was whether this style of request could ever be "compatible" with the REST concept. – Mark Renouf Feb 04 '09 at 15:45
  • Hi, yeah, I understand. The optimisation does make the concept more RESTful, and I didn't want to leave out my advice just because it was wandering a small way from topic. –  Feb 04 '09 at 16:41