8

I've been searching for best practices for preventing the accidental creation of duplicate resources when using POST to create a new resource, for the case where the resource is to be named by the server and hence PUT can't be used. The API I'm building will be used by mobile clients, and the situation I'm concerned about is when the client gets disconnected after submitting the POST request but before getting the response. I found this question, but there was no mention of using a conditional POST, hence my question.

Is doing a conditional POST to the parent resource, analogous to using a conditional PUT to modify a resource, a reasonable solution to this problem? If not, why not?

The client/server interaction would be just like with a conditional PUT:

  1. Client GETs the parent resource, including the ETag reflecting its current state (which would include its subordinate resources),

  2. Client does a conditional POST to the parent resource (includes the parent's ETag value in an If-Match header) to create a new resource,

  3. Client gets disconnected before getting the server response, so doesn't know if it succeeded,

  4. Later, when reconnected, the client resubmits the same conditional POST request,

  5. Either the earlier request didn't reach the server, so the server creates the resource and replies with a 201, or the earlier request did reach the server, so the server replies with a 412 and the duplicate resource isn't created.
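The five steps above can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `ParentCollection`, `conditional_post`, and the `/items/<id>` paths are hypothetical names, and the ETag is assumed to be a hash of the parent's subordinate resources.

```python
import hashlib
import json


class ParentCollection:
    """In-memory stand-in for the parent resource (all names are hypothetical)."""

    def __init__(self):
        self.items = {}    # server-assigned id (str) -> resource body
        self.next_id = 1

    def etag(self):
        # Assumption: the ETag covers the subordinate resources,
        # so creating a child changes the parent's ETag.
        payload = json.dumps(self.items, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def get(self):
        return 200, self.etag(), dict(self.items)

    def conditional_post(self, if_match, body):
        # 412 Precondition Failed if the parent changed since the client's GET.
        if if_match != self.etag():
            return 412, None
        new_id = str(self.next_id)
        self.next_id += 1
        self.items[new_id] = body
        return 201, f"/items/{new_id}"


parent = ParentCollection()
_, etag, _ = parent.get()                                  # step 1
first = parent.conditional_post(etag, {"name": "note"})    # steps 2-3: response lost
retry = parent.conditional_post(etag, {"name": "note"})    # step 4: same request again
```

In this model the retry gets a 412 and only one resource exists. A 412 would also result if anything else had changed the parent in the meantime.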

Chris Toomey
  • Wouldn't that mean the GET/POST pair will fail if another client does a GET/POST pair in between? That might be an issue in high traffic situations. – geon May 02 '13 at 15:49
  • @geon You're right, but the collection resources in question were user-specific, not global, so the potential for such conflicts was small. – Chris Toomey Mar 03 '15 at 07:55

3 Answers

3

Your solution is clever, but less than ideal. Your client may never get his 201 confirmation, and will have to interpret the 412 error as success.

REST aficionados often suggest you create the resource with an empty POST, then, once the client has the id of the newly created resource, he can do an "idempotent" update to fill it. This is nice, but you will likely need to make DB columns nullable that wouldn't otherwise be, and your updates are only idempotent if no one else is trying to update at the same time.
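A minimal sketch of the empty-POST-then-update idea, assuming an in-memory server; `DraftServer`, `post_empty`, and `put` are hypothetical names, and `None` stands in for the nullable columns mentioned above.

```python
class DraftServer:
    """Toy server: POST creates an empty resource, PUT fills it (hypothetical names)."""

    def __init__(self):
        self.resources = {}    # id -> body, or None while still empty

    def post_empty(self):
        # If this response is lost, the client simply asks again;
        # the cost is an orphaned empty resource, not a duplicate of real data.
        new_id = str(len(self.resources) + 1)
        self.resources[new_id] = None
        return 201, new_id

    def put(self, resource_id, body):
        # Full replacement: repeating the same PUT leaves the same state.
        self.resources[resource_id] = body
        return 200


server = DraftServer()
_, rid = server.post_empty()
server.put(rid, {"name": "note"})    # response lost
server.put(rid, {"name": "note"})    # safe to repeat
```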

According to ME, HTTP is flaky. Requests time out, browser windows get closed, connections get reset, trains go into tunnels with mobile users aboard. There's a simple, robust pattern for dealing with this. Unsafe actions should always be uniquely identified, and servers should store, and be able to repeat if necessary, the response to any unsafe request. This is not HTTP caching, where a request may be served from cache but the cache may be flushed for whatever reason. This is a guarantee by the server application that if an "action" request is seen a second time, the stored response will be repeated without anything else happening. If the action identity is to be generated by the server, then a request-response should be dedicated just to sending the id. If you implement this for one unsafe request, you might as well do it for all of them, and in so doing you will escape numerous thorny problems: successive update requests wiping out other users' changes or hitting incompatible states ("order already submitted"), and successive delete requests generating 404 errors.
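As a rough sketch of that pattern, assuming an in-memory store: `ActionStore`, `new_action_id`, and `submit` are hypothetical names, and a real server would need to persist the action's effect and its stored response together (for example in one database transaction).

```python
import uuid


class ActionStore:
    """Toy server that remembers the response to each identified action."""

    def __init__(self):
        self.responses = {}    # action id -> stored response
        self.orders = []       # the resources that actions create

    def new_action_id(self):
        # The dedicated request-response that only hands out an id.
        return str(uuid.uuid4())

    def submit(self, action_id, order):
        # Seen before: repeat the stored response, do nothing else.
        if action_id in self.responses:
            return self.responses[action_id]
        self.orders.append(order)
        response = (201, f"/orders/{len(self.orders)}")
        self.responses[action_id] = response
        return response


store = ActionStore()
action_id = store.new_action_id()
first = store.submit(action_id, {"item": "book"})     # response lost in transit
second = store.submit(action_id, {"item": "book"})    # retry with the same action id
```

Here the retry receives the same 201 response as the original request, so the client does not have to interpret an error as success.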

I have a little google doc exploring the pattern more fully if you're interested.

bbsimonbb
  • Interesting suggestion, thanks, but it'd be awfully heavyweight for servers to implement since they'd have to maintain a history of all the actions performed and the responses they generated. Also, you're advocating different semantics for PUTs to action URIs (replace the targeted resource IFF it hasn't been replaced before via this action URI) than for "normal" PUTs (always replace the target resource), which is a pretty dramatic departure from the standard. As you say it does have some benefits, but I'd still favor the other simpler solutions that have been mentioned. – Chris Toomey Feb 17 '16 at 06:30
  • "Awfully" is a bit strong! Responses will be tiny, a fraction of a kilobyte. If you had huge volumes, you could use an ACID key value store (couchDB?) just for storing responses. The payments web-service where I first used this pattern has been ticking away happily for 15 years atop a SQL Server DB. It's so simple to develop, to integrate with and to support that I find myself agog at the other answers to this problem. You can't not have noticed: Among all the RESTful discussion of how you *should* deal with this, no one talks about their experience, their problems, their volumes. – bbsimonbb Feb 17 '16 at 09:05
  • Maybe, but storing the history of all modifications to resources could require significantly more storage than the resources themselves. And conceptually, storing the modification history is a significant change. For preventing accidental duplicate resource creation, PUT with a client-generated URI is simplest and lightest, and then as you mention, the empty POST followed by PUT would require only additionally storing the created ids until they're populated. – Chris Toomey Feb 17 '16 at 20:10
  • And the other scenario you address in your doc, accidental modifications to existing resources, is already solved via standard ETag and [If-Match](http://tools.ietf.org/html/rfc2616#section-14.24) headers. So in my opinion these are lighter weight solutions (less/no additional storage required) and also have the benefit of adhering to the standard semantics of the HTTP verbs. – Chris Toomey Feb 17 '16 at 20:11
  • How does my proposal depart from standard semantics of HTTP verbs? Are we clear that we're just storing the confirmation response, not the modification? – bbsimonbb Feb 17 '16 at 23:08
  • Your doc suggested PUTs to action URIs act conditionally based on whether previously applied or not so as to not accidentally modify subsequent changes. Also, you're not replacing the referenced URI (the action), but instead the resource the action refers to. – Chris Toomey Feb 18 '16 at 02:45
  • Since changing the semantics of HTTP verbs is a hanging offence, I argue in my defence that I'm doing no such thing. My reliable actions *are* resources: persistent, created and fetched following REST principles. But their behaviour: not modifiable after creation, and their resolution: modifying once only their parent resource, belong to the semantics of my application, and should be seen as independent of the network protocol. Of course it would be nice if all web services worked the same, and here I am proposing a disruptive change. In a rational world, one would weigh up costs and benefits. – bbsimonbb Feb 18 '16 at 08:43
0

I think this scheme would work. If you want to ensure POST does not result in duplicates, you need the client to send something unique in the POST. The server can then verify uniqueness.

You might as well have the client generate a GUID for each request, rather than obtaining this from the server via a GET.

Your steps then become:

  1. Client generates a GUID
  2. Client does a POST to the resource, which includes the GUID
  3. Client gets disconnected and doesn't know if it succeeded
  4. Client connects again and does another POST with the same GUID
  5. Server checks the GUID, and either creates the resource (if it never received the first POST) or indicates that this was a duplicate
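The GUID steps above might look like this from the client's side. It is a sketch under assumptions: `Server`, `FlakyTransport`, and `create_with_retry` are hypothetical names, the transport is rigged to lose exactly the first response, and answering a duplicate with 409 plus the existing location is one possible choice rather than something the steps prescribe.

```python
import uuid


class Server:
    def __init__(self):
        self.by_guid = {}    # client GUID -> location of the created resource

    def post(self, guid, body):
        if guid in self.by_guid:
            # Assumption: signal the duplicate with 409 and the existing location.
            return 409, self.by_guid[guid]
        location = f"/items/{len(self.by_guid) + 1}"
        self.by_guid[guid] = location
        return 201, location


class FlakyTransport:
    """Delivers every request to the server but drops the first response."""

    def __init__(self, server):
        self.server = server
        self.dropped = False

    def post(self, guid, body):
        result = self.server.post(guid, body)
        if not self.dropped:
            self.dropped = True
            raise ConnectionError("response lost")
        return result


def create_with_retry(transport, body, attempts=3):
    guid = str(uuid.uuid4())    # generated once, reused on every retry
    for _ in range(attempts):
        try:
            return transport.post(guid, body)
        except ConnectionError:
            continue
    raise ConnectionError("gave up after retries")


server = Server()
status, location = create_with_retry(FlakyTransport(server), {"name": "note"})
```

The important detail is that the GUID is generated before the first attempt and never regenerated, so the server can recognise the retry.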

It might be more RESTful to use PUT, and have the client decide the resource name. If the server did not like the chosen name, it could indicate that it had created the resource but that its canonical location was somewhere of the server's choosing.

WW.
  • The downside w/ adding a GUID field is that you end up having to keep around an extra field just for the purposes of duplicate detection which is undesirable. In the end we did indeed go with client-generated GUIDs so we could use PUT. – Chris Toomey Mar 03 '15 at 08:00
  • Did you mean client generated URI? – WW. Mar 03 '15 at 08:01
0

Why not simply do duplicate detection on the server based on the actual resource, using whatever internal mechanism the server chooses to use?

It's just safer that way.

Then you return the URL to the appropriate resource (whether it was freshly created or not).

If the parent's ETag is based on the state of sub resources, then it's not a reliable mechanism to check for "duplicate resources". All you know is that the parent has "changed", somehow, since last time. How do you even know it's because your old POST was processed after the disconnect? Anything could have changed that ETag.

This is basically an optimistic locking scenario being played out, and it comes down to another question. If the resource is already created, what then? Is that an error? Or a feature? Do you care? Is it bad to send a creation request that's silently ignored by the server when the resource already exists?

And if it already exists, but is "different" enough (i.e. say the name matches but the address is different), is that a duplicate? Is that an update? Is that an error for trying to change an existing resource?

Another solution is to make two trips: one to stage the request, another to commit it. You can query the status of the request when you come back if it's interrupted. If the commit didn't go through, you can commit it again. If it did, you're happy and can move on.
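A minimal sketch of the stage/commit idea, assuming an in-memory server; `StagingServer`, `stage`, `status`, and `commit` are hypothetical names, and the "staged"/"committed" states are an assumed representation.

```python
class StagingServer:
    """Toy server with a two-trip create: stage first, then commit."""

    def __init__(self):
        self.requests = {}     # request id -> {"body": ..., "status": ...}
        self.resources = []    # committed resources

    def stage(self, body):
        request_id = str(len(self.requests) + 1)
        self.requests[request_id] = {"body": body, "status": "staged"}
        return request_id

    def status(self, request_id):
        return self.requests[request_id]["status"]

    def commit(self, request_id):
        entry = self.requests[request_id]
        # Committing twice is harmless: the resource is created only once.
        if entry["status"] == "staged":
            self.resources.append(entry["body"])
            entry["status"] = "committed"
        return entry["status"]


server = StagingServer()
request_id = server.stage({"name": "note"})
server.commit(request_id)                    # response lost
after_reconnect = server.status(request_id)  # client checks what happened
server.commit(request_id)                    # or just commits again
```

If the response to the stage call itself is lost, the client can stage again; the leftover staged request never becomes a resource unless it is committed.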

Just depends on how unstable your comms are and how important this particular operation is whether you want to jump through the hoops to do it safely.

Will Hartung
  • In our case the resource collection was a list not a set, hence there could legitimately be multiple copies of the same resource (though with distinct ids). The problem we wanted to prevent was the client causing multiple copies to be created accidentally on behalf of the user requesting a single copy. You're right about the parent's ETag being unreliable in general, but in the case of user-specific parent resource it was more viable. In the end we went with the simple and robust solution of client-generated GUIDs and PUT. – Chris Toomey Mar 03 '15 at 08:08
  • A lovely example of the standard confusion people get into trying to maintain REST purity. – bbsimonbb Feb 16 '16 at 10:32