
Is there a standard library or tool out there for computing and applying differences to JSON documents? Basically I have a bunch of largish documents that I want to keep synchronized across a network, and I would prefer to avoid having to resend their entire state each time that I want to synchronize them (since many of these variables aren't going to change). In other words, I only want to transmit the fields which changed, not retransmit the entire object. I would think that it would be convenient to have something like the following set of methods:

// Start with two distinct objects on the server:
//   prev is a copy of the state of the object on the client
//   next is a copy of the state of the object on the server

// 1. Compute a patch
const patch = computePatch(prev, next);

// 2. Send the patch over the network

// 3. Apply the patch on the client
applyPatch(prev, patch);

// Final invariant:
//   prev is equivalent to JSON.parse(JSON.stringify(next))
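
The final invariant could be checked in a test with something like the sketch below. `canonicalize` and `satisfiesInvariant` are hypothetical helper names, not part of any library; the sketch assumes both values are JSON-compatible and sorts object keys so that property order does not affect the comparison.

```javascript
// Hypothetical helper: rebuild a JSON-compatible value with object keys
// sorted, so that two equivalent documents serialize to the same string.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return value.map(canonicalize);
  }
  if (value !== null && typeof value === 'object') {
    // A prototype-free object avoids surprises with keys like "__proto__".
    const sorted = Object.create(null);
    for (const key of Object.keys(value).sort()) {
      sorted[key] = canonicalize(value[key]);
    }
    return sorted;
  }
  return value;
}

// Hypothetical helper: true when prev is equivalent to
// JSON.parse(JSON.stringify(next)), ignoring key order.
function satisfiesInvariant(prev, next) {
  const roundTripped = JSON.parse(JSON.stringify(next));
  return JSON.stringify(canonicalize(prev)) ===
         JSON.stringify(canonicalize(roundTripped));
}
```

Comparing canonical strings is slow for large documents, so this is probably only suitable for tests, not for the synchronization path itself.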

I could certainly implement one myself, but there are quite a few edge cases that need to be considered. Here are some straightforward (though somewhat unsatisfactory) approaches that I can think of:

  1. Roll my own JSON patcher. Asymptotically, this is probably the best way to go, since it could support all the relevant features of JSON documents, along with specialized methods for diffing ints, doubles and strings (using relative encoding/edit distance). However, JSON has a lot of special cases, and I am a bit leery of attempting this without a lot of testing. I would much prefer to find something that already solves this problem, so that I can trust it and not have to worry about network Heisenbugs caused by mistakes in my JSON patching.

  2. Just compute the edit distance directly between the JSON strings using dynamic programming. Unfortunately, this doesn't work if the client and server have different JSON implementations (i.e. their fields could be serialized in a different order), and it is also pretty expensive, being a quadratic-time operation.

  3. Use protocol buffers. Protocol buffers have a built-in diff method which does exactly what I want, and they are a nice binary-serializable, network-friendly format. Unfortunately, because they are also strictly typed, they lack many of the advantages of JSON, such as the ability to dynamically add and remove fields. This is the approach I am currently leaning towards, but it could make future maintenance really horrible, as I would need to continually update each of my objects.

  4. Do something really nasty, like make a custom protocol for each type of object, and hope that I get it right in both places (yeah right!).
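
As a rough illustration of option 1, here is a minimal sketch of what `computePatch` and `applyPatch` might look like. The patch format (`set` / `remove` / `patch` entries) is invented for this example, and the sketch assumes both documents are plain JSON objects; arrays and primitives are replaced wholesale rather than diffed.

```javascript
function isPlainObject(v) {
  return v !== null && typeof v === 'object' && !Array.isArray(v);
}

function hasOwn(obj, key) {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Structural equality for JSON-compatible values (key order is ignored).
function jsonEqual(a, b) {
  if (a === b) return true;
  if (Array.isArray(a) && Array.isArray(b)) {
    return a.length === b.length && a.every((item, i) => jsonEqual(item, b[i]));
  }
  if (isPlainObject(a) && isPlainObject(b)) {
    const keysA = Object.keys(a);
    return keysA.length === Object.keys(b).length &&
      keysA.every((k) => hasOwn(b, k) && jsonEqual(a[k], b[k]));
  }
  return false;
}

// Returns a patch that turns prev into next. Invented format:
//   { key: { op: 'set', value } | { op: 'remove' } | { op: 'patch', patch } }
function computePatch(prev, next) {
  const patch = {};
  for (const key of Object.keys(prev)) {
    if (!hasOwn(next, key)) patch[key] = { op: 'remove' };
  }
  for (const key of Object.keys(next)) {
    const hasPrev = hasOwn(prev, key);
    if (hasPrev && isPlainObject(prev[key]) && isPlainObject(next[key])) {
      // Recurse so that only the changed nested fields are transmitted.
      const sub = computePatch(prev[key], next[key]);
      if (Object.keys(sub).length > 0) patch[key] = { op: 'patch', patch: sub };
    } else if (!hasPrev || !jsonEqual(prev[key], next[key])) {
      // Arrays and primitives are replaced wholesale in this sketch.
      patch[key] = { op: 'set', value: next[key] };
    }
  }
  return patch;
}

// Applies a patch produced by computePatch to prev, in place.
function applyPatch(prev, patch) {
  for (const key of Object.keys(patch)) {
    const entry = patch[key];
    if (entry.op === 'remove') delete prev[key];
    else if (entry.op === 'set') prev[key] = entry.value;
    else applyPatch(prev[key], entry.patch);
  }
  return prev;
}
```

Even this small version leaves open several of the edge cases mentioned above, such as array element diffs, long strings, and a key changing type between object and array.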

Of course, what I am really hoping for is that someone here on Stack Overflow will come through and save the day with a reference to a space-efficient JavaScript object differ/patcher that has been well tested in production environments and across multiple browsers.

*Update*

I started writing my own patcher; an early version of it is available on GitHub here:

https://github.com/mikolalysenko/patcher.js

Since there doesn't seem to be much out there, I will also accept, as an alternative answer, a list of interesting test cases for a JSON patcher.

Mikola
  • Since you mentioned the "quite a few edge cases that need to be considered", it might be helpful (to your answer and for posterity) if you enumerated what edge cases need to be handled, and how they should be resolved. – Phrogz Sep 06 '11 at 21:48
  • Aside: the second half of your question is nice to show that you've thought about the problem, and might be nice as part of 'workarounds' answer, but is rather irrelevant to the question, right? – Phrogz Sep 06 '11 at 21:49
  • there's always adding a `dirty` flag. – David Wick Sep 06 '11 at 21:50
  • There's a `dirty` flag? I don't see it. – Peter Olson Sep 06 '11 at 21:53
  • The definition of "largish" matters, and how they change matters. Unless you have huge docs that have very large numbers of tiny changes all the time, this might be premature optimization. – ccleve Sep 06 '11 at 21:54
  • @user237815: Resending the documents over the network for each update is out of the question at this time. The cost in additional bandwidth and latency would be far too great. – Mikola Sep 06 '11 at 21:58
  • @David Wick: I thought about that too, and I guess I should add that I did try that at first but it got to be quite annoying to keep track of. I came up with this idea as a work around, but maybe there are better solutions. – Mikola Sep 06 '11 at 21:59
  • This SO question has a similar discussion http://stackoverflow.com/questions/584338/how-to-push-diffs-of-data-possibly-json-to-a-server – Narendra Yadala Sep 08 '11 at 06:05
  • The Google Drive API does exactly that in JS, but it's probably not what you're looking for – PauAI Dec 27 '15 at 16:54

4 Answers


I've been maintaining a JSON diff & patch library on GitHub (yes, shameless plug):

https://github.com/benjamine/JsonDiffPatch

It handles long strings automatically using Neil Fraser's diff_match_patch library. It works both in browsers and on the server (unit tests run in both environments). The full feature list is on the project page.

The only thing you would probably need that's not implemented is an option to inject custom diff/patch functions for specific objects, but that doesn't sound hard to add. You're welcome to fork it and, even better, send a pull request.

Regards,

Benja

The JSON-patch standard has been updated.

https://datatracker.ietf.org/doc/html/draft-ietf-appsawg-json-patch-10

You can find an implementation for applying patches and generating patches at https://github.com/Starcounter-Jack/Fast-JSON-Patch

Jack Wester

I came across this question while searching for implementations of json-patch. If you are rolling your own, you might want to base it on this draft.

https://datatracker.ietf.org/doc/html/draft-pbryan-json-patch-00

iain
  • There are already some libraries for JSON Patch, like https://github.com/dharmafly/jsonpatch.js – warpech Sep 05 '12 at 09:33

Use JSON Patch, which is the standard way to do this.

JSON Patch is a format for describing changes to a JSON document. It can be used to avoid sending a whole document when only a part has changed. When used in combination with the HTTP PATCH method it allows partial updates for HTTP APIs in a standards compliant way.
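
As a sketch of the HTTP side, a partial update could be sent roughly like this. `buildPatchRequest` and the URL in the comment are hypothetical; the media type `application/json-patch+json` is the one registered for JSON Patch documents.

```javascript
// Hypothetical helper: builds the options for an HTTP PATCH request whose
// body is a JSON Patch document. The result can be passed to fetch().
function buildPatchRequest(operations) {
  return {
    method: 'PATCH',
    headers: { 'Content-Type': 'application/json-patch+json' },
    body: JSON.stringify(operations),
  };
}

// Only the changed field is sent, not the whole document.
const options = buildPatchRequest([
  { op: 'replace', path: '/title', value: 'New title' },
]);
// e.g. fetch('https://example.com/documents/42', options);  // hypothetical URL
```

Whether a given server accepts this depends on the API; the server has to support PATCH with this media type.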

The patch documents are themselves JSON documents.

JSON Patch is specified in RFC 6902 from the IETF.
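
To make the format concrete, here is a sample patch document together with a deliberately minimal applier. `parsePointer` and `applyJsonPatch` are names made up for this sketch; it handles only the `add`, `remove` and `replace` operations and skips most of the validation that RFC 6902 requires, so it is an illustration, not a substitute for one of the libraries.

```javascript
// Decode a JSON Pointer (RFC 6901) into its reference tokens.
function parsePointer(pointer) {
  if (pointer === '') return [];
  if (pointer[0] !== '/') throw new Error('Invalid JSON Pointer: ' + pointer);
  return pointer
    .slice(1)
    .split('/')
    .map((token) => token.replace(/~1/g, '/').replace(/~0/g, '~'));
}

// Applies 'add', 'remove' and 'replace' operations to a deep copy of doc.
// Not handled in this sketch: 'move', 'copy', 'test', and error checks
// such as verifying that a 'replace' target actually exists.
function applyJsonPatch(doc, operations) {
  let result = JSON.parse(JSON.stringify(doc));
  for (const { op, path, value } of operations) {
    if (!['add', 'remove', 'replace'].includes(op)) {
      throw new Error('Unsupported op: ' + op);
    }
    const tokens = parsePointer(path);
    if (tokens.length === 0) {
      // An empty path targets the whole document.
      if (op === 'remove') throw new Error('Cannot remove the root');
      result = value;
      continue;
    }
    // Walk to the parent of the target location.
    let parent = result;
    for (const token of tokens.slice(0, -1)) {
      parent = parent[token];
    }
    const last = tokens[tokens.length - 1];
    if (Array.isArray(parent)) {
      // '-' refers to the position after the last array element.
      const index = last === '-' ? parent.length : Number(last);
      if (op === 'add') parent.splice(index, 0, value);
      else if (op === 'remove') parent.splice(index, 1);
      else parent[index] = value;
    } else if (op === 'remove') {
      delete parent[last];
    } else {
      parent[last] = value;
    }
  }
  return result;
}

const doc = { title: 'Draft', tags: ['a', 'b'], meta: { rev: 1 } };
const patch = [
  { op: 'replace', path: '/title', value: 'Final' },
  { op: 'add', path: '/tags/-', value: 'c' },
  { op: 'remove', path: '/meta/rev' },
];
const updated = applyJsonPatch(doc, patch);
// updated: { title: 'Final', tags: ['a', 'b', 'c'], meta: {} }
```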

Libraries exist for most platforms and programming languages.

At the time of writing, JavaScript, Python, PHP, Ruby, Perl, C, Java, C#, Go, Haskell and Erlang are supported (full list and libraries here).

Here is a list for JavaScript:

  • Fast-JSON-Patch: both diffs and patches; 509,361 weekly downloads on npm
  • jiff: both diffs and patches; 5,075 weekly downloads on npm
  • jsonpatch-js: only applies patches; 2,014 weekly downloads on npm
  • jsonpatch.js: only applies patches; 1,470 weekly downloads on npm
  • JSON8 Patch: both diffs and patches; 400 weekly downloads on npm

Fast-JSON-Patch is by far the most widely used of these (I use it myself). It works in Node.js and in the browser.
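
For a sense of what the "diff" half involves, here is a naive sketch that emits RFC 6902 operations. It is not how Fast-JSON-Patch or any of the listed libraries is implemented; `generatePatch`, `escapeToken` and `isObject` are hypothetical names, and arrays or primitives that differ are simply replaced as a whole.

```javascript
// Escape a property name for use as a JSON Pointer token (RFC 6901).
function escapeToken(key) {
  return String(key).replace(/~/g, '~0').replace(/\//g, '~1');
}

function isObject(v) {
  return v !== null && typeof v === 'object' && !Array.isArray(v);
}

// Hypothetical helper: emits RFC 6902 operations that turn prev into next.
// Objects are compared key by key; anything else that differs is replaced
// as a whole, which is simple but does not give a minimal patch.
function generatePatch(prev, next, basePath = '') {
  if (isObject(prev) && isObject(next)) {
    const ops = [];
    for (const key of Object.keys(prev)) {
      const path = basePath + '/' + escapeToken(key);
      if (!Object.prototype.hasOwnProperty.call(next, key)) {
        ops.push({ op: 'remove', path });
      } else {
        ops.push(...generatePatch(prev[key], next[key], path));
      }
    }
    for (const key of Object.keys(next)) {
      if (!Object.prototype.hasOwnProperty.call(prev, key)) {
        const path = basePath + '/' + escapeToken(key);
        ops.push({ op: 'add', path, value: next[key] });
      }
    }
    return ops;
  }
  // Compare serialized forms. Objects nested inside arrays with a different
  // key order may look unequal here; the cost is a redundant 'replace'.
  if (JSON.stringify(prev) === JSON.stringify(next)) return [];
  return [{ op: 'replace', path: basePath, value: next }];
}
```

The production libraries do considerably more than this, which is a good reason to use one of them instead of a hand-rolled differ.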

Eloims
  • [JSON-Patch](http://jsonpatch.com/) is the right format for step 1 (Compute a patch). If you need to minimize JSON-Patch size on step 2 (Send patch over the network) use [patchpack](https://github.com/udamir/patchpack) library. – Damir Nov 30 '20 at 21:13
  • @Damir As there are many other compact and widely used formats (FlatBuffers, protobuf, avro, bson, msgpack...), when promoting your own library in StackOverflow, please add an "I wrote this" disclaimer. – Eloims Dec 01 '20 at 14:19