0

For a string and an array, I have an "old" state, and a "new" state (after some modifications).

I need to often send to the server (with AJAX/XMLHttpRequest) the changes, if possible in an efficient way (don't resend 200 KB of data if only one element in the array has changed/has been deleted/has been moved). Example:

var oldstate1 = 'hello how are you? very good and you? thanks for asking! this text will be removed.';
var newstate1 = 'hello how are you? very good and you? new text here. thanks for asking!';

var oldstate2 = [[1732, "item1"], [1732, "will be deleted"], [23, "will be moved"], [23, "hello"]];
var newstate2 = [[23, "will be moved"], [1732, "item1"], [23, "hello"], [126, "new item"]];

Of course, I could manually code a protocol between client and server with events such as delete, insert, move, update, etc. and send these events with AJAX, and the server would update its database accordingly. But this is quite tedious to do correctly.

Question: is there a more clever way to encode only changes between oldstate and newstate of a big string or array with Javascript? in a way that can easily be decoded on backend, running Python.

Something similar to a diff/patch algorithm for strings or arrays, understood between JS (client-side) and Python (back-end).

Note:

Basj
  • 41,386
  • 99
  • 383
  • 673
  • A simple but inelegant method would be to send Javascript text of the operations you perform on the array, then run the JS in a new script tag – CertainPerformance Feb 18 '20 at 08:17
  • @CertainPerformance Interesting indeed! Unfortunately the backend doesn't run Node.js, so I couldn't run the same JS operations there. – Basj Feb 18 '20 at 08:18
  • You wouldn't have to *run* them, but you would have to be able to dynamically *create* the JS strings from the operations you perform in Python. Depends on the sort of manipulations you need to do, might be easy, might be near impossible – CertainPerformance Feb 18 '20 at 08:22
  • https://stackoverflow.com/questions/38865869/how-to-find-difference-between-two-array-using-lodash-underscore-in-nodejs – Zydnar Feb 18 '20 at 08:29
  • @Zydnar Yes this is similar to [How to get the difference between two arrays in JavaScript?](https://stackoverflow.com/questions/1187518/how-to-get-the-difference-between-two-arrays-in-javascript), but at the end you have to manually recode a protocol to make use of these array differences of `arr1` and `arr2`, and differences of `arr2` and `arr1`, so that the server knows where to insert these differences / what to delete. The question here was: is there an ready-to-use tool to do that? – Basj Feb 18 '20 at 08:32
  • So what you wan't is merge two arrays, rewrite (update) already existing and delete what have been deleted? If so maybe try https://lodash.com/docs/#update – Zydnar Feb 18 '20 at 08:39
  • Maybe @Zydnar; could you post an example with `oldstate2` and `newstate2` that I included in the question? It would be interesting to see how it works with lodash. – Basj Feb 18 '20 at 08:42

1 Answers1

0

It is probably possible with no third-party tool (if so, feel free to post another answer).

After further research, I found this project: diff-match-patch and particularly this example.

Originally built in 2006 to power Google Docs, this library is now available in C++, C#, Dart, Java, JavaScript, Lua, Objective C, and Python.

var oldstate1 = "hello how are you? very good and you? blablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablabla. thanks for asking! this text will be removed.";
var newstate1 = "hello how are you? very good and you? blablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablabla.new text here. thanks for asking!";
var dmp = new diff_match_patch();
var diff = dmp.diff_main(oldstate1, newstate1);
dmp.diff_cleanupSemantic(diff);
console.log(dmp.diff_toDelta(diff))    
// =138 +new text here. =19 -27
<script src="https://cdnjs.cloudflare.com/ajax/libs/diff_match_patch/20121119/diff_match_patch.js"></script>

This delta + oldstate1 can then be used server-side, with Python, to reconstruct the newstate1:

import diff_match_patch as dmp_module
dmp = dmp_module.diff_match_patch()

oldstate1 = "hello how are you? very good and you? blablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablabla. thanks for asking! this text will be removed."
delta = "=138\t+new text here.\t=19\t-27"
patch = dmp.patch_make(oldstate1, dmp.diff_fromDelta(oldstate1, delta))
newstate1 = dmp.patch_apply(patch, oldstate1)[0]
print(newstate1)

See also this issue.

Basj
  • 41,386
  • 99
  • 383
  • 673
  • I was going to suggest looking for something like this when I saw the question. I was sure it *had* to exist. But note that you probably need something akin to commit/version numbers to apply this properly. If multiple clients edit the same data, or if one does multiple times, there is a real possibility of getting out of sync. – Scott Sauyet Feb 18 '20 at 14:57
  • 1
    That's right @ScottSauyet, I just started to implement it using a commit number. If one user pushes with a too old commit number, I display a message "You are using an old version of this document, please refresh the page". It's works for now. PS: it's interesting to see that Google had to develop such a library for Google Docs needs back in 2006! – Basj Feb 18 '20 at 15:04