16

I have a form with a textarea that can contain large amounts of content (say, articles for a blog) edited using one of a number of third party rich text editors. I'm trying to implement something like an autosave feature, which should submit the content through ajax if it's changed. However, I have to work around the fact that some of the editors I have as options don't support an "isdirty" flag, or an "onchange" event which I can use to see if the content has changed since the last save.

So, as a workaround, what I'd like to do is keep a copy of the content in a variable (let's call it lastSaveContent), as of the last save, and compare it with the current text when the "autosave" function fires (on a timer) to see if it's different. However, I'm worried about how much memory that could take up with very large documents.

Would it be more efficient to store some sort of hash in the lastSaveContent variable, instead of the entire string, and then compare the hash values? If so, can you recommend a good javascript library/jquery plugin that implements an appropriate hash for this requirement?

user4815162342
  • 2,058
  • 3
  • 18
  • 23
  • It will probably almost never happen in your use case, but for the casual reader that happens to land here in search for javascript and hashing (like me), it could be worth noting that comparing two hash values is *not* the same as comparing two strings, hash may (and will) collide, that is .. the same hash for two different strings. So in many use cases you should perform a full comparison if you get the same hash value anyway. – Simone Gianni Apr 26 '12 at 18:23
  • Good point. In addition, in case those readers are wondering, the reason that Hashtable objects, such as those found in many collections APIs, still work despite this is because they contain functionality to handle these collisions when two keys produce the same hash. – user4815162342 May 18 '12 at 02:17

3 Answers3

29

In short, you're better off just storing and comparing the two strings.


Computing a proper hash is not cheap. For example, check out the pseudo code or an actual JavaScript implementation for computing the MD5 hash of a string. Furthermore, all proper hash implementations will require enumerating the characters of the string anyway.

Furthermore, in the context of modern computing, a string has to be really, really long before comparing it against another string is slow. What you're doing here is effectively a micro-optimization. Memory won't be an issue, nor will the CPU cycles to compare the two strings.

As with all cases of optimizing: check that this is actually a problem before you solve it. In a quick test I did, computing and comparing 2 MD5 sums took 382ms. Comparing the two strings directly took 0ms. This was using a string that was 10000 words long. See http://jsfiddle.net/DjM8S.

If you really see this as an issue, I would also strongly consider using a poor-mans comparison; and just comparing the length of the 2 strings, to see if they have changed or not, rather than actual string comparisons.

..

Matt
  • 74,352
  • 26
  • 153
  • 180
  • Okay, let's say the user wanted to post a chapter of a novel instead. About how long would the articles have to be to considered the length of a "huge extract of the Bible"? – user4815162342 Mar 23 '10 at 22:26
  • 3
    MD5'ing the string, then comparing that to the "previous" md5 sum, takes 382ms. Basic string comparison takes 0ms; this is using a string that is ~10000 words long. (http://www.jsfiddle.net/DjM8S/) – Matt Mar 24 '10 at 01:33
  • Thank you. This is the better answer of the two, and the sort of answer I was looking for (although the other answer is informative as well). I'd vote for it, but apparently as a new user I don't have enough reputation. – user4815162342 Mar 24 '10 at 22:45
4

An MD5 hash is often used to verify the integrity of a file or document; it should work for your purposes. Here's a good article on generating an MD5 hash in Javascript.

Jacob Mattison
  • 50,258
  • 9
  • 107
  • 126
  • Useful information, but if I don't need to bother with this, as the other answer suggested, then it's a little less code that I have to maintain. – user4815162342 Mar 25 '10 at 04:49
1

I made a JSperf rev that might be useful here for performance measuring. Please add different revisions and different types of checks to the ones I made!

http://jsperf.com/long-string-comparison/2

I found two major results

  • When strings are identical performance is murdered; from ~9000000 ops/s to ~250 ops/sec (chrome)
  • The 64bit version of IE9 is much slower on my PC, results from the same tests:

    +------------+------------+
    | IE9 64bit  |  IE9 32bit |
    +------------+------------+
    | 4,270,414  | 8,667,472  |
    | 2,270,234  | 8,682,461  |
    +------------+------------+
    

Sadly, jsperf logged both results as simply "IE 9".

Even a precursory look at JS MD5 performance tells me that it is very, very slow (at least for large strings, see http://jsperf.com/md5-shootout/18 - peaks at 70 ops/sec). I would want to go as far as to try AJAXing the hash calculation or the comparison to the backend but I don't have time to test, sorry!

Joel Peltonen
  • 13,025
  • 6
  • 64
  • 100