15

Summary: Is there a faster way to hash objects than JSON.stringify?

Details: I have a Ruby and JavaScript library (NeatJSON) that provides pretty-printing of JavaScript values. I recently fixed a problem where deeply-nested objects caused O(n!) performance (n being the nesting level) using memoization based on the object being serialized and the indentation amount.

In Ruby, the fix was really easy, because you can index hashes by arrays of unique sets of objects:

build = ->(object,indent) do
  memoizer[[object,indent]] ||= <all the rest of the code>
end

In JavaScript, however, I can't index an object by another object (in a unique way). Following the lead of several articles I found online, I decide to fix the problem generically, using JSON.stringify on the full set of arguments to the function to create a unique key for memoization:

function memoize(f){
  var memo  = {};
  var slice = Array.prototype.slice;
  return function(){
    var args = slice.call(arguments);
    var mkey = JSON.stringify(args);
    if (!(mkey in memo)) memo[mkey] = f.apply(this,args);
    return memo[mkey];
  }
}

function rawBuild(o,indent){ .. }
var build = memoize(rawBuild);

This works, but (a) it's a little slower than I'd like, and (b) it seems wildly inefficient (and inelegant) to perform (naive) serialization of every object and value that I'm about to serialize smartly. The act of serializing a large object with many values is going to store a string and formatting result for EVERY unique value (not just leaf values) in the entire object.

Is there a modern JavaScript trick that would let me uniquely identify a value? For example, some way of accessing an internal ID, or otherwise associating complex objects with unique integers that takes O(1) time to find the identifier for a value?

Phrogz
  • 296,393
  • 112
  • 651
  • 745
  • Very similar to (not quite a dup) of [JavaScript Object Id](http://stackoverflow.com/q/2020670/405017). Not quite a dup because I need to find a unique representation for most kind of value (string, boolean, number, array, object), not just objects. – Phrogz Feb 09 '16 at 16:32
  • Do you wish to memoize by object value or reference? In other words: `var a = {b:1}; var c = memoizedFn(a); a.b = 2; var d = memoizedFn(a);` should the second call use the memoized value? – Tamas Hegedus Feb 09 '16 at 16:45
  • @TamasHegedus By reference is completely sufficient. – Phrogz Feb 09 '16 at 16:48

3 Answers3

6

If you are looking to memoise your objects by identity (not by content), then you'll want to use a WeakMap which is designed for exactly this purpose. They don't work for primitive values though, so you'll need a different solution for such arguments.

Bergi
  • 630,263
  • 148
  • 957
  • 1,375
  • That looks very helpful. And yes, in this case memoization by identity is sufficient, since I'm just crawling up and down values in the same tree. – Phrogz Feb 09 '16 at 16:47
  • Since I need to support primitive values, also, it looks like a [`Map`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map) is more appropriate than a `WeakMap`. – Phrogz Feb 09 '16 at 20:39
  • @Phrogz: Yeah, depends on the use case. A `WeakMap` could have better memory efficiency if some of the object can already be garbage-collected while you still hold onto (or run) the memoised function. – Bergi Feb 09 '16 at 20:46
  • 1
    I'm going to give you the accept, because without your WeakMap link I'd not have found Map. Also, WeakMap matches the title of the question (objects) if not the actual need I was attempting to describe of serializing any value. I encourage others to look at my answer to this question for the actual solution I ended up with. – Phrogz Feb 10 '16 at 21:09
  • This library seems to use the approach described (memoizing using Maps/WeakMaps -- depending on what type of argument is passed): https://github.com/StarryInternet/map-memo – Venryx Apr 24 '20 at 01:16
  • I ended up finding several other libraries that make use of WeakMap actually: https://stackoverflow.com/a/61402805/2441655 (see entries with "o-hash" checked) – Venryx Apr 24 '20 at 07:42
4

Using @Bergi's suggestion of a WeakMap I found out about Map, which allows using any value type as the key (not just objects). Because I needed a compound key—uniquely memoizing the combination of the value passed in and the indentation string—I created a hierarchical memoization structure:

function memoizedBuild(){
  var memo = new Map;
  return function(value,indent){
    var byIndent=memo.get(value);
    if (!byIndent) memo.set(value,byIndent={});
    if (!byIndent[indent]) byIndent[indent] = rawBuild(value,indent);
    return byIndent[indent];
  }
}

This proved to be about 4× faster than the memoization code I had been using when serializing a large 270kB JSON object.

Note that in the above code I'm able to use !byIndent[indent] only because I know that rawBuild will never return a falsey value (null, undefined, false, NaN, 0, ""). The safer code line would look something like:

if (!(indent in byIndent)) byIndent[indent] = rawBuild(value,indent);
Phrogz
  • 296,393
  • 112
  • 651
  • 745
1

If you just need to memoise objects then it makes sense to assign some unique ID to your objects .

var gID = 0;

function createNode() {
  var obj = ...
  obj.id = (++gID).toString();
}

and use those obj.id's as keys in your memo collection.

That would be fastest and least greedy solution.

Update:

If you want that id property to do not clash with existing properties then you can create non-enumerable properties using standard ES5.1 Object.createProperty() (with some unique name) or to use ES6 symbols:

var gID = 0;
var gUidSym = Symbol("uid"); 

function getUidOf(obj) {
  return obj[gUidSym] 
      || (obj[gUidSym] = (++gID).toString());
}
c-smile
  • 26,734
  • 7
  • 59
  • 86
  • 1
    Two problems: 1) I don't control the objects that are created. This is a JSON serializer where people pass me their objects and I process them. 2) Because I'm serializing the object's properties, the `id` property you propose here will get included in the serialization (and possibly conflict with an existing property). – Phrogz Feb 09 '16 at 17:36
  • 1
    Check "Update:" part. – c-smile Feb 09 '16 at 18:05
  • 1
    Creating properties (whether string- or symbol-named) has the problem that it doesn't work on sealed objects. – Bergi Feb 09 '16 at 20:48