7

Behind the scenes, in V8, are JavaScript-Map-object's keys indexed in some manner that optimizes the map.get method? Or does map.get() simply just loop through the entire map until it discovers a key match?

I'm interested in the efficiency of map.get on larger mappings of 500,000+ key/value pairs. I have this many mappings that I'd like to just cache in RAM, instead of querying a database where the keys are already indexed for fast value-retrieval. It seems to me, that querying RAM instead of a database would be faster if the Map object's keys are somehow indexed behind the scenes.

Abstract:

function randomUniqueThing()
{
   // returns (magically) a unique random: 
   //  string, number, function, array or object.
}
var objMap = new Map();
var count = 0;
var thing1,thing2;
while(count < 500000)
{
    thing1 = randomUniqueThing();
    thing2 = randomUniqueThing();
    objMap.set(thing1, thing2);
    count++;
}
var lastValue = objMap.get(thing1); // Will getting this last
                                    // thing's value take longer
                                    // than getting other values?
Lonnie Best
  • 9,936
  • 10
  • 57
  • 97

1 Answers1

8

Yes, as you would expect from such a data type, a Map does utilize a hash-table under the hood.

Proof by source:

As always, the proof is in the source:

Excerpt from V8 source file src/objects.h class JSMap:

// The JSMap describes EcmaScript Harmony maps
class JSMap : public JSCollection {
 public:
  DECLARE_CAST(JSMap)

  static void Initialize(Handle<JSMap> map, Isolate* isolate);
  static void Clear(Handle<JSMap> map);

  // Dispatched behavior.
  DECLARE_PRINTER(JSMap)
  DECLARE_VERIFIER(JSMap)

 private:
  DISALLOW_IMPLICIT_CONSTRUCTORS(JSMap);
};

As we can see, JSMap extends JSCollection.

Now if we take a look at the declaration for JSCollection:

Excerpt from V8 source file src/objects.h class JSCollection:

class JSCollection : public JSObject {
 public:
  // [table]: the backing hash table
  DECL_ACCESSORS(table, Object)

  static const int kTableOffset = JSObject::kHeaderSize;
  static const int kSize = kTableOffset + kPointerSize;

 private:
  DISALLOW_IMPLICIT_CONSTRUCTORS(JSCollection);
};

Here we can see that it uses a hash table, with a nice comment to clarify it.


There was some question as to if that hash table refers only to object properties, and not to the get method. As we can from the source code to Map.prototype.get, a hash map is being used.

Excerpt from V8 source file src/js/collection.js MapGet:

function MapGet(key) {
  if (!IS_MAP(this)) {
    throw MakeTypeError(kIncompatibleMethodReceiver,
                        'Map.prototype.get', this);
  }
  var table = %_JSCollectionGetTable(this);
  var numBuckets = ORDERED_HASH_TABLE_BUCKET_COUNT(table);
  var hash = GetExistingHash(key);
  if (IS_UNDEFINED(hash)) return UNDEFINED;
  var entry = MapFindEntry(table, numBuckets, key, hash);
  if (entry === NOT_FOUND) return UNDEFINED;
  return ORDERED_HASH_MAP_VALUE_AT(table, entry, numBuckets);
}

Excerpt from V8 source file src/js/collection.js MapFindEntry:

function MapFindEntry(table, numBuckets, key, hash) {
  var entry = HashToEntry(table, hash, numBuckets);
  if (entry === NOT_FOUND) return entry;
  var candidate = ORDERED_HASH_MAP_KEY_AT(table, entry, numBuckets);
  if (key === candidate) return entry;
  var keyIsNaN = NumberIsNaN(key);
  while (true) {
    if (keyIsNaN && NumberIsNaN(candidate)) {
      return entry;
    }
    entry = ORDERED_HASH_MAP_CHAIN_AT(table, entry, numBuckets);
    if (entry === NOT_FOUND) return entry;
    candidate = ORDERED_HASH_MAP_KEY_AT(table, entry, numBuckets);
    if (key === candidate) return entry;
  }
  return NOT_FOUND;
}

Proof by benchmarking:

There is another way to test if it uses a hash map. Make many entries, and test what the longest and shortest lookup times are. Something like this:

'use strict';

let m = new Map();
let a = [];

for (let i = 0; i < 10000000; i++) {
    let o = {};
    m.set(o, i);
    a.push(o);
}

let lookupLongest = null;
let lookupShortest = null;

a.forEach(function(item) {
    let dummy;
    let before = Date.now();
    dummy = m.get(item);
    let after = Date.now();
    let diff = after - before;
    if (diff > lookupLongest || lookupLongest === null) {
        lookupLongest = diff;
    }
    if (diff < lookupShortest || lookupShortest === null) {
        lookupShortest = diff;
    }
});
console.log('Longest Lookup Time:', lookupLongest);
console.log('Shortest Lookup Time:', lookupShortest);

After a few seconds, I get the following output:

$ node test.js
Longest Lookup Time: 1
Shortest Lookup Time: 0

Such close lookup times would certainly not be possible if it where looping though every entry.

Alexander O'Mara
  • 58,688
  • 18
  • 163
  • 171
  • Because of this, would you expect map.get to be faster than querying an indexed database-table for any of these same 500,000 key-values? – Lonnie Best Dec 17 '15 at 07:16
  • 1
    @LonnieBest I would definitely expect that to the case, since they would already be available in memory. It might be possible that the database has a faster lookup, depending on the data, but I imagine the latency would outweigh this. When in doubt, benchmark it! – Alexander O'Mara Dec 17 '15 at 07:16
  • Without the index/hash (behind the scenes) would you still expect RAM to be faster than querying an indexed database-table? Indexed-keys in databases return values super fast even in very large datasets. I see your edit... – Lonnie Best Dec 17 '15 at 07:24
  • 1
    @LonnieBest Not if the data was very large. Speed when querying large data sets comes from binary searching and/or hash tables. If you had to loop over an unsorted array of thousands of results, it could easily take longer than a binary search in a database of indexed data. – Alexander O'Mara Dec 17 '15 at 07:26
  • 1
    "*In fact, Map is a subclass of Object.*" - of course it is, as all map objects are objects. But this only does seem to suggest that using properties (with `.` or `[]`) on a Map is implemented using a hashmap. What about `.get()` and `.set()`? Forgive me if the C++ code says more about that. – Bergi Dec 17 '15 at 17:50
  • 2
    @Bergi Fair enough. See my updated answer for more on the `get` implementation. – Alexander O'Mara Dec 17 '15 at 19:29
  • @Bergi : This question may have been asked before, but Alexander's answer here is excellent (perhaps canonical). – Lonnie Best Dec 20 '15 at 12:39