Looking to learn how to implement a hash table in a decent way in JavaScript.
I would like for it to be able to:
- Efficiently resolve collisions,
- Be space efficient, and
- Be unbounded in size (at least in principle, like v8 objects are, up to the size of the system memory).
From my research and help from SO, there are many ways to resolve collisions in hash tables. The way v8 does it is Quadratic probing:
The wikipedia algorithm implementing quadratic probing in JavaScript looks something like this:
var i = 0
var SIZE = 10000
var key = getKey(arbitraryString)
var hash = key % SIZE
if (hashtable[hash]) {
while (i < SIZE) {
i++
hash = (key + i * i) % SIZE
if (!hashtable[hash]) break
if (i == SIZE) throw new Error('Hashtable full.')
}
hashtable[hash] = key
} else {
hashtable[hash] = key
}
The elements that are missing from the wikipedia entry are:
- How to compute the hash
getKey(arbitraryString)
. Hoping to learn how v8 does this (not necessarily an exact replica, just along the same lines). Not being proficient in C it looks like the key is an object, and the hash is a 32 bit integer. Not sure if the lookup-cache.h is important. - How to make it dynamic so the
SIZE
constraint can be removed. - Where to store the final
hash
, and how to compute it more than once.
V8 allows you to specify your own "Shape" object to use in the hash table:
// The hash table class is parameterized with a Shape.
// Shape must be a class with the following interface:
// class ExampleShape {
// public:
// // Tells whether key matches other.
// static bool IsMatch(Key key, Object* other);
// // Returns the hash value for key.
// static uint32_t Hash(Isolate* isolate, Key key);
// // Returns the hash value for object.
// static uint32_t HashForObject(Isolate* isolate, Object* object);
// // Convert key to an object.
// static inline Handle<Object> AsHandle(Isolate* isolate, Key key);
// // The prefix size indicates number of elements in the beginning
// // of the backing storage.
// static const int kPrefixSize = ..;
// // The Element size indicates number of elements per entry.
// static const int kEntrySize = ..;
// // Indicates whether IsMatch can deal with other being the_hole (a
// // deleted entry).
// static const bool kNeedsHoleCheck = ..;
// };
But not sure what the key is and how they convert that key to the hash so keys are evenly distributed and the hash function isn't just a hello-world example.
The question is, how to implement a quick hash table like V8 that can efficiently resolve collisions and is unbounded in size. It doesn't have to be exactly like V8 but have the features outlined above.
In terms of space efficiency, a naive approach would do var array = new Array(10000)
, which would eat up a bunch of memory until it was filled out. Not sure how v8 handles it, but if you do var x = {}
a bunch of times, it doesn't allocate a bunch of memory for unused keys, somehow it is dynamic.
I'm stuck here essentially:
var m = require('node-murmurhash')
function HashTable() {
this.array = new Array(10000)
}
HashTable.prototype.key = function(value){
// not sure if the key is actually this, or
// the final result computed from the .set function,
// and if so, how to store that.
return m(value)
}
HashTable.prototype.set = function(value){
var key = this.key(value)
var array = this.array
// not sure how to get rid of this constraint.
var SIZE = 10000
var hash = key % SIZE
var i = 0
if (array[hash]) {
while (i < SIZE) {
i++
hash = (key + i * i) % SIZE
if (!array[hash]) break
if (i == SIZE) throw new Error('Hashtable full.')
}
array[hash] = value
} else {
array[hash] = value
}
}
HashTable.prototype.get = function(index){
return this.array[index]
}