Why does Google's open-source sparsehash library have two implementations: a dense hashtable and a sparse one?

ganit44
Denis Gorodetskiy
  • I think I'm misunderstanding the question in the post. Wouldn't sparse hashtables + dense hashtables == all hashtables? And if so, then why is the library called "sparsehash"? – cHao Mar 13 '11 at 12:22
    BTW: [documentation from Google Code](http://google-sparsehash.googlecode.com/svn/trunk/doc/implementation.html). – cHao Mar 13 '11 at 12:28

2 Answers

The dense hashtable is your ordinary textbook hashtable implementation: a single flat array in which every bucket, empty or not, takes up the space of a full entry.

The sparse hashtable stores only the elements that have actually been set, divided over a number of arrays. To quote from the comments in the implementation of sparse tables:

// The idea is that a table with (logically) t buckets is divided
// into t/M *groups* of M buckets each.  (M is a constant set in
// GROUP_SIZE for efficiency.)  Each group is stored sparsely.
// Thus, inserting into the table causes some array to grow, which is
// slow but still constant time.  Lookup involves doing a
// logical-position-to-sparse-position lookup, which is also slow but
// constant time.  The larger M is, the slower these operations are
// but the less overhead (slightly).

To know which elements of the arrays are set, a sparse table includes a bitmap:

// To store the sparse array, we store a bitmap B, where B[i] = 1 iff
// bucket i is non-empty.  Then to look up bucket i we really look up
// array[# of 1s before i in B].  This is constant time for fixed M.

so that each element incurs an overhead of only 1 bit (in the limit).
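
The two quoted comments can be illustrated with a minimal sketch. This is not sparsehash's actual code: the names `SparseGroup`, `SparseTable` and `kGroupSize` are hypothetical, the group size is an arbitrary choice, and the rank is computed with a plain loop over the bitmap rather than whatever bit counting the library really uses. It only shows the idea of storing a bitmap plus a packed array per group, and of translating a logical bucket number into a position in that packed array.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; a sketch of the idea, not the library's code.
constexpr std::size_t kGroupSize = 48;  // plays the role of M; value chosen arbitrarily

template <typename T>
class SparseGroup {
 public:
  // True iff logical bucket i is non-empty, i.e. B[i] == 1.
  bool test(std::size_t i) const {
    assert(i < kGroupSize);
    return bitmap_.test(i);
  }

  // Store v in logical bucket i. A new element makes the packed array grow,
  // shifting at most M - 1 items: slow, but bounded by the constant M.
  void set(std::size_t i, const T& v) {
    const std::size_t pos = rank(i);
    if (bitmap_.test(i)) {
      packed_[pos] = v;
    } else {
      packed_.insert(packed_.begin() + static_cast<std::ptrdiff_t>(pos), v);
      bitmap_.set(i);
    }
  }

  // Pointer to the value in logical bucket i, or nullptr if the bucket is empty.
  const T* get(std::size_t i) const {
    return test(i) ? &packed_[rank(i)] : nullptr;
  }

  std::size_t size() const { return packed_.size(); }

 private:
  // Number of 1s before position i in B: the index into the packed array.
  std::size_t rank(std::size_t i) const {
    assert(i < kGroupSize);
    std::size_t count = 0;
    for (std::size_t k = 0; k < i; ++k) count += bitmap_.test(k) ? 1 : 0;
    return count;
  }

  std::bitset<kGroupSize> bitmap_;  // one bit per logical bucket
  std::vector<T> packed_;           // only the buckets that are actually set
};

// A table with t logical buckets, split into ceil(t / M) groups.
template <typename T>
class SparseTable {
 public:
  explicit SparseTable(std::size_t buckets)
      : groups_((buckets + kGroupSize - 1) / kGroupSize) {}

  void set(std::size_t b, const T& v) {
    groups_[b / kGroupSize].set(b % kGroupSize, v);
  }

  const T* get(std::size_t b) const {
    return groups_[b / kGroupSize].get(b % kGroupSize);
  }

  // Total number of values physically stored across all groups.
  std::size_t stored() const {
    std::size_t n = 0;
    for (const auto& g : groups_) n += g.size();
    return n;
  }

 private:
  std::vector<SparseGroup<T>> groups_;
};
```

In this sketch an empty bucket costs only its bit in the bitmap, while a dense table would reserve a full slot for it. The price is the extra rank computation on every lookup and the array shifting on every insert, which is the "slow but still constant time" trade-off the comments describe.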

Fred Foo

sparsehash is a memory-efficient way of mapping keys to values (about 1-2 bits of overhead per key). Bloom filters can use even fewer bits per key, but they attach no value to a key beyond "definitely absent" versus "probably present", which is slightly less than one bit of information.

Tobu