5

I'm trying to figure out if it's possible to build a conformant, efficient implementation of modern C++'s std::unordered_map using techniques like Cuckoo Hashing, Hopscotch Hashing, and Robin Hood Hashing that allow for very compact tables, high load factors, and maintain high performance. What's special about these techniques is that they involve potentially moving some elements around to make room for others, rather than just chaining, or probing until an open slot is found (as in linear or quadratic probing) or.

From insert http://www.cplusplus.com/reference/unordered_map/unordered_map/insert/

  1. Iterator validity On most cases, all iterators in the container remain valid after the insertion. The only exception being when the growth of the container forces a rehash. In this case, all iterators in the container are invalidated.

  2. A rehash is forced if the new container size after the insertion operation would increase above its capacity threshold (calculated as the container's bucket_count multiplied by its max_load_factor).

  3. References to elements in the unordered_map container remain valid in all cases, even after a rehash.

And for erase http://www.cplusplus.com/reference/unordered_map/unordered_map/erase/

  1. Only the iterators and references to the elements removed are invalidated.

  2. The rest are unaffected.

  3. [c++14 only] The relative order of iteration of the elements not removed by the operation is preserved.

The requirement that other references generally remain valid in both operations would seem to require a probing scheme involving evictions to work on a table structure that allocated nodes separate from the array and pointed at them instead. Perhaps the implementation could keep a separate array of elements that entries in the table itself could index into, to avoid extra dynamic allocation, but that still adds extra indirection. Is there a more efficient way to address that requirement?

The requirement that element references remain valid across insert even after a rehash seems to imply that either dynamic node allocation or something like the above indirection array is required even for a chaining or non-moving open addressing design. Is that right?

In general, do the requirement placed on unordered_map by the standard enforce indirection or some other sort of inefficiency in hash table implementation?

ildjarn
  • 62,044
  • 9
  • 127
  • 211
Phil Miller
  • 36,389
  • 13
  • 67
  • 90
  • Looks like this was potentially answered here: http://stackoverflow.com/questions/31112852/how-stdunordered-map-is-implemented – Phil Miller May 25 '16 at 05:11
  • 1
    Don't forget [bucket iterators](http://stackoverflow.com/questions/21518704/how-does-c-stl-unordered-map-resolve-collisions/21519560#21519560) and related fun. – T.C. May 25 '16 at 07:03
  • I suspect that the answer is "no," at least without some sort of horrid indirection on the iterators. Also, I doubt you could get away with cuckoo hashing for unordered_map, since cuckoo hashing requires families of hash functions, which is a different model than having "a" hash function per type. – templatetypedef May 25 '16 at 20:14
  • 1
    AFAIU, it's possible to construct suitable independent hashes given one good one. The indirection is clearly a blocker, and it turns out even the original proposal for adding unordered_map to the standard said as much. Implementations were assumed to use chaining from the start. – Phil Miller May 26 '16 at 00:17

0 Answers0