
I am dealing with lots of computations that, based on some inputs, generate a floating-point value. There is a fair number of repeated calls that could be optimised away if I cached these quantities. The problem is, I don't want the cache to use more than a certain amount of memory (e.g. 10 GiB). Is there a way to specify that when defining a hash map, or to test for it on the fly every time an element is added to the map?

user3639557
  • This appears to be a duplicate of [Is there any way to limit the size of an STL Map?](http://stackoverflow.com/questions/2711705/is-there-any-way-to-limit-the-size-of-an-stl-map). – Simon Dec 23 '15 at 03:28
  • It sounds like you're asking for a look-up table with a *maximum* size. – juanchopanza Dec 23 '15 at 03:54

2 Answers


One way to accomplish this would be to implement an LRU cache ("least recently used") which lets you always prune the element which has been unreferenced for the longest time. For certain usage patterns, that optimizes the probability that the next item needed will still be in the cache.

Designing an LRU cache used to be a common interview question (and maybe it still is), so there are a number of SO answers waiting to be found. This one looks to be fairly complete.
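For concreteness, here is a minimal LRU cache sketch built on `std::list` (recency order) plus `std::unordered_map` (O(1) lookup into the list). The class and member names are illustrative, not from any particular library:

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

// Minimal LRU cache sketch: the list keeps entries in recency order
// (front = most recently used); the map points into the list.
template <typename K, typename V>
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns a pointer to the cached value, or nullptr on a miss.
    V* get(const K& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return nullptr;
        // Move the touched entry to the front (most recently used).
        order_.splice(order_.begin(), order_, it->second);
        return &it->second->second;
    }

    void put(const K& key, V value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (order_.size() == capacity_) {
            // Evict the least recently used entry (back of the list).
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(key, std::move(value));
        index_[key] = order_.begin();
    }

private:
    std::size_t capacity_;
    std::list<std::pair<K, V>> order_;
    std::unordered_map<K, typename std::list<std::pair<K, V>>::iterator> index_;
};
```

Note that `std::list::splice` moves a node without invalidating iterators, which is what makes the map-of-iterators design safe.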

But that's not how I would do it. 10 GiB is lots of memory, and you can keep a lot of entries in that space. If you can eliminate overhead and keep twice as many cache entries, it may well be worth sacrificing the precision of an LRU cache. And eliminating the overhead might also cumulatively save you enough cycles to offset the cycles it costs to have a non-optimal cache pruning strategy. Anyway, LRU might not be optimal for your problem. Who knows?

With a good hash function and a big hash table, you can use an alternative strategy: just keep one entry for each hash value. The advantage of that solution is that it has almost zero overhead. There is no need for timestamps, or even for bucket list pointers. There is no need to trace through a chain to see if an entry is in the cache; the first hit is either the entry or the entry to be replaced. In practice, if your entries are not too big, that could easily double the number of entries you have available.
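A minimal sketch of that single-entry-per-hash strategy might look like the following. All names are illustrative; the `occupied` flag trades one byte per slot for simpler initialization (see the remarks on the default-key wrinkle at the end of this answer):

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Direct-mapped cache sketch: exactly one entry per hash slot, no
// timestamps and no chains. On a slot collision the old entry is
// simply overwritten.
template <typename K, typename V, typename Hash = std::hash<K>>
class DirectMappedCache {
public:
    explicit DirectMappedCache(std::size_t slots) : slots_(slots) {}

    // Look up key; compute and store the value on a miss.
    template <typename F>
    V get_or_compute(const K& key, F compute) {
        Slot& s = slots_[Hash{}(key) % slots_.size()];
        if (s.occupied && s.key == key) return s.value;  // hit
        s.key = key;             // miss: replace whatever was here
        s.value = compute(key);
        s.occupied = true;
        return s.value;
    }

private:
    // The occupied flag keeps a default-constructed key from ever
    // masquerading as a valid entry.
    struct Slot { K key{}; V value{}; bool occupied = false; };
    std::vector<Slot> slots_;
};
```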

For a slightly more precise solution, you can keep two entries per hash value, somewhat like a simplified cuckoo hash. Again, no storage overhead (as long as every hash value eventually gets used), and only slightly more lookup cost. (Really very little because both entries should be in the same memory cache line.) To get something like LRU behaviour, in this variant if the cache entry you find is the second of the entries for the hash value, you swap the two entries. (In fact, you swap the two entries after the first one is not the one you needed; then you either use or replace the entry which is now the first entry.)
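The two-entry variant might be sketched like this (again, names are illustrative; the swap-on-miss implements the promote-to-front behaviour described above):

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Two-way variant sketch: each slot holds two entries, which should
// share a cache line for small K and V. A hit on the second entry
// swaps it to the front, giving rough LRU behaviour within the slot.
template <typename K, typename V, typename Hash = std::hash<K>>
class TwoWayCache {
public:
    explicit TwoWayCache(std::size_t slots) : slots_(slots) {}

    template <typename F>
    V get_or_compute(const K& key, F compute) {
        Slot& s = slots_[Hash{}(key) % slots_.size()];
        if (s.e[0].occupied && s.e[0].key == key) return s.e[0].value;
        // First entry missed: swap the two entries, then either use
        // or replace the entry which is now the first entry.
        std::swap(s.e[0], s.e[1]);
        if (s.e[0].occupied && s.e[0].key == key) return s.e[0].value;
        s.e[0].key = key;
        s.e[0].value = compute(key);
        s.e[0].occupied = true;
        return s.e[0].value;
    }

private:
    struct Entry { K key{}; V value{}; bool occupied = false; };
    struct Slot { Entry e[2]; };
    std::vector<Slot> slots_;
};
```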

The C++ standard library doesn't have a non-chained hash table, but the data structure is so trivial that you hardly need library support. What the library does have that is useful is hash support. (See boost.hash for a bunch more hashes, particularly useful if your keys are std::pair or std::tuple.)
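If boost isn't available, a pair hash is easy to hand-roll; this sketch mirrors the combining formula of boost's `hash_combine` (the constant is the 32-bit golden-ratio value boost uses):

```cpp
#include <cstddef>
#include <functional>
#include <utility>

// Hash functor for std::pair keys, combining the two member hashes
// in the style of boost::hash_combine.
struct PairHash {
    template <typename A, typename B>
    std::size_t operator()(const std::pair<A, B>& p) const {
        std::size_t seed = std::hash<A>{}(p.first);
        seed ^= std::hash<B>{}(p.second) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        return seed;
    }
};
```

This can be passed as the `Hash` template parameter of any of the cache sketches, or of `std::unordered_map` itself.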

Other than that, all you really need is an array of key-value pairs, which can be sized to the amount of memory you want to throw at the cache. Initializing this array has one small wrinkle: it needs to be initialized to something, which will generally be whatever the default key and value constructors produce. But the actual slot corresponding to the default key either needs to have the correct value (which is not likely to be the default value), or it needs to be changed to a different key with a different hash value.

rici

Store a timestamp with each entry and keep an eye on the hash table's capacity; when it approaches a boundary value, flush old entries.
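A minimal sketch of this approach, assuming a logical clock for the stamps and a flush that drops the older half of the entries (all names are illustrative):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Timestamped cache sketch: each entry carries a logical-clock stamp,
// refreshed on access; hitting the size bound triggers a flush.
template <typename K, typename V>
class StampedCache {
public:
    explicit StampedCache(std::size_t max_entries) : max_entries_(max_entries) {}

    void put(const K& key, V value) {
        if (map_.size() >= max_entries_) flush_old();
        map_[key] = Entry{std::move(value), ++clock_};
    }

    V* get(const K& key) {
        auto it = map_.find(key);
        if (it == map_.end()) return nullptr;
        it->second.stamp = ++clock_;  // refresh the stamp on access
        return &it->second.value;
    }

    std::size_t size() const { return map_.size(); }

private:
    struct Entry { V value; std::uint64_t stamp; };

    // Find the median stamp and erase every entry older than it.
    // Note this is a full linear scan over the map.
    void flush_old() {
        std::vector<std::uint64_t> stamps;
        stamps.reserve(map_.size());
        for (const auto& kv : map_) stamps.push_back(kv.second.stamp);
        auto mid = stamps.begin() + stamps.size() / 2;
        std::nth_element(stamps.begin(), mid, stamps.end());
        for (auto it = map_.begin(); it != map_.end(); )
            it = (it->second.stamp < *mid) ? map_.erase(it) : ++it;
    }

    std::size_t max_entries_;
    std::uint64_t clock_ = 0;
    std::unordered_map<K, Entry> map_;
};
```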

Paul Evans
  • And you intend to find the old entries how, exactly? Search through all entries? (Workable but absurdly slow) Maintain two indexes? (cool, but no standard container does so.) Some other datastructure? – rici Dec 23 '15 at 05:49