Improving query access performance for unordered maps that are unchanging

Question

I am looking for suggestions in improving the query time access for unordered maps. My code essentially just consists of 2 steps. In the first step, I populate the unordered map. After the first step, no more entries are ever added to the map. In the second step, the unordered map is only queried. Since the map is essentially unchanging, is there something that can be done to speed up the query time? For instance, does stl provide any function that can adjust the internal allocations in the map to improve query time access? In other words, it is possible that more than one key was mapped to the same bucket in the unordered map. If more memory was allocated to the map, then chances of such a collision occurring can reduce. In that sense, I am curious as to whether there is anything that can be done knowing the fact that the unordered map will remain unchanged.

[Here](https://stackoverflow.com/questions/16068151/c-stl-map-is-access-time-o1) is a good reference for you to know about map access complexity. — muaz, Jun 05 '19 at 20:26
@muaz- thanks for the comment. I guess I wasn't very clear with my earlier version of the question. Please take a look at the updated version. And yes, I am aware of the average case complexity for access. — batwing, Jun 05 '19 at 21:25
map is a hash table so if you are using a customized class or struct as a key then having collision depends on your hash function, i.e. the allocated memory has nothing to do with collisions. — muaz, Jun 05 '19 at 22:28
@muaz: *"...collision depends on your hash function..."* - not so - hash functions typically return say a 32-bit or 64-bit value, which is then masked (bitwise-ANDed) or subjected to a `%`-modulo operation to map the hash value onto a specific hash bucket; the number of hash buckets - in C++ parlance, the `capacity()` - is what batwing would have been contemplating, and increased capacity (obviously decreases load factor and) tends to reduce collisions, as the same hash function will now be mapping some of the values post-masking or mod-ing to the extra buckets. — Tony Delroy, Jun 08 '19 at 12:42
Tony, if the hash function returns the same value for two different inputs, it doesn't matter how many buckets there are in the hash table, they're going to end up in the same bucket. — Marshall Clow, Jun 09 '19 at 04:24

score 0 · Answer 1 · answered Jun 06 '19 at 01:16

You have two knobs that you can twist: The the hash function and number of buckets in the map. One is fixed at compile-time (the hash function), and the other you can modify (somewhat) at run-time.

A good hash function will give you very few collisions (non-equal values that have the same hash value). If you have many collisions, then there's not really much you can do to improve your lookup times. Worst case (all inputs hash to the same value) gives you O(N) lookup times. So that's where you want to focus your effort.

Once you have a good hash function, then you can play games with the number of buckets (via rehash) which can reduce collisions further.

score 0 · Accepted Answer · answered Jun 08 '19 at 12:50

If measurements show this is important for you, then I'd suggest taking measurements for other hash table implementations outside the Standard Library, e.g. google's. Using closed hashing aka open addressing may well work better for you, especially if your hash table entries are small enough to store directly in the hash table buckets.

More generally, Marshall suggests finding a good hash function. Be careful though - sometimes a generally "bad" hash function performs better than a "good" one, if it works in nicely with some of the properties of your keys. For example, if you tend to have incrementing number, perhaps with a few gaps, then an identity (aka trivial) hash function that just returns the key can select hash buckets with far less collisions than a crytographic hash that pseudo-randomly (but repeatably) scatters keys with as little as a single bit of difference in uncorrelated buckets. Identity hashing can also help if you're looking up several nearby key values, as their buckets are probably nearby too and you'll get better cache utilisation. But, you've told us nothing about your keys, values, number of entries etc. - so I'll leave the rest with you.

Improving query access performance for unordered maps that are unchanging

2 Answers2