I need to generate hash keys for about N=100 million keys. From my study it appears that murmur3 (MurmurHash3_x86_32, see murmur3 hash) would be the fastest hashing function with best latency and small enough collision rate. The problem I am facing is that the function returns key as void *
. More specifically, the template is:
void MurmurHash3_x86_32 (const void *key, int len, uint32_t seed, void *out);
As my hash table size would be smaller than the largest hash it can generate, I need to get it into the table range [0, N-1]. The easiest solution seems to be to use %
operator. But as it is known to be a slow operator, I wonder if there is a quicker way to solve the problem.
One interesting suggestion I found was given Is there an alternative to using % (modulus) in C/C++? on StackOverflow itself. It suggests 'a power of two, the following works (assuming twos complement representation)':
return i & (n-1);
My issue with this is that on newer CPUs, it is sometimes (or is it most of the times?), the performance degrades at around sizes 2^n, IIRC, due to multiway cache lines. (This link provides an illustration regarding insertions Big Memory, Part 3.5: Google sparsehash!).
At the moment, the advantages of murmur3 seems to be nullified by hard-ware related issues and known inefficiency of %
operator. As performance is a constraint, I request for low latency and faster solutions to my requirement even if it is not MurmurHash3_x86_32.