0

I generating a bunch of hash values for the same key using MurmerHash like the following. It outputs 50 different hash values for 50 different seeds.

for(int seed=0; seed<50; seed++)
  MurmurHash3_x86_32(key, strlen(key), seed, &hash);

But it is not so time efficient when I have a large number of keys i.e. 10 million keys. Is there any other way to make it faster?

viz12
  • 675
  • 1
  • 11
  • 20
  • mmh... multithreading? – YSC Feb 19 '18 at 17:23
  • Thanks !! I was trying if it can be done without multithreading. – viz12 Feb 19 '18 at 17:26
  • let the GPU handle it then – YSC Feb 19 '18 at 17:26
  • One thing you can to do speed things up is compute the result of strlen(key) once before the loop, and store it into a local variable, rather than needlessly recompute it on on every iteration of the loop. – Jeremy Friesner Feb 19 '18 at 17:41
  • @JesperJuhl look at the `seed` ;) – YSC Feb 19 '18 at 18:05
  • @JesperJuhl since `MurmurHash3_x86_32` takes as first parameter a `const void*`, any decent compiler would optimize this automatically since the memory pointed by `key` under the [strict-aliasing-rule](https://stackoverflow.com/q/98650/5470596) will stay constant. – YSC Feb 19 '18 at 18:08

1 Answers1

1

It's been a while, but saw this and figured I would contribute an answer for anyone else who needs it.

NOTE -- I can't remember where I read this, or who to attribute the formula to. If someone knows, please let me know so I can properly attribute this, and so that others can verify the correctness of this solution.

You can generate n number of independent hashes by combining 2 independent hashes.

for (i = 0; i < n; i++) {
    (hash1 + i * hash2) % max_hash_size;
}

Addition, multiplication, and modulation should be faster than generating an entirely new hash.

I use MurmurHash with two different seeds, and then combine them to generate the extra hash values that I need.

I wish I could remember where I read this....