4

I understand that multiplication by a large number before xoring should help with badly distributed operands but why should the multiplier be a prime?

Related:
Why should hash functions use a prime number modulus?

Close, but not quite a Duplicate:
Why does Java’s hashCode() in String use 31 as a multiplier?

Community
  • 1
  • 1
gkdm
  • 2,375
  • 4
  • 21
  • 27
  • 2
    I don't really have an answer here (my honest one would be "because Josh Bloch says so!") but http://www.eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx makes for interesting reading. – Jon Skeet Sep 28 '09 at 19:50
  • Duplicate: http://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus – Andrew Hare Sep 28 '09 at 19:51
  • 1
    Why is this closed? A factor and a modulus are clearly not the same thing. – Accipitridae Sep 30 '09 at 16:59
  • I agree, i am not satisfied with the answers on the duplicate question thread or on this one. This might be just because i don't yet understand the answers. If someone will provide further clarification it will be much appreciated. – gkdm Sep 30 '09 at 20:49
  • 1
    I haven't seen a good reason to multiply with a prime either. I don't think there is one. – Accipitridae Oct 03 '09 at 07:39

4 Answers4

4

There's a good article on the Computing Life blog that discusses this topic in detail. It was originally posted as a response to the Java hashCode() question I linked to in the question. According to the article:

Primes are unique numbers. They are unique in that, the product of a prime with any other number has the best chance of being unique (not as unique as the prime itself of-course) due to the fact that a prime is used to compose it. This property is used in hashing functions.

Given a string “Samuel”, you can generate a unique hash by multiply each of the constituent digits or letters with a prime number and adding them up. This is why primes are used.

However using primes is an old technique. The key here to understand that as long as you can generate a sufficiently unique key you can move to other hashing techniques too. Go here for more on this topic about hashes without primes.

Community
  • 1
  • 1
Bill the Lizard
  • 398,270
  • 210
  • 566
  • 880
2

Multiplying by a non-prime has a cyclic repeating pattern much smaller than the number. If you use a prime then the cyclic repeating pattern is guaranteeed to be at least as large as the prime number.

Charles Bretana
  • 143,358
  • 22
  • 150
  • 216
  • 1
    Unfortuantely this is not correct. You need a generator of the multiplicative group to get a maximal cycle. This has nothing to do with primality. – Accipitridae Sep 30 '09 at 17:03
1

I'm not sure exactly which algorithm you're talking about, but typically the constants in such algorithms need to be relatively prime. Otherwise, you get cycles and not all the possible values show up in the result.

The number probably doesn't need to be prime in your case, only relatively prime to some other numbers, but making it prime guarantees that. It also covers the cases where the other magic numbers change.

For example, if you are talking about taking the last bits of some number, then the multiplier needs to not be a multiple of 2. So, 9 would work even though it's not prime.

UncleO
  • 8,299
  • 21
  • 29
0

Consider the simplest multiplication: x2.

It is equivalent to a left-bitshift. In other words, it really didn't "randomize" the data, it just shifted it over.

Same with x4, or any power of two. The original data is intact, just shifted.

Now, multiplication by other numbers (non-powers of two) are not as obvious, but still have the same problem, more or less. The original data is intact, or trivially transformed. (eg. x5 is the same as left-bitshift two places, then add on the original data).

The point of GetHashCode is to essentially distribute the data as randomly as possible. Multiplying by a prime number guarantees that the answer won't be a simpler transform like bit-shifting or adding a number to itself.

abelenky
  • 63,815
  • 23
  • 109
  • 159
  • 2
    @abelenky: One of the more common hashes of a string `s` is `31 * s[0] + 31^2 * s[1] + ... + 31^(n - 1) * s[n-1]`. `31` is prime. Multiplication by `31` is the same as a bit-shift and a subtraction (i.e., `a * 31 = (a << 5) - a`). That is rather simple. All of this is to point out that the reason that primes are used is not to just jumble up the data. – jason Sep 28 '09 at 20:02
  • 1
    This hash but with 33 as a multiplier is known as djb hash. Since Bernstein is an expert in number theory it is certainly not an accident that he doesn't use a prime. – Accipitridae Sep 30 '09 at 17:30
  • 1
    @Accipitridae: A major weakness of using 31 is a prime is that the hash contribution from a pair of matching consecutive characters will be a multiple of 32, effectively losing five bits. Using 33 would make the contribution be a multiple of 34, which would only lose one bit. – supercat Feb 21 '15 at 21:49