What's the rationale behind this hash function?

Question

uint32_t h(const char *kbuf, int ksiz){
  uint32_t hash = 751;
  const char *rp = kbuf + ksiz;
  while(ksiz--){
    hash = (hash * 31) ^ *(uint8_t *)--rp;
  }
  return hash;
}

Why should the hash be calcuted this way,what's the rationale?

`because it works sufficiently?` I suppose by this logic we should not bother learning how to make software because it already works sufficiently. — SE Does Not Like Dissent, Oct 17 '11 at 10:51
@SSight3: Jeff's answer is appropriate - designing hash algorithms is something of a black art; there are some guiding principles, but it also involves playing around with constants until you arrive at something which *works sufficiently well* for your sample input — Christoph, Oct 17 '11 at 10:57
Any ideas about why it goes backwards? Using 'ksiz--' is an efficient way to get the right number of elements without introducing a new temp, but I can't see any advantage to 'rp--' instead of 'kbuf++'. — AShelly, Oct 18 '11 at 01:35

score 3 · Answer 1 · edited Feb 22 '14 at 17:18

Your hash algorithm follows the same idea which lead to the modified Bernstein hash and Fowler/Noll/Vo (see eg this overview of existing hash algorithms).

XORing bytes is a classical hash algorithm. However, the resulting distribution of hash values is far from optimal, thus, an additional mixing step (in this case multiplication by 31) was added.

Using 31 as multiplier is explained by Josh Bloch in Effective Java:

The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional. A nice property of 31 is that the multiplication can be replaced by a shift and a subtraction for better performance: 31 * i == (i << 5) - i. Modern VMs do this sort of optimization automatically.

The choice of 33 as multiplier in the original Bernstein hash probably follows similar reasoning; however, if hashing performance is not your main concern, you're probably better off by choosing a multiplier which results in a better distribution. If you don't want to experiment yourself, Fowler/Noll/Vo is probably a good choice.

thank matt b for the quote - I got it from this answer: http://stackoverflow.com/questions/299304/why-does-javas-hashcode-in-string-use-31-as-a-multiplier/299748#299748 — Christoph, Oct 17 '11 at 10:47

score 1 · Answer 2 · answered Oct 17 '11 at 10:43

This implementation attempts to make the input-to-output mapping uniform so that collisions are uniform too. In other words, it tries to avoid situations such that for some hash values there are way to many collisions than for others. It does that by mixing in values of a primitive pseudo-random generator into the resulting hash values. Or you could think of it the other way around, as a PRNG mixing in the input data. Prime numbers (751 and 31) help to achieve uniformity. Of course, there are no guarantees as with carefully chosen inputs you can defy these attempts.

For more details see these articles:
Hash Function - Uniformity
Linear congruential generator

What's the rationale behind this hash function?

2 Answers2