2

I review the source code of Arrays.hashCode(char[] c)
I am not very confirm that the algorithm it applies well work well in all cases.

    public static int hashCode(int a[]) {
    if (a == null)
        return 0;

    int result = 1;
    for (int element : a)
        result = 31 * result + element;

    return result;
}

Does the hash function implement here really uniformly distributes the all the input arrays.And Why we use prime 31 here .

andyqee
  • 3,175
  • 3
  • 21
  • 24
  • 1
    It doesn't work well in all cases. It works well in most cases, and that is the best you can hope for. You can create pathological scenarios for all hashCode() which is used in denial of service attacks. – Peter Lawrey Sep 13 '13 at 14:26
  • I ran across a super simple case which generates the same hashCode. While I know collisions are possible, I was shocked to actually run into one with such simple data. Arrays.hashCode(new double[] {0.0, 0.0}): 961 Arrays.hashCode(new double[] {2.0, 2.0}): 961 – PeterVermont Jan 27 '15 at 21:31

3 Answers3

6

Why use the prime number 31?

This can be split in two parts?

  1. Why a prime number?

Here we need to understand that our goal is to get a unique HashCode for an object which will help us to find that object in O(1) time.

The key word here, is unique.

Primes

Primes are unique numbers. They are unique in that, the product of a prime with any other number has the best chance of being unique (not as unique as the prime itself of-course) due to the fact that a prime is used to compose it. This property is used in hashing functions.

.

Why number 31?

From Effective Java

  • Because it's an odd prime, and it's "traditional" to use primes.
  • It's also one less than a power of two, which permits for bitwise optimization

    Here's the full quote,

from Item 9: Always override hashCode when you override equals:

The value 31 was chosen because it's an odd prime. If it were even and multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional.

A nice property of 31 is that the multiplication can be replaced by a shift (§15.19) and subtraction for better performance:

31 * i == (i << 5) - i Modern VMs do this sort of optimization automatically.

While the recipe in this item yields reasonably good hash functions, it does not yield state-of-the-art hash functions, nor do Java platform libraries provide such hash functions as of release 1.6. Writing such hash functions is a research topic, best left to mathematicians and theoretical computer scientists.

Perhaps a later release of the platform will provide state-of-the-art hash functions for its classes and utility methods to allow average programmers to construct such hash functions. In the meantime, the techniques described in this item should be adequate for most applications.

This is a very Good source.

JNL
  • 4,683
  • 18
  • 29
1

The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional. A nice property of 31 is that the multiplication can be replaced by a shift and a subtraction for better performance: 31 * i == (i << 5) - i. Modern VMs do this sort of optimization automatically.

TheEwook
  • 11,037
  • 6
  • 36
  • 55
1

See this post: Why does Java's hashCode() in String use 31 as a multiplier?

That's where TheEwook's answer is from.

Generally, you use primes because they don't have any factors and will distribute better modulo N where N is the size of the range you are binning into. 31 is a small, odd prime so it works well. However, as the various sources you will find on the Internets will indicate, a small prime like 31 may lead to more collisions than a larger prime (especially if the values being hashed are not well distributed to begin with), so you could pick a larger prime if you found the performance to not be as good as you'd like.

Community
  • 1
  • 1
reblace
  • 4,115
  • 16
  • 16
  • I didn't have the reputation points to comment on your post pointing out your source, but now I do. – reblace Sep 13 '13 at 14:36