9

The accepted answer in Best implementation for hashCode method gives a seemingly good method for finding Hash Codes. But I'm new to Hash Codes, so I don't quite know what to do.

For 1), does it matter what nonzero value I choose? Is 1 just as good as other numbers such as the prime 31?

For 2), do I add each value to c? What if I have two fields that are both a long, int, double, etc?


Did I interpret it right in this class:

public MyClass{
    long a, b, c; // these are the only fields
    //some code and methods
    public int hashCode(){
        return 37 * (37 * ((int) (a ^ (a >>> 32))) + (int) (b ^ (b >>> 32))) 
                 + (int) (c ^ (c >>> 32));
    }
}
Community
  • 1
  • 1
Justin
  • 24,288
  • 12
  • 92
  • 142
  • 2
    FYI… As of Java 7, you can use the built-in [`Objects.hash`](https://docs.oracle.com/javase/7/docs/api/java/util/Objects.html#hash(java.lang.Object...)) method. Example: `@Override public int hashCode() { return Objects.hash( a , b , c ) ; }` – Basil Bourque Sep 21 '18 at 21:50

2 Answers2

19
  1. The value is not important, it can be whatever you want. Prime numbers will result in a better distribution of the hashCode values therefore they are preferred.
  2. You do not necessary have to add them, you are free to implement whatever algorithm you want, as long as it fulfills the hashCode contract:
  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  • It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

There are some algorithms which can be considered as not good hashCode implementations, simple adding of the attributes values being one of them. The reason for that is, if you have a class which has two fields, Integer a, Integer b and your hashCode() just sums up these values then the distribution of the hashCode values is highly depended on the values your instances store. For example, if most of the values of a are between 0-10 and b are between 0-10 then the hashCode values are be between 0-20. This implies that if you store the instance of this class in e.g. HashMap numerous instances will be stored in the same bucket (because numerous instances with different a and b values but with the same sum will be put inside the same bucket). This will have bad impact on the performance of the operations on the map, because when doing a lookup all the elements from the bucket will be compared using equals().

Regarding the algorithm, it looks fine, it is very similar to the one that Eclipse generates, but it is using a different prime number, 31 not 37:

@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + (int) (a ^ (a >>> 32));
    result = prime * result + (int) (b ^ (b >>> 32));
    result = prime * result + (int) (c ^ (c >>> 32));
    return result;
}
Adam Siemion
  • 15,569
  • 7
  • 58
  • 92
  • 1
    What sort of algorithm is good? Is the one in the example good? Should I use different primes for each element? – Justin May 18 '13 at 23:15
  • I understand your #1, but it is better to have fewer collisions. – Justin May 18 '13 at 23:16
  • Any code can be anything, but to be *good* code, hadhCode can't shouldn't be "anything". See Object.hashCode() java doc – Bohemian May 19 '13 at 00:12
  • @Bohemian of course I agree, but I did say somewhere that it can be "anything"? – Adam Siemion May 19 '13 at 00:23
  • First line :). At least that's how i interpreted it. Perhaps you were talking about the multiplicands – Bohemian May 19 '13 at 00:32
  • @Bohemian ok, agreed, it might have been interpreted this way, I have updated it so that there is no confusion, I have also added a paragraph about the hashCode contract. Thank you for pointing it out :) – Adam Siemion May 19 '13 at 00:35
6

A well-behaved hashcode method already exists for long values - don't reinvent the wheel:

int hashCode = Long.hashCode((a * 31 + b) * 31 + c); // Java 8+

int hashCode = Long.valueOf((a * 31 + b) * 31 + c).hashCode() // Java <8

Multiplying by a prime number (usually 31 in JDK classes) and cumulating the sum is a common method of creating a "unique" number from several numbers.

The hashCode() method of Long keeps the result properly distributed across the int range, making the hash "well behaved" (basically pseudo random).

Clashsoft
  • 11,553
  • 5
  • 40
  • 79
Bohemian
  • 412,405
  • 93
  • 575
  • 722
  • Is this better than `int hashCode = 31 * (31 * ((int) (a ^ (a >>> 32)) + 31) + (int) (b ^ (b >>> 32))) + (int) (c ^ (c >>> 32))`? (`(int) (a ^ (a >>> 32))` == `Long.valueOf(a).hashCode()`) In other words, is it better to take the combination the Hash Codes of the values or the Hash Code of the combination of the values? – Justin May 18 '13 at 23:34
  • 5
    @gangqinlaohu don't know, but I don't have to know. I can guarantee that the JDK code for hashing is going to be better than anything you come up with. Those classes are rigorously tested and researched. Also, it's easier to read my code than yours. That alone is valuable. B) I believe it would be better to hash at the end (and it's less code) but I would accept a combination (probably just adding) of individual hashes. – Bohemian May 18 '13 at 23:39
  • If you just add, then `a = 3`, `b = 2`, `c = 13` would return the same hashCode as `a = 2`, `b = 3`, `c = 13` (and other similar values) – Justin May 18 '13 at 23:43
  • @gangqinlaohu: And if you use Bohemian's algorithm, a = 1, b = 31, c = 1 and a = 31, b = 1, c = 1 will also generate the same hash code. Most hash code implementations in the standard API classes are not implemented to create large variations in the hash code if there is only a minor change to the base value(s). – jarnbjo May 18 '13 at 23:58
  • @jarnbjo The two example values aren't the same, despite you choosing an edge case of a value the same as the multiplier (you can do the math yourself to check), and the JDK classes ARE designed to give huge variation in hash for tiny variation in value. Basically, you're completely wrong about everything you've said. – Bohemian May 19 '13 at 00:59