Best implementation for hashCode method for a collection

Question

How do we decide on the best implementation of hashCode() method for a collection (assuming that equals method has been overridden correctly) ?

with Java 7+, I guess `Objects.hashCode(collection)` should be a perfect solution! — Diablo, Mar 17 '16 at 11:45
@Diablo I don't think that answers the question at all - that method simply returns `collection.hashCode()` (http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/9b8c96f96a0f/src/share/classes/java/util/Objects.java#l95) — cbreezier, Apr 19 '16 at 05:35

dmeister · Accepted Answer · 2018-04-02T16:19:52.883

468

The best implementation? That is a hard question because it depends on the usage pattern.

A for nearly all cases reasonable good implementation was proposed in Josh Bloch's Effective Java in Item 8 (second edition). The best thing is to look it up there because the author explains there why the approach is good.

A short version

Create a int result and assign a non-zero value.
For every field f tested in the equals() method, calculate a hash code c by:
- If the field f is a boolean: calculate (f ? 0 : 1);
- If the field f is a byte, char, short or int: calculate (int)f;
- If the field f is a long: calculate (int)(f ^ (f >>> 32));
- If the field f is a float: calculate Float.floatToIntBits(f);
- If the field f is a double: calculate Double.doubleToLongBits(f) and handle the return value like every long value;
- If the field f is an object: Use the result of the hashCode() method or 0 if f == null;
- If the field f is an array: see every field as separate element and calculate the hash value in a recursive fashion and combine the values as described next.
Combine the hash value c with result:
```
result = 37 * result + c
```
Return result

This should result in a proper distribution of hash values for most use situations.

edited Apr 02 '18 at 16:19

answered Sep 22 '08 at 07:22

dmeister

34,704
19
73
95

1

Interesting answer. Are there some link to a mathematical proof for why this formula works? – runaros Sep 22 '08 at 08:50
52

Yeah I'm particularly curious about where the number 37 comes from. – Kip Sep 22 '08 at 17:25
11

I'm not aware of any proof. The number of 37 is arbitrary, but it should be prime. Why? I'm not really sure but it has to do with modulo arthritics and properties of prime numbers which lead to go distributions. – dmeister Sep 22 '08 at 23:55
1

Thank you for detailed answer. Could you please tell me, which reference/guidelines you used to create this set of advices? Also I wonder, why multiply by 37? What is the magic in 37? Why multiply is better then right shift (`result = result << 16 + c`)? – dma_k Oct 04 '10 at 13:57
20

I used item 8 of Josh Bloch's "Effective Java" book. – dmeister Oct 04 '10 at 14:39
43

@dma_k The reason for using prime numbers and the method described in this answer is to ensure that the **computed hashcode will be unique**. When using non-prime numbers, you cannot guarantee this. It does not matter which prime nummer you choose, there is nothing magical about the number 37 (too bad 42 isn't a prime number, eh?) – Simon Forsberg Feb 15 '13 at 13:58
38

@SimonAndréForsberg Well, computed hash code cannot be always unique :) Is a hashcode. However I got the idea: the prime number has only one multiplier, while non-prime has at least two. That creates an extra combination for multiplication operator to result the same hash, i.e. cause collision. – dma_k Feb 15 '13 at 14:08
@dma_k Yes, that's what I meant :) Of course it cannot be unique at all times (otherwise comparing hashCode would be enough to determine .equals), using prime numbers just reduces the collisions. – Simon Forsberg Feb 15 '13 at 15:46
3

Any particular reason for `true` mapping to `0` and `false` to `1`, and not the other way around? – afsantos Mar 18 '13 at 12:01
2

@afsantos: I can't think of any. The code from the book is more a general guideline as a law and by no means the only correct way to do it. – dmeister Mar 18 '13 at 14:45
I edited the post but this minimum number of characters is really annoying when doing minor mark-up adjustments :( If accepted, could somebody please remove EDITED near bullet 1? Thanks. – Timo Nov 12 '13 at 08:07
2

+1 Excellent explanation, and I really like that you gave the source (i.e. Bloch's book, not the source code) in the event someone wants more information. – jwir3 Jan 27 '14 at 21:07
If the field `f` is an array, you also have the option of using the `java.util.Arrays.hashCode` methods to compute that field's hash value. – Brigham Aug 26 '14 at 17:59
18

I think [Bloch multiplies by 31, not 37](http://stackoverflow.com/a/5155015/1028230), for [its ease of optimization](http://stackoverflow.com/questions/5154970/how-do-i-create-a-hashcode-in-net-c-for-a-string-that-is-safe-to-store-in-a/5155015?noredirect=1#comment51056393_5155015). – ruffin Jul 21 '15 at 20:27
What if the field is a string? It definitely is an object but what should be the implementation of hte hashcode – inquisitive Sep 21 '15 at 05:36
How to calculate in the case of String Member type of object – Pankaj Feb 12 '16 at 11:58
1

One of the main goals of a hash function is that it should be inexpensive to evaluate. I just wrote a hash function for a formal parameter data structure in a compiler. The equality test of course must check every parameter name and type, after ensuring structural isometry. But as for "For every field f tested in the equals() method", one could well find the hash lookup nearly as expensive as a linear search. You need only to create a trivial hash function that yields a broad distribution of results across usage patterns. Empirical measurement and experimentation, rather than guessing, helps. – Cope Mar 04 '16 at 11:43
@Cope: I disagree that the main goal of a hash function is speed. While there are certain special cases for other hashCode implementations, this is a very good start. Certainly it is a correct implementation. Most "smart" ones are just plain wrong. – dmeister Mar 05 '16 at 00:21
2

Hej @dmeister — I stand my ground. It's very simple. 1. The primary goal of a hash function (in data structures, rather than, e.g., in encryption) is speed — to avoid the cost of a linear search. 2. There is a tradeoff between the cost of computing the hash and the cost of a linear search, and on multi-field structures even just searching on the right key can be enough to adequately lower the cost. 3. For large data structures (like my formal parameters) the number of computations would be in the dozens if you measure "every field." 4. There is no universal answer — so you must measure. – Cope Mar 06 '16 at 09:20
1

I saw that 37 elsewhere for calculating string hash code algorithm and judged that there are 26 letters and 10 numbers plus space character which comes up with 37 lol – yakya Mar 13 '16 at 17:19
1

Could someone please explain what this line is doing in case of long? f ^ (f >>> 32) – Abhishek Singh Jan 17 '18 at 04:42
In the third version of Effective Java, the content of **Item 8** has nothing to do with hashcode so maybe remove the mention of the item. – John R Perry Mar 25 '18 at 03:43
I think Item 8 was based on the second edition. The third edition has a page at the end how the different items map from edition to edition. – dmeister Apr 02 '18 at 16:19
1

With good hashing, the probability for hash collision should be 2^-32, but with 37, you can easily run into collisions. For example, for an object with two ints (x, y), the hashcodes for (1, -37), (0, 0), (-1, 37) would all evaluate to 0. So in real life (with small values much more likely than large), the probability is nowhere near 2^-32. Better use a prime like 912376189. – kristjan Jun 06 '18 at 13:49
This function handles `double` like .Net, and my recent testing shows that it is very poor when your doubles are not truly random, but really integers, or have only one or two decimal places. The re-hash function from `HashMap` helps, but still isn't as good as just using the HashCode for `int` on the value in this case. – NetMage May 17 '19 at 00:37
@demister Can you explain the logic of the long data type hash code `(int)(f ^ (f >>> 32))` – jashgopani Mar 18 '21 at 11:03
@Kip It is common practice to use powers of 2 to assign hash buckets; because it makes the modulus operation very fast, effectively turning it into a binary AND operation %4 -> AND( 0x0003). To get a decent distribution in the new space from a well distributed input, one wants to multiply the input by a coprime, of which 37 is coprime to all powers of 2. Other numbers would work, but you want one small enough to work with char, and large enough to overflow the limit quickly. 37 shifts the input by one, three, and six places, helping the input bits spread out more. 37, 31, 27 etc also work. – Edwin Buck Aug 20 '21 at 15:17
Another tip I found out the hard way. Don't use fields in hashCode() that change during the lifetime of the object. I use a WeakHashMap and there was an entry but after a while it returned NULL because one of the field changed. So the hashCode() was also changed, and the HashMap couldn't find the object anymore. – Stijn Leenknegt Feb 02 '22 at 18:25
@StijnLeenknegt mutable objects are generally not accepted as keys into hashmaps for this reason. If the object is mutated, then the hash will change, which is a no-no. In a sense, when you mutate the object it ceases to be the original object. At that point, you might be better off comparing the objects based on their id/address rather than based on their fields/attributes. – joseville Jul 13 '22 at 17:01

score 156 · Answer 2 · edited Nov 23 '22 at 21:44

156

If you're happy with the Effective Java implementation recommended by dmeister, you can use a library call instead of rolling your own:

@Override
public int hashCode() {
    return Objects.hash(this.firstName, this.lastName);
}

This requires either Guava (com.google.common.base.Objects.hashCode) or the standard library in Java 7 (java.util.Objects.hash) but works the same way.

edited Nov 23 '22 at 21:44

chaotic3quilibrium

5,661
8
53
86

answered Aug 05 '13 at 19:49

bacar

9,761
11
55
75

9

Unless one has a good reason not to use these, one should definitely use these in any case. (Formulating it stronger, as it IMHO should be formulated.) The typical arguments for using standard implementations/libraries apply (best practices, well tested, less error prone, etc). – Kissaki Jan 28 '14 at 13:23
The case to not use a library implementation is when you have implemented a custom Object#equals(Object obj) method. See the Object#hashcode() JavaDoc, "If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result." Using either of these library implementations will generate a"pseudo" memory address hash code which may violate the described requirement. – justin.hughey Mar 10 '14 at 16:35
8

@justin.hughey you seem to be confused. The only case you should override `hashCode` is if you have a custom `equals`, and that is precisely what these library methods are designed for. The documentation is quite clear on their behaviour in relation to `equals`. A library implementation does not claim to absolve you from knowing what the characteristics of a correct `hashCode` implementation are - these libraries make it *easier* for you to implement such a conforming implementation for the majority of cases where `equals` is overriden. – bacar Mar 11 '14 at 00:06
You're absolutely right. Thank you for pointing that out, I don't know where I was coming from on that one. :) – justin.hughey Mar 11 '14 at 12:42
7

For any Android developers looking at the java.util.Objects class, it was only introduced in API 19, so make sure you're running on KitKat or above otherwise you'll get NoClassDefFoundError. – Andrew Kelly Feb 05 '15 at 05:30
3

Best answer IMO, although by way of example I would rather have chosen the JDK7 `java.util.Objects.hash(...)` method than the guava `com.google.common.base.Objects.hashCode(...)` method. I think most people would choose the standard library over an extra dependency. – Malte Skoruppa Nov 04 '15 at 19:00
2

If there are two arguments or more and if any of them is an array, the result might be not what you expect because `hashCode()` for an array is just its `java.lang.System.identityHashCode(...)`. – starikoff Dec 21 '15 at 11:48
@starikoff Yes - the helper methods do not absolve you from having to understand what you're doing. They just help you to avoid certain mistakes and duplicate code. – bacar Dec 21 '15 at 13:31
@bacar Of course. I just left my comment so that whoever learns about these methods from this post learns about their limitations as well. – starikoff Dec 21 '15 at 14:17
The comments about association of overriding equals() are correct. hashCode() is not implemented in a vacuum and must be directly related to how equals() is implemented per the documentation (two objects can have the same hashCode but NOT be equal - yet two equal objects MUST have the same hashCode). – Darrell Teague Jun 03 '16 at 19:01
@Kissaki On the opposite: Using the standard implementation [leads to standard collisions](https://stackoverflow.com/a/47741808/581205). – maaartinus Dec 10 '17 at 18:03
@AndrewKelly It's been a few years but for anyone else who comes across this and is still supporting < 19, now there's `ObjectsCompat.hash(...)` and its implementation is just `Arrays.hashCode(...)` for API < 19, which appears to be the exact same implementation in `Objects.hash(...)`. Android Studio has a code generator for this that will insert `Objects.hash(...)` though no matter what your `minSdkVersion` is so devs should be aware they can just replace that with `ObjectsCompat` and they'll be fine. – mkuech Mar 27 '20 at 21:55
Objects.hashCode() is one of the APIs that is now desugared by D8 automatically in Android Studio 4+, so you can use the original API on targets earlier than API 19 (no need for the ObjectsCompat). For a full list of desugared APIs, check this file. https://jakewharton.com/static/files/d8_api_desugar_list.txt – Andrew Kelly Mar 29 '20 at 22:17

Christopher Rucinski · Answer 3 · 2017-08-05T15:01:42.530

Although this is linked to Android documentation (Wayback Machine) and My own code on Github, it will work for Java in general. My answer is an extension of dmeister's Answer with just code that is much easier to read and understand.

@Override 
public int hashCode() {

    // Start with a non-zero constant. Prime is preferred
    int result = 17;

    // Include a hash for each field.

    // Primatives

    result = 31 * result + (booleanField ? 1 : 0);                   // 1 bit   » 32-bit

    result = 31 * result + byteField;                                // 8 bits  » 32-bit 
    result = 31 * result + charField;                                // 16 bits » 32-bit
    result = 31 * result + shortField;                               // 16 bits » 32-bit
    result = 31 * result + intField;                                 // 32 bits » 32-bit

    result = 31 * result + (int)(longField ^ (longField >>> 32));    // 64 bits » 32-bit

    result = 31 * result + Float.floatToIntBits(floatField);         // 32 bits » 32-bit

    long doubleFieldBits = Double.doubleToLongBits(doubleField);     // 64 bits (double) » 64-bit (long) » 32-bit (int)
    result = 31 * result + (int)(doubleFieldBits ^ (doubleFieldBits >>> 32));

    // Objects

    result = 31 * result + Arrays.hashCode(arrayField);              // var bits » 32-bit

    result = 31 * result + referenceField.hashCode();                // var bits » 32-bit (non-nullable)   
    result = 31 * result +                                           // var bits » 32-bit (nullable)   
        (nullableReferenceField == null
            ? 0
            : nullableReferenceField.hashCode());

    return result;

}

EDIT

Typically, when you override hashcode(...), you also want to override equals(...). So for those that will or has already implemented equals, here is a good reference from my Github...

@Override
public boolean equals(Object o) {

    // Optimization (not required).
    if (this == o) {
        return true;
    }

    // Return false if the other object has the wrong type, interface, or is null.
    if (!(o instanceof MyType)) {
        return false;
    }

    MyType lhs = (MyType) o; // lhs means "left hand side"

            // Primitive fields
    return     booleanField == lhs.booleanField
            && byteField    == lhs.byteField
            && charField    == lhs.charField
            && shortField   == lhs.shortField
            && intField     == lhs.intField
            && longField    == lhs.longField
            && floatField   == lhs.floatField
            && doubleField  == lhs.doubleField

            // Arrays

            && Arrays.equals(arrayField, lhs.arrayField)

            // Objects

            && referenceField.equals(lhs.referenceField)
            && (nullableReferenceField == null
                        ? lhs.nullableReferenceField == null
                        : nullableReferenceField.equals(lhs.nullableReferenceField));
}

Android Documentation now does not include the above code anymore, so here is a cached version from the [Wayback Machine - Android Documentation (Feb 07, 2015)](http://web.archive.org/web/20150207205800/https://developer.android.com/reference/java/lang/Object.html) — Christopher Rucinski, Jul 18 '17 at 09:14

score 59 · Answer 4 · answered Nov 29 '08 at 06:30

59

It is better to use the functionality provided by Eclipse which does a pretty good job and you can put your efforts and energy in developing the business logic.

answered Nov 29 '08 at 06:30

Warrior

39,156
44
139
214

4

+1 A good practical solution. dmeister's solution is more comprehensive, but I tend to forget to handle nulls when I try to write hashcodes myself. – Quantum7 Apr 07 '11 at 00:31
2

+1 Agree with Quantum7, but I would say it's also really good to understand what the Eclipse-generated implementation is doing, and where it gets its implementation details from. – jwir3 Jan 27 '14 at 21:05
18

Sorry but answers involving "functionality provided by [some IDE]" are not really relevant in the context of the programming language in general. There are dozens of IDEs and this does not answer the question... namely because this is more about algorithmic determination and directly associated to equals() implementation - something an IDE will know nothing about. – Darrell Teague Jun 03 '16 at 18:59

score 17 · Answer 5 · answered Sep 22 '08 at 07:08

First make sure that equals is implemented correctly. From an IBM DeveloperWorks article:

Symmetry: For two references, a and b, a.equals(b) if and only if b.equals(a)

Reflexivity: For all non-null references, a.equals(a)

Transitivity: If a.equals(b) and b.equals(c), then a.equals(c)

Then make sure that their relation with hashCode respects the contact (from the same article):

Consistency with hashCode(): Two equal objects must have the same hashCode() value

Finally a good hash function should strive to approach the ideal hash function.

score 13 · Answer 6 · edited Aug 31 '17 at 13:49

13

about8.blogspot.com, you said

if equals() returns true for two objects, then hashCode() should return the same value. If equals() returns false, then hashCode() should return different values

I cannot agree with you. If two objects have the same hashcode it doesn't have to mean that they are equal.

If A equals B then A.hashcode must be equal to B.hascode

but

if A.hashcode equals B.hascode it does not mean that A must equals B

edited Aug 31 '17 at 13:49

Attila

3,206
2
31
44

answered Sep 22 '08 at 08:47

5

If `(A != B) and (A.hashcode() == B.hashcode())`, that's what we call hash function collision. It's because hash function's codomain is always finite, while it's domain is usually not. The bigger the codomain is, the less often the collision should occur. Good hash function's should return different hashes for different objects with greatest possibility achievable given particular codomain size. It can rarely be fully guaranteed though. – Krzysztof Jabłoński Apr 29 '13 at 08:45
This should just be a comment to the above post to Grey. Good information but it does not really answer the question – Christopher Rucinski Sep 15 '15 at 11:55
Good comments but be careful about using the term 'different objects' ... because equals() and thus the hashCode() implementation are not necessarily about different objects in an OO context but are usually more about their domain model representations (e.g., two people can be considered the same if they share a country code and country ID - though these may be two different 'objects' in a JVM - they are considered 'equal' and having a given hashCode)... – Darrell Teague Jun 03 '16 at 19:04

score 7 · Answer 7 · edited May 07 '15 at 17:30

7

There's a good implementation of the Effective Java's hashcode() and equals() logic in Apache Commons Lang. Checkout HashCodeBuilder and EqualsBuilder.

edited May 07 '15 at 17:30

nbro

15,395
32
113
196

answered Sep 22 '08 at 08:35

Rudi Adianto

247
1
6

1

The downside of this API is you pay the cost of object construction every time you call equals and hashcode (unless your object is immutable and you precompute the hash), which can be a lot in certain cases. – James McMahon Feb 04 '12 at 01:35
this was my favorite approach, until recently. I have ran into StackOverFlowError while using a criteria for SharedKey OneToOne association. More over, `Objects` class provides `hash(Object ..args)` & `equals()` methods from Java7 on. These are recommended for any applications using jdk 1.7+ – Diablo Mar 17 '16 at 11:26
@Diablo I guess, your problem was a cycle in the object graph and then you're out of luck with most implementation as you need to ignore some reference or to break the cycle (mandating an `IdentityHashMap`). FWIW I use an id-based hashCode and equals for all entities. – maaartinus Dec 10 '17 at 19:12

score 7 · Answer 8 · edited May 07 '15 at 17:26

7

If you use eclipse, you can generate equals() and hashCode() using:

Source -> Generate hashCode() and equals().

Using this function you can decide which fields you want to use for equality and hash code calculation, and Eclipse generates the corresponding methods.

edited May 07 '15 at 17:26

nbro

15,395
32
113
196

answered Sep 22 '08 at 12:50

Johannes K. Lehnert

919
1
10
9

score 6 · Answer 9 · edited May 23 '17 at 12:02

Just a quick note for completing other more detailed answer (in term of code):

If I consider the question how-do-i-create-a-hash-table-in-java and especially the jGuru FAQ entry, I believe some other criteria upon which a hash code could be judged are:

synchronization (does the algo support concurrent access or not) ?
fail safe iteration (does the algo detect a collection which changes during iteration)
null value (does the hash code support null value in the collection)

score 4 · Answer 10 · answered Sep 22 '08 at 07:16

If I understand your question correctly, you have a custom collection class (i.e. a new class that extends from the Collection interface) and you want to implement the hashCode() method.

If your collection class extends AbstractList, then you don't have to worry about it, there is already an implementation of equals() and hashCode() that works by iterating through all the objects and adding their hashCodes() together.

   public int hashCode() {
      int hashCode = 1;
      Iterator i = iterator();
      while (i.hasNext()) {
        Object obj = i.next();
        hashCode = 31*hashCode + (obj==null ? 0 : obj.hashCode());
      }
  return hashCode;
   }

Now if what you want is the best way to calculate the hash code for a specific class, I normally use the ^ (bitwise exclusive or) operator to process all fields that I use in the equals method:

public int hashCode(){
   return intMember ^ (stringField != null ? stringField.hashCode() : 0);
}

score 3 · Answer 11 · answered Dec 21 '15 at 12:08

3

I use a tiny wrapper around Arrays.deepHashCode(...) because it handles arrays supplied as parameters correctly

public static int hash(final Object... objects) {
    return Arrays.deepHashCode(objects);
}

answered Dec 21 '15 at 12:08

starikoff

1,601
19
23

score 2 · Answer 12 · answered Sep 22 '08 at 07:06

2

@about8 : there is a pretty serious bug there.

Zam obj1 = new Zam("foo", "bar", "baz");
Zam obj2 = new Zam("fo", "obar", "baz");

same hashcode

you probably want something like

public int hashCode() {
    return (getFoo().hashCode() + getBar().hashCode()).toString().hashCode();

(can you get hashCode directly from int in Java these days? I think it does some autocasting.. if that's the case, skip the toString, it's ugly.)

answered Sep 22 '08 at 07:06

SquareCog

19,421
8
49
63

3

the bug is in the long answer by about8.blogspot.com -- getting the hashcode from a concatenation of strings leaves you with a hash function that is the same for any combination of strings that add up to the same string. – SquareCog Sep 22 '08 at 13:53
1

So this is meta-discussion and not related to the question at all? ;-) – Huppie Sep 22 '08 at 17:40
1

It's a correction to a proposed answer that has a fairly significant flaw. – SquareCog Sep 22 '08 at 22:13
This is a very limited implementation – Christopher Rucinski Sep 15 '15 at 11:58
Your implementation avoid the problem and introduces another one; Swapping `foo` and `bar` leads to the same `hashCode`. Your `toString` AFAIK does not compile, and if it does, then it's terrible inefficient. Something like `109 * getFoo().hashCode() + 57 * getBar().hashCode()` is faster, simpler and produces no needless collisions. – maaartinus Dec 10 '17 at 19:18

score 2 · Answer 13 · answered Sep 22 '08 at 07:15

2

As you specifically asked for collections, I'd like to add an aspect that the other answers haven't mentioned yet: A HashMap doesn't expect their keys to change their hashcode once they are added to the collection. Would defeat the whole purpose...

answered Sep 22 '08 at 07:15

Olaf Kock

46,930
8
59
90

score 2 · Answer 14 · edited May 07 '15 at 17:32

2

Use the reflection methods on Apache Commons EqualsBuilder and HashCodeBuilder.

edited May 07 '15 at 17:32

nbro

15,395
32
113
196

answered Sep 22 '08 at 10:16

Vihung

12,947
16
64
90

1

If you are going to use this be aware that reflection is expensive. I honestly wouldn't use this for anything besides throw away code. – James McMahon Feb 04 '12 at 01:31

score 1 · Answer 15 · answered Sep 22 '08 at 07:20

any hashing method that evenly distributes the hash value over the possible range is a good implementation. See effective java ( http://books.google.com.au/books?id=ZZOiqZQIbRMC&dq=effective+java&pg=PP1&ots=UZMZ2siN25&sig=kR0n73DHJOn-D77qGj0wOxAxiZw&hl=en&sa=X&oi=book_result&resnum=1&ct=result ) , there is a good tip in there for hashcode implementation (item 9 i think...).

score 1 · Answer 16 · edited May 07 '15 at 17:28

1

I prefer using utility methods fromm Google Collections lib from class Objects that helps me to keep my code clean. Very often equals and hashcode methods are made from IDE's template, so their are not clean to read.

edited May 07 '15 at 17:28

nbro

15,395
32
113
196

answered Sep 22 '08 at 08:51

score 1 · Answer 17 · answered Dec 30 '16 at 13:18

Here is another JDK 1.7+ approach demonstration with superclass logics accounted. I see it as pretty convinient with Object class hashCode() accounted, pure JDK dependency and no extra manual work. Please note Objects.hash() is null tolerant.

I have not include any equals() implementation but in reality you will of course need it.

import java.util.Objects;

public class Demo {

    public static class A {

        private final String param1;

        public A(final String param1) {
            this.param1 = param1;
        }

        @Override
        public int hashCode() {
            return Objects.hash(
                super.hashCode(),
                this.param1);
        }

    }

    public static class B extends A {

        private final String param2;
        private final String param3;

        public B(
            final String param1,
            final String param2,
            final String param3) {

            super(param1);
            this.param2 = param2;
            this.param3 = param3;
        }

        @Override
        public final int hashCode() {
            return Objects.hash(
                super.hashCode(),
                this.param2,
                this.param3);
        }
    }

    public static void main(String [] args) {

        A a = new A("A");
        B b = new B("A", "B", "C");

        System.out.println("A: " + a.hashCode());
        System.out.println("B: " + b.hashCode());
    }

}

score 1 · Answer 18 · answered Dec 10 '17 at 18:02

The standard implementation is weak and using it leads to unnecessary collisions. Imagine a

class ListPair {
    List<Integer> first;
    List<Integer> second;

    ListPair(List<Integer> first, List<Integer> second) {
        this.first = first;
        this.second = second;
    }

    public int hashCode() {
        return Objects.hashCode(first, second);
    }

    ...
}

Now,

new ListPair(List.of(a), List.of(b, c))

and

new ListPair(List.of(b), List.of(a, c))

have the same hashCode, namely 31*(a+b) + c as the multiplier used for List.hashCode gets reused here. Obviously, collisions are unavoidable, but producing needless collisions is just... needless.

There's nothing substantially smart about using 31. The multiplier must be odd in order to avoid losing information (any even multiplier loses at least the most significant bit, multiples of four lose two, etc.). Any odd multiplier is usable. Small multipliers may lead to faster computation (the JIT can use shifts and additions), but given that multiplication has latency of only three cycles on modern Intel/AMD, this hardly matters. Small multipliers also leads to more collision for small inputs, which may be a problem sometimes.

Using a prime is pointless as primes have no meaning in the ring Z/(2**32).

So, I'd recommend using a randomly chosen big odd number (feel free to take a prime). As i86/amd64 CPUs can use a shorter instruction for operands fitting in a single signed byte, there is a tiny speed advantage for multipliers like 109. For minimizing collisions, take something like 0x58a54cf5.

Using different multipliers in different places is helpful, but probably not enough to justify the additional work.

score 0 · Answer 19 · edited May 23 '17 at 12:10

0

When combining hash values, I usually use the combining method that's used in the boost c++ library, namely:

seed ^= hasher(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);

This does a fairly good job of ensuring an even distribution. For some discussion of how this formula works, see the StackOverflow post: Magic number in boost::hash_combine

There's a good discussion of different hash functions at: http://burtleburtle.net/bob/hash/doobs.html

edited May 23 '17 at 12:10

Community

1
1

answered Oct 03 '12 at 15:18

Edward Loper

15,374
7
43
52

1

This is a question about Java, not C++. – dano Oct 10 '17 at 19:52

Chris Carruthers · Answer 20 · 2008-09-22T07:28:31.520

For a simple class it is often easiest to implement hashCode() based on the class fields which are checked by the equals() implementation.

public class Zam {
    private String foo;
    private String bar;
    private String somethingElse;

    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }

        if (obj == null) {
            return false;
        }

        if (getClass() != obj.getClass()) {
            return false;
        }

        Zam otherObj = (Zam)obj;

        if ((getFoo() == null && otherObj.getFoo() == null) || (getFoo() != null && getFoo().equals(otherObj.getFoo()))) {
            if ((getBar() == null && otherObj. getBar() == null) || (getBar() != null && getBar().equals(otherObj. getBar()))) {
                return true;
            }
        }

        return false;
    }

    public int hashCode() {
        return (getFoo() + getBar()).hashCode();
    }

    public String getFoo() {
        return foo;
    }

    public String getBar() {
        return bar;
    }
}

The most important thing is to keep hashCode() and equals() consistent: if equals() returns true for two objects, then hashCode() should return the same value. If equals() returns false, then hashCode() should return different values.

Like SquareCog have already noticed. If hashcode is generated once from concatenation of two strings it is extremely easy to generate masses of collisions: `("abc"+""=="ab"+"c"=="a"+"bc"==""+"abc")`. It is severe flaw. It would be better to evaluate hashcode for both fields and then calculate linear combination of them (preferably using primes as coefficients). — Krzysztof Jabłoński, Apr 30 '13 at 06:34
@KrzysztofJabłoński Right. Moreover, swapping `foo` and `bar` produces a needless collision, too. — maaartinus, Dec 10 '17 at 19:25

Best implementation for hashCode method for a collection

20 Answers20

A short version

Linked

Related