2

I usually let Eclipse generate the hashCode() method for me, but now I found signs that the generated hash code may not be very good.

Using the hash code returned by the Eclipse-generated hashCode() method in a hash set results in about 6 times slower lookups than when using a hand coded hashCode() method.

Here is my test:

  • A class with three int fields.
  • The hashCode() and equals() methods has been generated by Eclipse.
  • Fill a HashSet with 1.000.000 instances of the class.
  • Look up each instance in the HashSet
  • Repeat the look up 10 times.

The code:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MyPojo {

    private final int a;
    private final int b;
    private final int c;

    public MyPojo(int a, int b, int c) {
        super();
        this.a = a;
        this.b = b;
        this.c = c;
    }

    public static void main(String[] args) {

        List<MyPojo> listOfPojos = new ArrayList<MyPojo>();
        Set<MyPojo> setOfPojos = new HashSet<MyPojo>();

        for (int countA = 0; countA < 100; countA++) {
            for (int countB = 0; countB < 100; countB++) {
                for (int countC = 0; countC < 100; countC++) {
                    MyPojo myPojo = new MyPojo(countA, countB, countC);
                    listOfPojos.add(myPojo);
                    setOfPojos.add(myPojo);
                }
            }
        }

        long startTime = System.currentTimeMillis();

        for (int count = 0; count < 10; count++) {
            for (MyPojo myPojo : listOfPojos) {
                if (!setOfPojos.contains(myPojo)) {
                    throw new RuntimeException();
                }
            }
        }

        long endTime = System.currentTimeMillis();
        System.out.format("Execution time: %3f s", (endTime - startTime) / 1000.0);

    }

    // Generated by Eclipse
    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + a;
        result = prime * result + b;
        result = prime * result + c;
        return result;
    }

    // Generated by Eclipse
    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        MyPojo other = (MyPojo) obj;
        if (a != other.a)
            return false;
        if (b != other.b)
            return false;
        if (c != other.c)
            return false;
        return true;
    }

}

On my machine this results in an execution time of about 1.23 seconds.

Now replace the hashCode() method with this one that I found elsewhere:

@Override
public int hashCode() {
    final int magic = 0x9e3779b9;
    int seed = 0;
    seed ^= this.a + magic + (seed << 6) + (seed >> 2);
    seed ^= this.b + magic + (seed << 6) + (seed >> 2);
    seed ^= this.c + magic + (seed << 6) + (seed >> 2);
    return seed;
}

Now the execution time is only 0.2 seconds, about 6 times faster!

Why?


Edit:

As suggested, counting the number of "re-occurrencies" of hash values gives the following:

With the Eclipse generated hashCode() method:

1: 62
2: 62
3: 1406
4: 440
5: 62
6: 1406
7: 62
8: 440
9: 52094
10: 4670
11: 4670
12: 26144
13: 1358
14: 1358
15: 1358
16: 2716

With the hand coded hashCode() method:

1: 79093
2: 180316
3: 23444
4: 107020
5: 2213
6: 6821
7: 296
8: 960
10: 12

So the Eclipse generated method only gives 62 hash codes that occurred only one time.

The hand coded version gave 79093 hash codes that only occurred once, and 180316 hash codes that only occurred twice.

Quite a difference in spread.


Edit 2:

Also tried the Objects.hash(...), and this gave identical "re-occurrency" counts compared to the Eclipse generated hashCode() method.

@Override
public int hashCode() {
    return Objects.hash(a, b, c);
}

Also, this actually slowed down the execution even more: 1.38 seconds


Edit 3:

And here is an explanation on where that "magic number" in the better hash code method above comes from:

Magic number in boost::hash_combine


Edit 4: Using http://projectlombok.org to generate hashCode() and equals() methods

Lombok gave the best results yet:

1: 33958
2: 146124
3: 8118
4: 162360

Execution time: 0.187000 s
Community
  • 1
  • 1
User0
  • 564
  • 3
  • 10
  • I don't see how what you have done can be called a right "benchmark test". And if you want to test hashCode you should call this method – Marco Acierno Aug 02 '14 at 13:16
  • It is not the execution time of the hashCode() method that is the problem. Read the code please. – User0 Aug 02 '14 at 13:20
  • "Why is the hash code method generated by Eclipse so inefficient" – Marco Acierno Aug 02 '14 at 13:21
  • Sorry, changed the missleading title. – User0 Aug 02 '14 at 13:24
  • Here's something you might find interesting: keep a counter of the number of times each hashcode value appears for the two approaches. I suspect that you'll find a lot of duplication in the first. – kdgregory Aug 02 '14 at 13:27
  • @kdgregory: thanks, added results for that, and yes, a lot more duplication for the Eclipse generated method – User0 Aug 02 '14 at 14:16

1 Answers1

3

Eclipse hashCode() follows the guideline suggested in Effective Java. The author said the method is fairly good, but it's definitely not the best one.

If hashCode performance is not satisfactory, you are free to find alternatives.

One more thing that I'd like to mention is that for almost every hashCode function, you might be able to find some data set that prevent the function from evenly distributing the hash values, thus making the HashSet in your code work like a long list.

You can see the other discussions here: Is the hashCode function generated by Eclipse any good?

You may also want to read this

Community
  • 1
  • 1
Byungjoon Lee
  • 913
  • 6
  • 18