I usually let Eclipse generate the hashCode() method for me, but now I found signs that the generated hash code may not be very good.
Using the hash code returned by the Eclipse-generated hashCode() method in a hash set results in about 6 times slower lookups than when using a hand coded hashCode() method.
Here is my test:
- A class with three int fields.
- The hashCode() and equals() methods has been generated by Eclipse.
- Fill a HashSet with 1.000.000 instances of the class.
- Look up each instance in the HashSet
- Repeat the look up 10 times.
The code:
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
public class MyPojo {
private final int a;
private final int b;
private final int c;
public MyPojo(int a, int b, int c) {
super();
this.a = a;
this.b = b;
this.c = c;
}
public static void main(String[] args) {
List<MyPojo> listOfPojos = new ArrayList<MyPojo>();
Set<MyPojo> setOfPojos = new HashSet<MyPojo>();
for (int countA = 0; countA < 100; countA++) {
for (int countB = 0; countB < 100; countB++) {
for (int countC = 0; countC < 100; countC++) {
MyPojo myPojo = new MyPojo(countA, countB, countC);
listOfPojos.add(myPojo);
setOfPojos.add(myPojo);
}
}
}
long startTime = System.currentTimeMillis();
for (int count = 0; count < 10; count++) {
for (MyPojo myPojo : listOfPojos) {
if (!setOfPojos.contains(myPojo)) {
throw new RuntimeException();
}
}
}
long endTime = System.currentTimeMillis();
System.out.format("Execution time: %3f s", (endTime - startTime) / 1000.0);
}
// Generated by Eclipse
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + a;
result = prime * result + b;
result = prime * result + c;
return result;
}
// Generated by Eclipse
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
MyPojo other = (MyPojo) obj;
if (a != other.a)
return false;
if (b != other.b)
return false;
if (c != other.c)
return false;
return true;
}
}
On my machine this results in an execution time of about 1.23 seconds.
Now replace the hashCode() method with this one that I found elsewhere:
@Override
public int hashCode() {
final int magic = 0x9e3779b9;
int seed = 0;
seed ^= this.a + magic + (seed << 6) + (seed >> 2);
seed ^= this.b + magic + (seed << 6) + (seed >> 2);
seed ^= this.c + magic + (seed << 6) + (seed >> 2);
return seed;
}
Now the execution time is only 0.2 seconds, about 6 times faster!
Why?
Edit:
As suggested, counting the number of "re-occurrencies" of hash values gives the following:
With the Eclipse generated hashCode() method:
1: 62
2: 62
3: 1406
4: 440
5: 62
6: 1406
7: 62
8: 440
9: 52094
10: 4670
11: 4670
12: 26144
13: 1358
14: 1358
15: 1358
16: 2716
With the hand coded hashCode() method:
1: 79093
2: 180316
3: 23444
4: 107020
5: 2213
6: 6821
7: 296
8: 960
10: 12
So the Eclipse generated method only gives 62 hash codes that occurred only one time.
The hand coded version gave 79093 hash codes that only occurred once, and 180316 hash codes that only occurred twice.
Quite a difference in spread.
Edit 2:
Also tried the Objects.hash(...), and this gave identical "re-occurrency" counts compared to the Eclipse generated hashCode() method.
@Override
public int hashCode() {
return Objects.hash(a, b, c);
}
Also, this actually slowed down the execution even more: 1.38 seconds
Edit 3:
And here is an explanation on where that "magic number" in the better hash code method above comes from:
Magic number in boost::hash_combine
Edit 4: Using http://projectlombok.org to generate hashCode() and equals() methods
Lombok gave the best results yet:
1: 33958
2: 146124
3: 8118
4: 162360
Execution time: 0.187000 s