3

I had a question when I learned the HashMap source code in java8。

Source code is so complicated, how much efficiency?

So I wrote a code about the hash conflict。

public class Test {             
    final int i;            

    public Test(int i) {            
        this.i = i;     
    }           

    public static void main(String[] args) {            
        java.util.HashMap<Test, Test> set = new java.util.HashMap<Test, Test>();        
        long time;      
        Test last;      
        Random random = new Random(0);      
        int i = 0;      
        for (int max = 1; max < 200000; max <<= 1) {        
            long c1 = 0, c2 = 0;    
            int t = 0;  
            for (; i < max; i++, t++) { 
                last = new Test(random.nextInt());
                time = System.nanoTime();
                set.put(last, last);
                c1 += (System.nanoTime() - time);
                last = new Test(random.nextInt());
                time = System.nanoTime();
                set.get(last);
                c2 += (System.nanoTime() - time);
            }   
            System.out.format("%d\t%d\t%d\n", max, (c1 / t), (c2 / t)); 
        }       
    }           

    public int hashCode() {         
        return 0;       
    }           

    public boolean equals(Object obj) {         
        if (obj == null)        
            return false;   
        if (!(obj instanceof Test))     
            return false;   
        Test t = (Test) obj;        
        return t.i == this.i;       
    }           
}

I show the results in Excel。 enter image description here

I am using java6u45 java7u80 java8u131。

I do not understand why the performance of java8 will be so bad

I'm trying to write my own HashMap.

I would like to learn HashMap in java8 which is better, but I did not find it.

zzjbook
  • 77
  • 2
  • 8
  • 6
    Because returning a single static value is a **terrible** `hashCode()` implementation. Do not do that. Do `return Integer.hashCode(i);` and you should see *better* performance. – Elliott Frisch Jul 15 '17 at 05:40
  • @ElliottFrisch I am testing the hash conflict. – zzjbook Jul 15 '17 at 05:46
  • 2
    The performance of `HashMap` depends on how well the `hashCode()` method of the keys that you put into it is implemented. Don't expect it to perform well if you have a bad `hashCode()` method. It's like you have a shiny, new engine, then you throw sand in the engine and complain that it is performing badly... – Jesper Jul 15 '17 at 05:55
  • @Jesper I just compare java6 java7 java8 HashMap, rather than how to make HashMap more efficient.I'm trying to write my own HashMap.I would like to learn HashMap in java8 which is better, but I did not find it. – zzjbook Jul 15 '17 at 06:12
  • 2
    And your benchmark doesn't look right - use jmh. See also https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java – assylias Jul 15 '17 at 06:28
  • @Elliott Frisch: for a second, I hoped that such a method does not really exist… Well, you *can* write `return Integer.hashCode(i);`… or you write just `return i;` … – Holger Jul 17 '17 at 07:32
  • @Holger That's what `Integer.hashCode` does internally, but using the wrapper class' implementation is a useful pattern for when your key isn't an `int` (because writing a good `hashCode` is an art unto itself). – Elliott Frisch Jul 19 '17 at 16:42
  • @Elliott Frisch: that’s also what it does openly, as its documentation says that `Integer.hashCode(int)` returns a value “compatible with `Integer.hashCode()`”, whose documentation in turn says “Returns: a hash code value for this object, **equal to the primitive int value represented by this Integer object**”. And I think, every developer dealing with hash codes should understand enough about the matter to recognize why using just the int value itself as its hash code is already reasonable and sufficient. – Holger Jul 19 '17 at 17:38

1 Answers1

6

Your test scenario is non-optimal for Java 8 HashMap. HashMap in Java 8 optimizes collisions by using binary trees for any hash chains longer than a given threshold. However, this only works if the key type is comparable. If it isn't then the overhead of testing to see if the optimization is possible actually makes Java 8 HashMap slower. (The slow-down is more than I expected ... but that's another topic.)

Change your Test class to implement Comparable<Test> ... and you should see that Java 8 performs better than the others when the proportion of hash collisions is large enough.


Note that the tree optimization should be considered as a defensive measure for the case where the hash function doesn't perform. The optimization turns O(N) worst-case performance to O(logN) worst-case.

If you want your HashMap instances to have O(1) lookup, you should make sure that you use a good hash function for the key type. If the probability of collision is minimized, the optimization is moot.


Source code is so complicated, how much efficiency?

It is explained in the comments in the source code. And probably other places that Google can find for you :-)

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
  • The binary tree in the HashMap is also compare hash value. Why do not using the 2-choice hashing? – zzjbook Jul 15 '17 at 06:07
  • 1
    The binary tree in HashMap (Java 8) does not use the hash value. It compares the actual keys. Please read the code again. – Stephen C Jul 15 '17 at 11:34
  • @Stephen C: actually, the OP is right, the binary tree *does* use the hashcodes, that’s to solve bucket collisions. Only in the case of true hash collisions, it tries to use comparable keys, then, it uses identity hash codes. And I also asked myself, why it doesn’t use 2-choice hashing or similar things, however, in the OP’s case that wouldn’t help anyway. – Holger Jul 17 '17 at 07:37