2

I have been getting confused about some yourkit snapshots which seems to suggest that in a particular stack hashmap.put() is proving to be costly.

Let's say the key for this map is a very complex object which has not overridden equals() or hashCode()

Is it really possible for HashMap.hash() or Object.hashCode() to be costly in certain situations ?

Erik Schierboom
  • 16,301
  • 10
  • 64
  • 81
Vikrant
  • 21
  • 2
  • The default hashcode of most objects are their address in memory, since two things cannot share that space this value is unique. Plus it is a native function so it should be super fast to do. – Logan Murphy Aug 02 '13 at 15:59
  • If the key is not overridden then the default `hashCode` will be used. This just returns the memory location of the item which should be rather cheap. The issue will be unqiueness, given a large pool of objects clashes are [fairly likely](http://stackoverflow.com/a/1381288/2071828) and this has a massive, detrimental, impact on the performance of a `HashMap`. – Boris the Spider Aug 02 '13 at 15:59
  • @BoristheSpider Would it not be unlikely that two objects would have the same memory address making it unlikely to have clashes? I'm assuming that Java's hashing algorithm is pretty good. – Lee Meador Aug 02 '13 at 16:06
  • @LeeMeador a hashcode is in the `int` range. There are more addresses than that... – Boris the Spider Aug 02 '13 at 16:07
  • @BoristheSpider There are 4 Billion values for an int (4,000 million in some places). Most applications take less than 4 Gb of memory. Admittedly not necessarily contiguous but still ... unlikely to clash, I think. – Lee Meador Aug 02 '13 at 16:10
  • Is there something being counted beyond the time to do the 'put'? A line of code like `map.put(longRunningCalculationOfHashKey(), "bob")` would take a long time to run but it wouldn't be the `put` that caused the issue. – Lee Meador Aug 02 '13 at 16:12
  • @MarkoTopolnik Duh. I got it right in the memory but I'm claiming a typo in the count. Fixed now. – Lee Meador Aug 02 '13 at 16:13
  • @Lea Meador. Note that maximum value of int type in java is over 2 billions or 2 raised to power of 31. – Volodymyr Levytskyi Aug 03 '13 at 10:41

4 Answers4

1

It certainly is possible to create an object which is costly to put into a HashMap:

class Awful {
    @Override
    public int hashCode() {
        try {
            Thread.sleep(10000000);
        } catch (InterruptedException e) {
           Thread.currentThread().interrupt();
        }
        return 1;
    }
}

Obviously this is a contrived example but I've seen implementations that call methods on Hibernate backed, lazy loaded objects. Putting said object into a list resulted in numerous, quite expensive, database lookups.

Without overriding hashCode() the value is derived from the address of the object, which the JVM already has at the point the method is called, so it's (as good as) instant.

StuPointerException
  • 7,117
  • 5
  • 29
  • 54
  • Yes. It didn't make sense. Looks like it is a yourkit bug/issue. As soon as I swap the Map with a List structure, the cost distributes to other stacks. Definitely yourkit playing bad. – Vikrant Aug 05 '13 at 07:46
1

It is theoretically possible, but highly unlikely.

If you are really using Object.hashcode() then that delegates to System.identityHashCode(Object) which is relatively cheap. Furthermore, identity hashcodes should not give you collision hotspots ... unless you get really unlucky.

If you are seeing a performance hotspot, in HashMap.hash() and Object.hashCode(), then perhaps the reason is that you are just doing lots of HashMap lookups.

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
1

You said your profiler indicated put as the hot spot, not the hashcode calculation. put is about much more than just hashcode determination. Particularly, if the map is very large, there will be a lot of housekeeping work done. And if your code does little else but invoke put a huge number of times, then naturally this will show up in the profiler as the hot spot. There may be nothing wrong with its performance, though.

Marko Topolnik
  • 195,646
  • 29
  • 319
  • 436
0

I believe yourkit can't always be trusted. Under heavy JIT optimizations profilers they get confused and starts showing up costs at awkward positions.

Vikrant
  • 21
  • 2