36

Normally, Java optimizes virtual calls based on the number of implementations encountered at a given call site. This can be easily seen in the results of my benchmark when you look at myCode, which is a trivial method returning a stored int. There's a trivial

static abstract class Base {
    abstract int myCode();
}

with a couple of identical implementations like

static class A extends Base {
    @Override int myCode() {
        return n;
    }
    @Override public int hashCode() {
        return n;
    }
    private final int n = nextInt();
}

With an increasing number of implementations, the timing of the method call grows from 0.4 ns, through 1.2 ns for two implementations, to 11.6 ns, and then grows slowly. When the JVM has seen multiple implementations, i.e., with preload=true, the timings differ slightly (because an instanceof test is needed).
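
The full caliper benchmark isn't listed here (as mentioned in the comments, it's essentially just the setup and the time* methods). A rough, self-contained sketch of the idea behind the implementation count and the preload flag, with illustrative names, sizes, and a placeholder nextInt() helper rather than the real benchmark code, looks like this:

// Illustrative sketch only, not the actual caliper benchmark: class and method
// names, the array size and the nextInt() helper are placeholders.
import java.util.Random;

class BenchmarkSketch {
    private static final Random RND = new Random(42);
    static int nextInt() { return RND.nextInt(); }

    static abstract class Base { abstract int myCode(); }
    // (the hashCode() overrides, identical to A above, are omitted here for brevity)
    static class A extends Base { private final int n = nextInt(); int myCode() { return n; } }
    static class B extends Base { private final int n = nextInt(); int myCode() { return n; } }
    static class C extends Base { private final int n = nextInt(); int myCode() { return n; } }
    static class D extends Base { private final int n = nextInt(); int myCode() { return n; } }

    Base[] targets;

    // nImpl in 1..4 decides how many implementations the measured call site sees;
    // preload additionally forces the JVM to load all four subclasses up front.
    void setUp(int nImpl, boolean preload) {
        if (preload) {
            new A(); new B(); new C(); new D();
        }
        targets = new Base[1024];
        for (int i = 0; i < targets.length; i++) {
            switch (i % nImpl) {
                case 0:  targets[i] = new A(); break;
                case 1:  targets[i] = new B(); break;
                case 2:  targets[i] = new C(); break;
                default: targets[i] = new D(); break;
            }
        }
    }

    // Caliper-style timing method: reps is supplied by the framework.
    int timeMyCode(int reps) {
        int result = 0;
        for (int r = 0; r < reps; r++) {
            for (Base o : targets) {
                result += o.myCode();   // the virtual call under measurement
            }
        }
        return result;
    }
}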

So far it's all clear. However, hashCode behaves rather differently. In particular, it's 8-10 times slower in three cases. Any idea why?

UPDATE

I was curious whether the poor hashCode performance could be helped by dispatching manually, and indeed it could, a lot.

(image: benchmark timings)

A couple of branches did the job perfectly:

if (o instanceof A) {
    result += ((A) o).hashCode();
} else if (o instanceof B) {
    result += ((B) o).hashCode();
} else if (o instanceof C) {
    result += ((C) o).hashCode();
} else if (o instanceof D) {
    result += ((D) o).hashCode();
} else { // Actually impossible, but let's play it safe.
    result += o.hashCode();
}

Note that the compiler avoids such optimizations for more than two implementations, as most method calls are much more expensive than a simple field load and the gain would be small compared to the code bloat.

The original question "Why doesn't JIT optimize the hashCode like other methods" remains, and hashCode2 proves that it indeed could.
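
hashCode2 isn't listed separately; presumably it's just the same caliper-style timing loop with the virtual call replaced by the manual dispatch above. A sketch, reusing the targets array and classes from the setup sketch earlier:

// Sketch of the manually dispatched variant: the same timing loop as timeMyCode
// in the sketch above, with the body replaced by the instanceof chain.
int timeHashCode2(int reps) {
    int result = 0;
    for (int r = 0; r < reps; r++) {
        for (Base o : targets) {
            if (o instanceof A) {
                result += ((A) o).hashCode();
            } else if (o instanceof B) {
                result += ((B) o).hashCode();
            } else if (o instanceof C) {
                result += ((C) o).hashCode();
            } else if (o instanceof D) {
                result += ((D) o).hashCode();
            } else { // Actually impossible, but let's play it safe.
                result += o.hashCode();
            }
        }
    }
    return result;
}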

UPDATE 2

It looks like bestsss is right, at least with this note:

calling hashCode() of any class extending Base is the same as calling Object.hashCode(), and this is how it compiles in the bytecode. If you add an explicit hashCode in Base, that would limit the potential call targets when invoking Base.hashCode().

I'm not completely sure what's going on, but declaring Base.hashCode() makes hashCode competitive again.

(image: benchmark timings with Base.hashCode() declared)

UPDATE 3

OK, providing a concrete implementation of Base#hashCode helps. However, the JIT must know that it never gets called, as all subclasses define their own (unless another subclass gets loaded, which can lead to a deoptimization, but this is nothing new for the JIT).

So it looks like a missed optimization chance #1.

Providing an abstract implementation of Base#hashCode works just the same. This makes sense, as it ensures that no further lookup is needed: each subclass must provide its own (they can't simply inherit it from their grandparent).
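
For reference, here is a sketch of the Base change both updates describe. The exact body I used isn't shown above; forwarding to super is just one possibility, and the abstract variant is included as a comment:

static abstract class Base {
    abstract int myCode();

    // Variant 1 (concrete): what the body does hardly matters here; what matters
    // is that Base itself declares hashCode(), narrowing the possible call targets.
    @Override public int hashCode() {
        return super.hashCode();
    }

    // Variant 2 (abstract) works the same and forces every subclass to provide
    // its own hashCode() instead of silently inheriting Object's:
    // @Override public abstract int hashCode();
}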

Still, for more than two implementations, myCode is so much faster that the compiler must be doing something suboptimal. Maybe a missed optimization chance #2?

maaartinus
  • I would stress the point of multiple implementations of Base, and the extension of such; it's buried in the question (but not in the title at all) and feels mostly lost. – user2864740 Jun 14 '14 at 18:35
  • Which version of caliper are you running? I'd like to test this myself. – Sotirios Delimanolis Jun 14 '14 at 18:55
  • @SotiriosDelimanolis I'm using the one from [git](https://code.google.com/p/caliper/source/checkout), but I'm sure it can be easily adapted to another one (it's just the setup and time* methods). – maaartinus Jun 14 '14 at 18:58
  • @user2864740 I'm not sure how to do it, but feel free to edit. I just wouldn't change the title as it simply can't contain all information. – maaartinus Jun 14 '14 at 19:00
  • Hmmm... The fact that more implementations seems to incur a penalty is interesting to me. I learned that Java used a virtual method table for figuring out what implementation to call, so I *thought* that neither deep inheritance hierarchies nor multiple implementations should matter for performance, since all methods in Java are virtual and because of the table the runtime just looks up the proper implementation. Evidently there's a gap in my understanding somewhere, probably related to not knowing *exactly* how those tables work... – awksp Jun 15 '14 at 00:50
  • @user3580294 All the short bars you can see above come from actually *avoiding the virtual method table lookup*. The shortest one (0.4 ns, i.e., one cycle per method call) is only possible as the JVM *knows* there's a single implementation and directly inlines the field read. The second shortest one (0.6 ns) contains additionally a correctly predicted branch testing that `o` is actually an `A` instance. The third shortest one (1.2 ns) comes from switching between inlined `A.myCode()` and `B.myCode()`. The 10 times slowdown comes when the table really gets involved. – maaartinus Jun 15 '14 at 01:12
  • Oh wow, didn't realize that there was that much of a performance impact. Also seems that I didn't fully understand what your code was doing, but now after your comment and a second look, I think I get what's going on. I didn't understand exactly how inlining worked, but I *think* I understand now. Sorry for being slow to understand, and thanks for taking the time to explain. – awksp Jun 15 '14 at 01:21
  • So more specifically you're asking why `hashCode()` seems to be inlined less aggressively than an otherwise identical method? Also, just reread your question, and you explain why the branch is needed. Missed that earlier. – awksp Jun 15 '14 at 01:22
  • @user3580294 The performance impact by overriding can be significant. Some more info at http://www.javaspecialists.eu/archive/Issue158.html @maaartinus Concerning the actual question: The `hashCode` method undergoes a special treatment by the JIT. You might play around with `-XX:DisableIntrinsic=_hashCode`, or have a look at http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/a57a165b8296/src/share/vm/opto/library_call.cpp#l3924 (and #l3977, #l4000, #l4103 ... search for "hashCode"). I can't point my finger at *THE reason*, but guess it's hidden somewhere in there.... – Marco13 Jun 15 '14 at 15:41
  • Just curious, how did you come across this behavior? I can't imagine doing anything that would lead to finding out that `hashCode()` appeared to be treated differently than other methods – awksp Jun 15 '14 at 16:51
  • @user3580294 Actually, I wanted to write a benchmark I promised in this [question](http://stackoverflow.com/q/4885095/581205), but stumbled upon this strangeness. It's just curiosity, no practical use for my job (apart from learning more being always good). – maaartinus Jun 15 '14 at 17:15
  • That's quite some time before you got around to benchmarking, it seems :P Interesting question there, too... – awksp Jun 15 '14 at 18:13
  • *Note that the compiler avoids such optimizations for more than two implementations, as most method calls are much more expensive than a simple field load and the gain would be small compared to the code bloat.* This is not true, actually; the compiler can use inline caches (I did link one post on them in the comments below my answer). – bestsss Jun 21 '14 at 22:26

5 Answers

4

hashCode is defined in java.lang.Object, so defining it in your own class doesn't do much at all (it's still a defined method, but it makes no difference).

The JIT has several ways to optimize a call site (in this case hashCode()):

  • no overrides - a static call (not virtual at all) - the best-case scenario with full optimizations
  • two types - ByteBuffer for instance: an exact type check and then static dispatch. The type check is very simple, but depending on the usage it may or may not be predicted by the hardware.
  • inline caches - when a few different classes have been observed in the caller body, it's possible to keep them inlined too - that is, some methods might be inlined while others are called via virtual tables. The inline budget is not very high. This is exactly the case in the question: a different method not named hashCode() would feature the inline caches, as there are only four implementations, instead of the v-table.
  • Adding more classes going through that caller body results in a real virtual call as the compiler gives up (a rough illustration of this degradation follows this list).
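
A rough, self-contained way to see this degradation at a single call site is sketched below. It is a crude loop, not a proper benchmark: all three configurations share (and pollute) the same call-site profile, so a real measurement should fork a fresh JVM per configuration, as caliper does.

// Crude demo: the call site in sum() is fed 1, 2 and then 5 receiver types.
public class CallSiteDemo {
    static abstract class Coded { abstract int code(); }
    static class C1 extends Coded { int code() { return 1; } }
    static class C2 extends Coded { int code() { return 2; } }
    static class C3 extends Coded { int code() { return 3; } }
    static class C4 extends Coded { int code() { return 4; } }
    static class C5 extends Coded { int code() { return 5; } }

    static int sum(Coded[] targets, int reps) {
        int result = 0;
        for (int r = 0; r < reps; r++) {
            for (Coded c : targets) {
                result += c.code();   // the call site whose receiver-type profile varies
            }
        }
        return result;
    }

    static Coded[] fill(int types) {
        Coded[] targets = new Coded[10000];
        for (int i = 0; i < targets.length; i++) {
            switch (i % types) {
                case 0:  targets[i] = new C1(); break;
                case 1:  targets[i] = new C2(); break;
                case 2:  targets[i] = new C3(); break;
                case 3:  targets[i] = new C4(); break;
                default: targets[i] = new C5(); break;
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        for (int types : new int[] { 1, 2, 5 }) {
            Coded[] targets = fill(types);
            sum(targets, 20000);                       // crude warmup
            long t0 = System.nanoTime();
            int ignore = sum(targets, 20000);
            System.out.printf("%d type(s): %.1f ms (ignore %d)%n",
                    types, (System.nanoTime() - t0) / 1e6, ignore);
        }
    }
}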

Virtual calls are not inlined and require an indirection through the table of virtual methods, with an almost guaranteed cache miss. The lack of inlining also requires full function stubs with parameters passed via the stack. Overall, the real performance killer is the inability to inline and apply optimizations.

Please note: calling hashCode() of any class extending Base is the same as calling Object.hashCode(), and this is how it compiles in the bytecode. If you add an explicit hashCode in Base, that would limit the potential call targets when invoking Base.hashCode().

Way too many classes (in the JDK itself) have hashCode() overridden, so in cases of non-inlined HashMap-like structures the invocation is performed via the vtable - i.e., slowly.

As an extra bonus: while loading new classes, the JIT has to deoptimize existing call sites.


I may try to look up some sources if anyone is interested in further reading.

bestsss
  • How does this answer the question? Evidently defining `hashCode()` in your own class *does* make a difference, and the JIT evidently doesn't inline `hashCode()` the way it inlines other methods. Notice that for each test the number of implementations for both the "normal" `hashCode()` and the custom `myCode()` is exactly the same. So why does it seem `hashCode()` and `myCode()` are inlined differently? – awksp Jun 21 '14 at 17:38
  • It does answer the question and that's the answer: to put it simply, there are only a few methods `myCode()` and tons of `hashCode()`. The latter has a huge class hierarchy which prevents optimizations - there are tons of classes in the JDK itself that define hashCode(). One more time: there is no special handling of `hashCode()`; `toString()` and `equals(Object)` have exactly the same fate. – bestsss Jun 21 '14 at 19:30
  • `hashCode()` may be implemented in a lot of classes, but a) a lot of those classes aren't loaded, and b) as pointed out in @laune's answer, using a forwarding implementation removes all differences between `hashCode()` and `myCode()`, which clearly shows that optimizations *are* possible despite the number of `hashCode()`-implementing classes available. In addition, if the number of `hashCode()`-implementing classes were a factor, I would not expect the timing for `hashCode()` to a) differ significantly by preloading implementation classes and b) be the same for a single implementing class. – awksp Jun 21 '14 at 21:04
  • @user3580294, you don't need that many loaded classes defining `hashCode` to prevent inline caches. Here is a post from Cliff Click (the HotSpot architect) on [inline caches](http://www.azulsystems.com/blog/cliff/2010-04-08-inline-caches-and-call-site-optimization) – bestsss Jun 21 '14 at 21:10
  • So you're saying that loading four additional classes is enough to tip the scales one way or the other? I'm not very sure about that, given the number of classes that need to be loaded to run the JVM in the first place, as well as the fact that OP's benchmark showed three implementations of `hashCode()` was enough to negate the inlining advantage. And you haven't yet explained why the forwarding implementation of `hashCode()` doesn't suffer from the same issues as OP's original implementation. – awksp Jun 21 '14 at 21:12
  • @user3580294, I do not say that, yet I am not seeing laune's full code either. So I can't comment. All I say is: having more than several classes with the same method is likely to result in v-calls. – bestsss Jun 21 '14 at 21:15
  • And according to OP's benchmark, that number is two. And your answer still doesn't explain why having one `hashCode()` implementation with no preloading is equally fast as having one `myCode()` implementation with no preloading, despite the large number of `hashCode()`-implementing classes loaded behind the scenes (if there are any). And do you really need to see full code when the changes to OP's code have been described? Seems to me that the important part is the number of `hashCode()` implementations that can be called *by a particular object*, which precludes your argument. – awksp Jun 21 '14 at 21:19
  • "And your answer still doesn't explain why having one hashCode() implementation with no preloading is equally fast as having one myCode()" b/c the entire function that calls hashCode() can be inlined probably. I do not know what you call 'preloading' as there is no such term used neither with JIT, nor with classloading itself. To answer such question one needs *the exact* benchmark and then looking at the generated assembly. – bestsss Jun 21 '14 at 22:22
  • Then the function that calls `hashCode()` can be inlined *despite the presence of other classes that implement `hashCode()`*, right? So I'm not seeing how your answer applies here. I'm using "preloading" as used by OP -- that is, using code to make the classloader load all implementing classes despite only some of them being used. The *exact* benchmark is provided by OP, changes are described by @laune, and you should be able to replicate this question yourself. If you want assembly, then leave a comment on the original post. – awksp Jun 21 '14 at 22:39
  • @user3580294, I added one more paragraph clarifying `hashCode()` calls via Base.hashCode() -- they are effectively `Object.hashCode()` as `Base` doesn't have `hashCode()`. I thought that was perfectly clear. So between the JDK and the caliper framework there would really be a lot of classes defining `hashCode()`, so the call is an Object v-call. If you have only one class with hashCode and the compiler can treat it as NOT java.lang.Object, it'd obviously be a static call. That's the last comment here. – bestsss Jun 21 '14 at 22:54
  • All right, I was wrong. Your last comment clarified things for me that I didn't realize, although as you said it should have been totally clear. I apologize for being so confrontational. Right now this seems to me to be plausible, but to be honest I'm still a little dissatisfied. Can't really complain though. – awksp Jun 22 '14 at 00:16
3

This is a known performance issue: https://bugs.openjdk.java.net/browse/JDK-8014447
It has been fixed in JDK 8.

apangin
  • I'm not sure... in this bug the performance of case "i_i" is fine and I was calling hashCode on `Base`, not on `Object`. But the cause may still be the same. – maaartinus Sep 13 '14 at 15:24
  • @maaartinus If you define `hashCode` in `Base` there will be no performance regression. If you don't define `hashCode` in `Base` then `Object.hashCode` will be called, even if you write `((Base)b).hashCode()` – apangin Sep 14 '14 at 18:18
1

I can confirm the findings. See these results (recompilations omitted):

$ /extra/JDK8u5/jdk1.8.0_05/bin/java Main
overCode :    14.135000000s
hashCode :    14.097000000s

$ /extra/JDK7u21/jdk1.7.0_21/bin/java Main
overCode :    14.282000000s
hashCode :    54.210000000s

$ /extra/JDK6u23/jdk1.6.0_23/bin/java Main
overCode :    14.415000000s
hashCode :   104.746000000s

The results are obtained by calling methods of class SubA extends Base repeatedly. Method overCode() is identical to hashCode(), both of which just return an int field.

Now, the interesting part: If the following method is added to class Base

@Override
public int hashCode(){
    return super.hashCode();
}

execution times for hashCode aren't different from those for overCode any more.

Base.java:

public class Base {
    private int code;

    public Base( int x ){
        code = x;
    }

    public int overCode(){
        return code;
    }
}

SubA.java:

public class SubA extends Base {
    private int code;

    public SubA( int x ){
        super( 2*x );
        code = x;
    }

    @Override
    public int overCode(){
        return code;
    }

    @Override
    public int hashCode(){
        return super.hashCode();
    }
}
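
The driver class Main isn't listed; a minimal sketch of what it could look like (the repetition counts are arbitrary and won't reproduce the exact numbers above):

public class Main {
    public static void main(String[] args) {
        Base[] objs = new Base[1000];
        for (int i = 0; i < objs.length; i++) {
            objs[i] = new SubA(i);
        }
        long sum = 0;

        long t0 = System.nanoTime();
        for (int r = 0; r < 1000000; r++) {
            for (Base b : objs) {
                sum += b.overCode();
            }
        }
        System.out.printf("overCode : %15.9fs%n", (System.nanoTime() - t0) / 1e9);

        long t1 = System.nanoTime();
        for (int r = 0; r < 1000000; r++) {
            for (Base b : objs) {
                sum += b.hashCode();
            }
        }
        System.out.printf("hashCode : %15.9fs%n", (System.nanoTime() - t1) / 1e9);

        System.out.println("(ignore) " + sum);  // keep the result alive
    }
}
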
laune
0

I was looking at the invariants of your test. It has scenario.vmSpec.options.hashCode set to 0. According to this slideshow (slide 37), that means Object.hashCode will use a random number generator. That might be why the JIT compiler is less interested in optimising calls to hashCode: it considers it likely that it will have to resort to an expensive method call, which would offset any performance gain from avoiding a vtable lookup.

This may also be why setting Base to have its own hash code method improves performance as it prevents the possibility of falling through to Object.hashCode.

http://www.slideshare.net/DmitriyDumanskiy/jvm-performance-options-how-it-works

Dunes
-2

The semantics of hashCode() are more complex than those of regular methods, so the JVM and the JIT compiler must do more work when you call hashCode() than when you call a regular virtual method.

One specificity has a negative impact on performance: calling hashCode() on a null object is valid and returns zero. This requires one more branch than a regular call, which in itself can explain the performance difference you have observed.

Note that this seems to be true only from Java 7 on, due to the introduction of Object.hashCode(target), which has this semantic. It would be interesting to know on which version you tested this issue and whether you would see the same on Java 6, for instance.

Another specificity has a positive impact on performance: if you do not provide your own hashCode() implementation, the JIT compiler will use inline hashcode computation code, which is faster than a regular compiled Object.hashCode call.

E.

Eric Nicolas
  • You're wrong. 1. `null.hashCode()` throws just like any other method. 2. `Objects` is a utility class not to be confused with `Object`. 3. not providing `hashCode` means inheriting from `Object.hashCode`, which is marked `native` but internally (probably) delegates to `System.identityHashCode`. 4. it's not the JIT, it's just normal inheritance. 5. in the benchmark `myCode` is similarly inherited from `Base`; `Base.myCode` is abstract, but this changes nothing – maaartinus Jun 20 '14 at 08:16
  • I do believe my answer is correct, after reading the JVM code on the link given by Marco13 above. Look at method inline_native_hashcode. – Eric Nicolas Jun 20 '14 at 09:04
  • I would really really love to know where you pulled the "calling `hashCode()` on a `null` object is valid" part from, as any reasonable programmer would say that that would lead to a NPE. See [here](http://ideone.com/K3yCtA). It's also in the JLS, section 15.12.4.4, Locate Method to Invoke: "If the target reference is null, a NullPointerException is thrown at this point." – awksp Jun 21 '14 at 09:32
  • In addition, the JIT compiler is just as capable of inlining a "regular compiled" method as it is a native method such as `Object#hashCode()`. If anything, a "regular compiled" `hashCode()` that just returned `0` or some other constant is almost guaranteed to be faster than the default `hashCode()` implementation, especially after inlining. maaartinus is correct on all counts. If you feel otherwise, you're going to have to be a bit more specific as to what you see. – awksp Jun 21 '14 at 09:34
  • @EricNicolas I also think that the answer is somewhere hidden in the depth of the "inline_native_hashcode" method at http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/a57a165b8296/src/share/vm/opto/library_call.cpp#l4000 - it talks about "slow calls", intrinsics for native hashCode and identityHashCode and vtables - but I don't understand what it is actually doing there, so I did not try to formulate it as an answer... – Marco13 Jun 21 '14 at 20:36
  • After more investigation, it appears that the JVM source code refers to calls to System.identityHashCode(Object x). This method returns zero for null objects (as specified in the Oracle documentation). It looks like the JIT compiler generates the same code path for this and for all o.hashCode() calls, even though other generic generated code will throw a NullPointerException when called on a null object. – Eric Nicolas Jun 23 '14 at 15:52
  • @EricNicolas This is impossible. If you get `o.hashCode()` with `o==null` not throwing, then you found a bug. Sure, for non-null objects having no own `hashCode`, the same path can be used. – maaartinus Jun 25 '14 at 15:53