23

I have an interface PackedObject:

public interface PackedObject {
    int get();
    int sum();
    void setIndex(int index);
    default int defaultSum() {
        return get();
    }
}

An abstract class AbstractPackedObject:

public abstract class AbstractPackedObject implements PackedObject {
    protected int index = 0;
    protected int[] buffer;

    public void setIndex(int index) {
        this.index = index;
    }

    public void setBuffer(int[] buffer) {
        this.buffer = buffer;
    }

    @Override
    public int sum(){
        return get();
    }
}

And a concrete implementation WrappedPackedObject:

public class WrappedPackedObject extends AbstractPackedObject implements PackedObject {

    public WrappedPackedObject(int[] buffer) {
        this.buffer = buffer;
    }

    @Override
    public int get() {
        return buffer[index];
    }
}

I benchmarked the defaultSum and sum methods (snippet of the JMH benchmark):

    for (int i = 0; i < NB; i++) {
        packedObject.setIndex(i);
        value += packedObject.defaultSum();
    }

    for (int i = 0; i < NB; i++) {
        packedObject.setIndex(i);
        value += packedObject.sum();
    }

I am trying to figure out why the sum benchmark is faster than the defaultSum benchmark by a factor of 1.7.

I have started to dig into the JIT internals. Each call site targets only one method, so I expect inlining to happen. The output of -XX:+PrintInlining is the following:

@ 25   com.github.nithril.PackedObject::defaultSum (7 bytes)   inline (hot)
 \-> TypeProfile (479222/479222 counts) = com/github/nithril/WrappedPackedObject
  @ 1   com.github.nithril.WrappedPackedObject::get (14 bytes)   inline (hot)
    @ 10   java.nio.DirectByteBuffer::getInt (15 bytes)   inline (hot)


@ 25   com.github.nithril.AbstractPackedObject::sum (5 bytes)   inline (hot)
  @ 1   com.github.nithril.WrappedPackedObject::get (14 bytes)   inline (hot)
    @ 10   java.nio.DirectByteBuffer::getInt (15 bytes)   inline (hot)

I don't yet understand why this line appears: TypeProfile (479222/479222 counts) = com/github/nithril/WrappedPackedObject

I created a dedicated project with the above code. The benchmark is done using JMH.

Thanks for your help.

EDIT 2015/05/20:

I simplified the Java code.

The inner loop of benchSum is quite straightforward:

0x00007f1bb11afb84: add    0x10(%r10,%r8,4),%eax  ;*iadd
                                              ; - com.github.nithril.PackedObjectBench::benchSum@29 (line 50)
0x00007f1bb11afb89: mov    %r8d,0xc(%r12,%r11,8)  ;*putfield index
                                              ; - com.github.nithril.AbstractPackedObject::setIndex@2 (line 13)
                                              ; - com.github.nithril.PackedObjectBench::benchSum@17 (line 49)
0x00007f1bb11afb8e: inc    %r8d               ;*iinc
                                              ; - com.github.nithril.PackedObjectBench::benchSum@31 (line 48)
0x00007f1bb11afb91: cmp    $0x2710,%r8d
0x00007f1bb11afb98: jl     0x00007f1bb11afb84

The inner loop of benchDefaultSum is more complicated: it writes the index field, reads it back, and compares it against the array bound on every iteration. I do not yet completely understand the purpose of this comparison...

0x00007fcfdcf82cb8: mov    %edx,0xc(%r12,%r11,8)  ;*putfield index
                                              ; - com.github.nithril.AbstractPackedObject::setIndex@2 (line 13)
                                              ; - com.github.nithril.PackedObjectBench::benchDefaultSum@17 (line 32)
0x00007fcfdcf82cbd: mov    0xc(%r10),%r8d     ;*getfield index
                                              ; - com.github.nithril.WrappedPackedObject::get@5 (line 17)
                                              ; - com.github.nithril.PackedObject::defaultSum@1 (line 15)
                                              ; - com.github.nithril.PackedObjectBench::benchDefaultSum@24 (line 33)
0x00007fcfdcf82cc1: cmp    %r9d,%r8d
0x00007fcfdcf82cc4: jae    0x00007fcfdcf82d1f  ;*iaload
                                              ; - com.github.nithril.WrappedPackedObject::get@8 (line 17)
                                              ; - com.github.nithril.PackedObject::defaultSum@1 (line 15)
                                              ; - com.github.nithril.PackedObjectBench::benchDefaultSum@24 (line 33)
0x00007fcfdcf82cc6: add    0x10(%rcx,%r8,4),%eax  ;*iadd
                                              ; - com.github.nithril.PackedObjectBench::benchDefaultSum@29 (line 33)
0x00007fcfdcf82ccb: inc    %edx               ;*iinc
                                              ; - com.github.nithril.PackedObjectBench::benchDefaultSum@31 (line 31)
0x00007fcfdcf82ccd: cmp    $0x2710,%edx
0x00007fcfdcf82cd3: jl     0x00007fcfdcf82cb8  ;*aload_2
[...]
0x00007fcfdcf82ce6: mov    $0xffffffe4,%esi
0x00007fcfdcf82ceb: mov    %r10,0x8(%rsp)
0x00007fcfdcf82cf0: mov    %ebx,0x4(%rsp)
0x00007fcfdcf82cf4: mov    %r8d,0x10(%rsp)
0x00007fcfdcf82cf9: xchg   %ax,%ax
0x00007fcfdcf82cfb: callq  0x00007fcfdcdea1a0  ; OopMap{rbp=NarrowOop [8]=Oop off=416}
                                              ;*iaload
                                              ; - com.github.nithril.WrappedPackedObject::get@8 (line 17)
                                              ; - com.github.nithril.PackedObject::defaultSum@1 (line 15)
                                              ; - com.github.nithril.PackedObjectBench::benchDefaultSum@24 (line 33)
                                              ;   {runtime_call}
0x00007fcfdcf82d00: callq  0x00007fcff1c94320  ;*iaload
                                              ; - com.github.nithril.WrappedPackedObject::get@8 (line 17)
                                              ; - com.github.nithril.PackedObject::defaultSum@1 (line 15)
                                              ; - com.github.nithril.PackedObjectBench::benchDefaultSum@24 (line 33)
                                              ;   {runtime_call}
[...]
0x00007fcfdcf82d1f: mov    %eax,(%rsp)
0x00007fcfdcf82d22: mov    %edx,%ebx
0x00007fcfdcf82d24: jmp    0x00007fcfdcf82ce6
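
Read back into Java, the two compiled loops behave roughly like the sketch below. This is only an interpretation of the listings above, not JIT output: the class `LoopShapes`, the `Holder` type and both method names are hypothetical, and the explicit `if` stands in for the `cmp`/`jae` pair. In the benchSum shape the array element is addressed with the loop counter and the index field is only written. In the benchDefaultSum shape the field is written, read back, and range-checked before each load.

```java
final class LoopShapes {
    /** Hypothetical stand-in for the packed object: only the mutable index field. */
    static final class Holder {
        int index;
    }

    /** Shape of the benchSum loop: element addressed by the counter, index only stored. */
    static int sumShape(Holder holder, int[] buffer, int n) {
        int value = 0;
        for (int i = 0; i < n; i++) {
            value += buffer[i];   // add 0x10(%r10,%r8,4),%eax
            holder.index = i;     // putfield index
        }
        return value;
    }

    /** Shape of the benchDefaultSum loop: store, reload and range check per iteration. */
    static int defaultSumShape(Holder holder, int[] buffer, int n) {
        int value = 0;
        int length = buffer.length;
        for (int i = 0; i < n; i++) {
            holder.index = i;        // putfield index
            int idx = holder.index;  // getfield index, reloaded from memory
            if (Integer.compareUnsigned(idx, length) >= 0) {  // cmp + jae
                throw new ArrayIndexOutOfBoundsException(idx);
            }
            value += buffer[idx];    // add 0x10(%rcx,%r8,4),%eax
        }
        return value;
    }

    public static void main(String[] args) {
        int[] buffer = new int[10_000];  // 0x2710 iterations in the listings
        for (int i = 0; i < buffer.length; i++) {
            buffer[i] = i;
        }
        Holder holder = new Holder();
        // Both shapes compute the same sum; only the per-iteration work differs.
        System.out.println(sumShape(holder, buffer, buffer.length)
                == defaultSumShape(holder, buffer, buffer.length));
    }
}
```

The two methods return the same value, so the difference is purely in the extra load and compare that the second shape performs on every pass.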
Nicolas Labrot
  • 2
    ...Why do you have an abstract class _and_ an interface? – Nic May 18 '15 at 20:21
  • 3
    For benchmarking purposes. I started the benchmark with an interface without a default method. Then I moved `sum` from the concrete class to a default method. Afterwards I spotted the difference between the concrete class and the interface. To finish the test, I created an abstract class with the `sum` method and the interface with `defaultSum` – Nicolas Labrot May 18 '15 at 20:28
  • Oh, I see. I thought this was an application, and couldn't think of a place where you'd need (or even really _want_) two layers of abstraction like that. – Nic May 18 '15 at 20:29
  • 2
    What do you want here? What is your question? Why does this surprise you? – Louis Wasserman May 18 '15 at 21:00
  • The question is `I am trying to figure out why, when I benchmark the defaultSum and sum methods, the latter is faster than the former by a factor of 1.7`. Why does this surprise me? Because I'm expecting no difference. – Nicolas Labrot May 18 '15 at 21:05
  • 3
    what if you switch the order of the two loops in the benchmark? – ZhongYu May 18 '15 at 21:11
  • 1
    Yeah, try switching their order. You might be running into the situation where the constituent classes are being loaded during the first run, and are loaded and optimized during the second. – Shotgun Ninja May 18 '15 at 21:14
  • There is no change whether I run `benchDefaultSum` then `benchSum` or `benchSum` then `benchDefaultSum` – Nicolas Labrot May 18 '15 at 21:16
  • 1
    I tried to run your benchmark and it seems like compilation is still ongoing during the iteration phase. Am I missing something? – biziclop May 18 '15 at 21:25
  • I have tested using `-XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=print,*github*.*'`. It is a bit restrictive, especially regarding the ArrayList class, but even with a warmup of 30 the result does not change – Nicolas Labrot May 18 '15 at 21:41
  • 1
    I suspect the culprit is the `get()` call, or rather how it's invoked from a default method. When I replaced it with a call to an external, static method (that returned a random number), there was no difference between the abstract class and the default method on the interface. But as soon as you call `this.something()`, the default method becomes slower. – biziclop May 18 '15 at 21:53
  • 1
    And indeed `javap` reveals that while `PackedObject.defaultSum()` has to use an `invokeinterface` opcode when invoking `get()`, `AbstractPackedObject.sum()` can use `invokevirtual`. Not that surprising in hindsight. – biziclop May 18 '15 at 21:58
  • Have you looked at the assembly code of `benchSum` and `benchDefaultSum`? From what I understand, `benchDefaultSum` uses an invokevirtual on `defaultSum` and `benchSum` an invokevirtual on `get` – Nicolas Labrot May 18 '15 at 22:06
  • 1
    @NicolasLabrot That's true but I'm talking about the call from `defaultSum()` to `get()`. – biziclop May 18 '15 at 22:17
  • Agreed, but `invokeinterface` should not prevent inlining in this case – Nicolas Labrot May 18 '15 at 22:29
  • 1
    Well, all I can say is this is what I measured. If you make any call that doesn't involve `invokeinterface` on self, the default method on the interface and a method in an abstract class perform identically. – biziclop May 18 '15 at 22:34
  • if inlining shows no differences then you'll have to look at the generated assembly code – the8472 May 19 '15 at 03:55
  • 1
    Use `-prof perfasm` to print the hot parts of generated code. You may want to consider simplifying the benchmark to make hot code even more dense and understandable. – Aleksey Shipilev May 19 '15 at 11:35

2 Answers

5

Just regurgitating information that I've picked up from a cursory reading of the hotspot-compiler-dev mailing list, but this may be due to the lack of class hierarchy analysis (CHA) for default methods in interfaces, which prevents devirtualization of interface methods.

See JDK Bug 8065760 and 6986483


My guess is that even though the method is inlined, it is still preceded by a type guard that gets eliminated by CHA in the abstract case but not for the interface method.

Printing the optimized assembly (I think JMH has a flag for that) could confirm that.
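
For illustration, a guarded inline corresponds roughly to the following source-level shape. This is a hand-written sketch, not what HotSpot emits: the class `TypeGuardSketch` and its nested types are hypothetical stand-ins for the classes in the question, and the slow path is written as a plain interface call where the JIT would more likely deoptimize.

```java
final class TypeGuardSketch {
    /** Hypothetical stand-in for PackedObject. */
    interface Packed {
        int get();

        default int defaultSum() {
            return get();
        }
    }

    /** Hypothetical stand-in for WrappedPackedObject. */
    static final class Wrapped implements Packed {
        final int[] buffer;
        int index;

        Wrapped(int[] buffer) {
            this.buffer = buffer;
        }

        @Override
        public int get() {
            return buffer[index];
        }
    }

    /** A second receiver type, used only to exercise the slow path. */
    static final class Constant implements Packed {
        @Override
        public int get() {
            return 7;
        }
    }

    /** defaultSum() with the profiled receiver inlined behind an explicit type guard. */
    static int guardedDefaultSum(Packed p) {
        if (p.getClass() == Wrapped.class) {  // the type guard: load class, compare, branch
            Wrapped w = (Wrapped) p;
            return w.buffer[w.index];         // inlined body of get()
        }
        return p.defaultSum();                // slow path for any other receiver
    }

    public static void main(String[] args) {
        System.out.println(guardedDefaultSum(new Wrapped(new int[] {3, 4})));
        System.out.println(guardedDefaultSum(new Constant()));
    }
}
```

If CHA can prove a single receiver type, as suggested above for the abstract-class case, the `if` is unnecessary and only the fast path remains.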

the8472
  • 1
    This is not the case, since the output of `-XX:+PrintInlining` in the original question explicitly shows that the default method is successfully inlined. – apangin May 18 '15 at 23:55
  • 2
    It could still contain a type guard that CHA would eliminate; I think the need for type profiling information in his inlining output suggests as much. – the8472 May 19 '15 at 03:57
  • Thanks for your input, I will look at home after work. Any hint on what a type guard looks like in assembly? – Nicolas Labrot May 19 '15 at 06:44
  • 1
    I've already verified this is not the case as well. Type guard is a load of object's class + conditional branch (mov + cmp + jne). But here the generated code contains just a redundant null check with an unnecessary register move. – apangin May 19 '15 at 07:19
  • I have added to the question a snippet of the assembly code for both tests – Nicolas Labrot May 20 '15 at 20:30
0

The other answer is outdated and no longer valid. In Java 11, default methods are inlined like virtual methods in abstract classes.

However, a megamorphic default method is twice as slow as a megamorphic virtual method in an abstract class.
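
For readers unfamiliar with the term, a call site is commonly described as megamorphic once it has seen more than two receiver types. A minimal sketch of how such a call shape can be set up, assuming hypothetical names throughout (`MegamorphicSketch`, `Shape`, `AbstractShape`, `A`, `B`, `C`); it only builds the two call paths and does not measure anything:

```java
final class MegamorphicSketch {
    interface Shape {
        int get();

        /** Default-method path: get() is invoked from the interface. */
        default int defaultSum() {
            return get();
        }
    }

    abstract static class AbstractShape implements Shape {
        /** Abstract-class path: get() is invoked from the class. */
        public int sum() {
            return get();
        }
    }

    static final class A extends AbstractShape {
        @Override
        public int get() {
            return 1;
        }
    }

    static final class B extends AbstractShape {
        @Override
        public int get() {
            return 2;
        }
    }

    static final class C extends AbstractShape {
        @Override
        public int get() {
            return 3;
        }
    }

    /** The get() call inside defaultSum() sees three receiver types here. */
    static int viaDefault(Shape[] shapes) {
        int total = 0;
        for (Shape s : shapes) {
            total += s.defaultSum();
        }
        return total;
    }

    /** The get() call inside sum() sees the same three receiver types. */
    static int viaAbstract(AbstractShape[] shapes) {
        int total = 0;
        for (AbstractShape s : shapes) {
            total += s.sum();
        }
        return total;
    }

    public static void main(String[] args) {
        AbstractShape[] shapes = {new A(), new B(), new C()};
        System.out.println(viaDefault(shapes) == viaAbstract(shapes));
    }
}
```

Comparing the two paths meaningfully would still require a proper harness such as JMH, as used in the question.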

spongebob