I ran some JMH benchmarks comparing a lambda with a method reference, along the lines of:

IntStream......reduce(Integer::max)
vs.
IntStream.......reduce((i1, i2) -> Integer.max(i1, i2))

On Java 8, the method reference ran about 5 times as fast as the lambda. When I ran the same test on Java 11, both approaches took about as long as the method reference did on Java 8, so there is no major performance difference between lambda and method reference on Java 11.

My question is: What improvement(s) have been made from Java 8 to 11 to boost this performance? I'm using OpenJDK.

EDIT My benchmark:

import java.util.List;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
@Fork(value = 1, jvmArgs = {"-XX:CompileThreshold=5000"})
@Warmup(iterations = 2)
public class FindMaxInt {

    // Number of elements in the list; JMH runs the benchmark once per value.
    @Param({"10000", "1000000", "10000000"})
    private int n;

    private List<Integer> data;

    @Setup
    public void setup(){
        // createData() is not shown in the original post.
        data = createData();
    }

    // Reduction with a method reference.
    @Benchmark
    public void streamWithMethodReference(final Blackhole blackhole){
        int max = data.stream().mapToInt(Integer::intValue).reduce(Integer.MIN_VALUE, Integer::max);
        blackhole.consume(max);
    }

    // Same reduction with an equivalent lambda.
    @Benchmark
    public void streamWithLambda(final Blackhole blackhole){
        int max = data.stream().mapToInt(Integer::intValue).reduce(Integer.MIN_VALUE, (i1, i2) -> Integer.max(i1, i2));
        blackhole.consume(max);
    }
}
Johan Wiström
  • 1
    Please show your benchmark. I could not reproduce the effect you are talking about. – apangin Mar 02 '19 at 13:56
  • 1
    A lambda has one more level of indirection than a method reference, so during JIT compilation the expression with the lambda may reach the inlining depth limit earlier. Try rerunning your test with `-XX:MaxInlineLevel=20` – apangin Mar 02 '19 at 14:09
  • 1
    I still do not understand how Java 8 and 11 produced such different results. I edited my post and added the benchmark @apangin – Johan Wiström Mar 03 '19 at 08:05

1 Answer

This is a combination of the effects described in this and this answer.

The different results are explained by a different inlining tree. A lambda has one more level of indirection than a method reference, so during JIT compilation the expression with the lambda may reach the inlining depth limit earlier. The default is -XX:MaxInlineLevel=9.
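The extra level of indirection can be observed directly. The following is a minimal sketch, assuming the class is compiled with javac (the translation strategy is a compiler detail, not something the language specification mandates); the class name `IndirectionDemo` is made up for illustration. javac moves the lambda body into a synthetic method whose name starts with `lambda$`, and the generated function object calls that method, which in turn calls `Integer.max`. For `Integer::max` no such intermediate method is generated.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntBinaryOperator;

// Hypothetical demo class, not part of the original benchmark.
final class IndirectionDemo {
    // Method reference: the function object is linked straight to Integer.max.
    static final IntBinaryOperator METHOD_REF = Integer::max;

    // Lambda: javac copies the body into a synthetic method of this class.
    static final IntBinaryOperator LAMBDA = (a, b) -> Integer.max(a, b);

    // Names of the compiler-generated lambda body methods declared in this class.
    static List<String> syntheticLambdaMethods() {
        List<String> names = new ArrayList<>();
        for (Method m : IndirectionDemo.class.getDeclaredMethods()) {
            if (m.isSynthetic() && m.getName().startsWith("lambda$")) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(METHOD_REF.applyAsInt(3, 7)); // 7
        System.out.println(LAMBDA.applyAsInt(3, 7));     // 7
        // Only the lambda contributes an entry; the exact name is compiler-specific.
        System.out.println(syntheticLambdaMethods());
    }
}
```

Both operators compute the same result; the only difference is the one additional call frame that the JIT has to inline for the lambda.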

Run the benchmark with -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining to see the whole inlining tree. Here is what we get on JDK 8:

1563  560       4       bench.FindMaxInt::streamWithLambda (38 bytes)
                           @ 3   java.util.stream.IntPipeline::<init> (7 bytes)   inline (hot)
                             @ 3   java.util.stream.AbstractPipeline::<init> (91 bytes)   inline (hot)
                               @ 1   java.util.stream.PipelineHelper::<init> (5 bytes)   inline (hot)
                                 @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
                               @ 51   java.util.stream.StreamOpFlag::combineOpFlags (9 bytes)   inline (hot)
                                 @ 2   java.util.stream.StreamOpFlag::getMask (30 bytes)   inline (hot)
                               @ 66   java.util.stream.IntPipeline$StatelessOp::opIsStateful (2 bytes)   inline (hot)
                           @ 4   java.util.Collection::stream (11 bytes)   inline (hot)
                            \-> TypeProfile (5120/5120 counts) = java/util/ArrayList
                             @ 1   java.util.ArrayList::spliterator (12 bytes)   inline (hot)
                               @ 8   java.util.ArrayList$ArrayListSpliterator::<init> (26 bytes)   inline (hot)
                                 @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
                             @ 7   java.util.stream.StreamSupport::stream (19 bytes)   inline (hot)
                               @ 1   java.util.Objects::requireNonNull (14 bytes)   inline (hot)
                               @ 11   java.util.stream.StreamOpFlag::fromCharacteristics (37 bytes)   inline (hot)
                                 @ 1   java.util.ArrayList$ArrayListSpliterator::characteristics (4 bytes)   inline (hot)
                                  \-> TypeProfile (5124/5124 counts) = java/util/ArrayList$ArrayListSpliterator
                               @ 15   java.util.stream.ReferencePipeline$Head::<init> (8 bytes)   inline (hot)
                                 @ 4   java.util.stream.ReferencePipeline::<init> (8 bytes)   inline (hot)
                                   @ 4   java.util.stream.AbstractPipeline::<init> (55 bytes)   inline (hot)
                                     @ 1   java.util.stream.PipelineHelper::<init> (5 bytes)   inline (hot)
                                       @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
                           @ 9   java.lang.invoke.LambdaForm$MH/883049899::linkToTargetMethod (8 bytes)   force inline by annotation
                             @ 4   java.lang.invoke.LambdaForm$MH/1922154895::identity_L (8 bytes)   force inline by annotation
                           @ 14   java.util.stream.ReferencePipeline::mapToInt (26 bytes)   inline (hot)
                            \-> TypeProfile (5120/5120 counts) = java/util/stream/ReferencePipeline$Head
                             @ 1   java.util.Objects::requireNonNull (14 bytes)   inline (hot)
                             @ 22   java.util.stream.ReferencePipeline$4::<init> (20 bytes)   inline (hot)
                               @ 16   java.util.stream.IntPipeline$StatelessOp::<init> (29 bytes)   inline (hot)
                                 @ 3   java.util.stream.IntPipeline::<init> (7 bytes)   inline (hot)
                                   @ 3   java.util.stream.AbstractPipeline::<init> (91 bytes)   inline (hot)
                                     @ 1   java.util.stream.PipelineHelper::<init> (5 bytes)   inline (hot)
                                       @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
                                     @ 51   java.util.stream.StreamOpFlag::combineOpFlags (9 bytes)   inline (hot)
                                       @ 2   java.util.stream.StreamOpFlag::getMask (30 bytes)   inline (hot)
                                     @ 66   java.util.stream.IntPipeline$StatelessOp::opIsStateful (2 bytes)   inline (hot)
                           @ 21   java.lang.invoke.LambdaForm$MH/883049899::linkToTargetMethod (8 bytes)   force inline by annotation
                             @ 4   java.lang.invoke.LambdaForm$MH/1922154895::identity_L (8 bytes)   force inline by annotation
                           @ 26   java.util.stream.IntPipeline::reduce (16 bytes)   inline (hot)
                            \-> TypeProfile (5120/5120 counts) = java/util/stream/ReferencePipeline$4
                             @ 3   java.util.stream.ReduceOps::makeInt (18 bytes)   inline (hot)
                               @ 1   java.util.Objects::requireNonNull (14 bytes)   inline (hot)
                               @ 14   java.util.stream.ReduceOps$5::<init> (16 bytes)   inline (hot)
                                 @ 12   java.util.stream.ReduceOps$ReduceOp::<init> (10 bytes)   inline (hot)
                                   @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
                             @ 6   java.util.stream.AbstractPipeline::evaluate (94 bytes)   inline (hot)
                               @ 50   java.util.stream.AbstractPipeline::isParallel (8 bytes)   inline (hot)
                               @ 80   java.util.stream.TerminalOp::getOpFlags (2 bytes)   inline (hot)
                                \-> TypeProfile (5130/5130 counts) = java/util/stream/ReduceOps$5
                               @ 85   java.util.stream.AbstractPipeline::sourceSpliterator (265 bytes)   inline (hot)
                                 @ 79   java.util.stream.AbstractPipeline::isParallel (8 bytes)   inline (hot)
                               @ 88   java.util.stream.ReduceOps$ReduceOp::evaluateSequential (18 bytes)   inline (hot)
                                 @ 2   java.util.stream.ReduceOps$5::makeSink (5 bytes)   inline (hot)
                                   @ 1   java.util.stream.ReduceOps$5::makeSink (16 bytes)   inline (hot)
                                     @ 12   java.util.stream.ReduceOps$5ReducingSink::<init> (15 bytes)   inline (hot)
                                       @ 11   java.lang.Object::<init> (1 bytes)   inline (hot)
                                 @ 6   java.util.stream.AbstractPipeline::wrapAndCopyInto (18 bytes)   inline (hot)
                                   @ 3   java.util.Objects::requireNonNull (14 bytes)   inline (hot)
                                   @ 9   java.util.stream.AbstractPipeline::wrapSink (37 bytes)   inline (hot)
                                     @ 1   java.util.Objects::requireNonNull (14 bytes)   inline (hot)
                                     @ 23   java.util.stream.ReferencePipeline$4::opWrapSink (10 bytes)   inline (hot)
                                      \-> TypeProfile (5081/5081 counts) = java/util/stream/ReferencePipeline$4
                                       @ 6   java.util.stream.ReferencePipeline$4$1::<init> (11 bytes)   inline (hot)
                                         @ 7   java.util.stream.Sink$ChainedReference::<init> (16 bytes)   inline (hot)
                                           @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
                                           @ 6   java.util.Objects::requireNonNull (14 bytes)   inline (hot)
                                   @ 13   java.util.stream.AbstractPipeline::copyInto (53 bytes)   inline (hot)
                                     @ 1   java.util.Objects::requireNonNull (14 bytes)   inline (hot)
                                     @ 9   java.util.stream.AbstractPipeline::getStreamAndOpFlags (5 bytes)   accessor
                                     @ 12   java.util.stream.StreamOpFlag::isKnown (19 bytes)   inline (hot)
                                     @ 20   java.util.Spliterator::getExactSizeIfKnown (25 bytes)   inline (hot)
                                      \-> TypeProfile (5081/5081 counts) = java/util/ArrayList$ArrayListSpliterator
                                       @ 1   java.util.ArrayList$ArrayListSpliterator::characteristics (4 bytes)   inline (hot)
                                       @ 19   java.util.ArrayList$ArrayListSpliterator::estimateSize (11 bytes)   inline (hot)
                                         @ 1   java.util.ArrayList$ArrayListSpliterator::getFence (48 bytes)   inline (hot)
                                           @ 38   java.util.ArrayList::access$000 (5 bytes)   accessor
                                     @ 25   java.util.stream.Sink$ChainedReference::begin (11 bytes)   inline (hot)
                                      \-> TypeProfile (5081/5081 counts) = java/util/stream/ReferencePipeline$4$1
                                       @ 5   java.util.stream.ReduceOps$5ReducingSink::begin (9 bytes)   inline (hot)
                                        \-> TypeProfile (5079/5079 counts) = java/util/stream/ReduceOps$5ReducingSink
                                     @ 32   java.util.ArrayList$ArrayListSpliterator::forEachRemaining (129 bytes)   inline (hot)
                                       @ 51   java.util.ArrayList::access$000 (5 bytes)   accessor
                                       @ 99   java.util.stream.ReferencePipeline$4$1::accept (23 bytes)   inline (hot)
                                         @ 12   bench.FindMaxInt$$Lambda$8/390011259::applyAsInt (8 bytes)   inline (hot)
                                          \-> TypeProfile (13752/13752 counts) = bench/FindMaxInt$$Lambda$8
                                           @ 4   java.lang.Integer::intValue (5 bytes)   accessor
                                         @ 17   java.util.stream.ReduceOps$5ReducingSink::accept (19 bytes)   inline (hot)
                                          \-> TypeProfile (13752/13752 counts) = java/util/stream/ReduceOps$5ReducingSink
                                           @ 10   bench.FindMaxInt$$Lambda$9/208515840::applyAsInt (6 bytes)   inline (hot)
                                            \-> TypeProfile (9107/9107 counts) = bench/FindMaxInt$$Lambda$9
                                             @ 2   bench.FindMaxInt::lambda$streamWithLambda$0 (6 bytes)   inline (hot)
                                               @ 2   java.lang.Integer::max (6 bytes)   inlining too deep
                                     @ 38   java.util.stream.Sink$ChainedReference::end (10 bytes)   inline (hot)
                                       @ 4   java.util.stream.Sink::end (1 bytes)   inline (hot)
                                        \-> TypeProfile (5125/5125 counts) = java/util/stream/ReduceOps$5ReducingSink
                                 @ 12   java.util.stream.ReduceOps$5ReducingSink::get (5 bytes)   inline (hot)
                                   @ 1   java.util.stream.ReduceOps$5ReducingSink::get (8 bytes)   inline (hot)
                                     @ 4   java.lang.Integer::valueOf (32 bytes)   inline (hot)
                                       @ 28   java.lang.Integer::<init> (10 bytes)   inline (hot)
                                         @ 1   java.lang.Number::<init> (5 bytes)   inline (hot)
                                           @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
                             @ 12   java.lang.Integer::intValue (5 bytes)   accessor
                           @ 34   org.openjdk.jmh.infra.Blackhole::consume (28 bytes)   disallowed by CompilerOracle

The key lines are the following. They show that inlining stops exactly at the final call to Integer.max, because the default limit of 9 levels has been reached.

@ 2   bench.FindMaxInt::lambda$streamWithLambda$0 (6 bytes)   inline (hot)
  @ 2   java.lang.Integer::max (6 bytes)   inlining too deep
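
To check which inlining depth limit a given JVM actually uses, the flag can be read at run time. This is a small sketch that assumes a HotSpot-based JVM (such as OpenJDK) where `com.sun.management.HotSpotDiagnosticMXBean` is available; the class name `InlineLimit` is made up for illustration. Other JDK releases, or a run with an explicit -XX:MaxInlineLevel setting, may report a value other than 9.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, not part of the original answer.
final class InlineLimit {
    // Reads the effective value of -XX:MaxInlineLevel from the running HotSpot JVM.
    static int maxInlineLevel() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return Integer.parseInt(bean.getVMOption("MaxInlineLevel").getValue());
    }

    public static void main(String[] args) {
        System.out.println("MaxInlineLevel = " + maxInlineLevel());
    }
}
```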

The shape of the inlining tree is very different on JDK 11:

1588  705       4       bench.FindMaxInt::streamWithLambda (38 bytes)
                           @ 4   java.util.Collection::stream (11 bytes)   inline (hot)
                            \-> TypeProfile (5263/5263 counts) = java/util/ArrayList
                             @ 1   java.util.ArrayList::spliterator (12 bytes)   inline (hot)
                               @ 8   java.util.ArrayList$ArrayListSpliterator::<init> (26 bytes)   inline (hot)
                                 @ 6   java.lang.Object::<init> (1 bytes)   inline (hot)
                             @ 7   java.util.stream.StreamSupport::stream (19 bytes)   inline (hot)
                               @ 1   java.util.Objects::requireNonNull (14 bytes)   inline (hot)
                               @ 11   java.util.stream.StreamOpFlag::fromCharacteristics (37 bytes)   inline (hot)
                                 @ 1   java.util.ArrayList$ArrayListSpliterator::characteristics (4 bytes)   inline (hot)
                                  \-> TypeProfile (5125/5125 counts) = java/util/ArrayList$ArrayListSpliterator
                               @ 15   java.util.stream.ReferencePipeline$Head::<init> (8 bytes)   inline (hot)
                                 @ 4   java.util.stream.ReferencePipeline::<init> (8 bytes)   inline (hot)
                                   @ 4   java.util.stream.AbstractPipeline::<init> (55 bytes)   inline (hot)
                                     @ 1   java.util.stream.PipelineHelper::<init> (5 bytes)   inline (hot)
                                       @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
                           @ 9   java.lang.invoke.Invokers$Holder::linkToTargetMethod (8 bytes)   force inline by annotation
                             @ 4   java.lang.invoke.LambdaForm$MH/0x0000000800060440::invoke (8 bytes)   force inline by annotation
                           @ 14   java.util.stream.ReferencePipeline::mapToInt (26 bytes)   inline (hot)
                            \-> TypeProfile (5263/5263 counts) = java/util/stream/ReferencePipeline$Head
                             @ 1   java.util.Objects::requireNonNull (14 bytes)   inline (hot)
                             @ 22   java.util.stream.ReferencePipeline$4::<init> (20 bytes)   inline (hot)
                               @ 16   java.util.stream.IntPipeline$StatelessOp::<init> (29 bytes)   inline (hot)
                                 @ 3   java.util.stream.IntPipeline::<init> (7 bytes)   inline (hot)
                                   @ 3   java.util.stream.AbstractPipeline::<init> (91 bytes)   inline (hot)
                                     @ 1   java.util.stream.PipelineHelper::<init> (5 bytes)   inline (hot)
                                       @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
                                     @ 51   java.util.stream.StreamOpFlag::combineOpFlags (9 bytes)   inline (hot)
                                       @ 2   java.util.stream.StreamOpFlag::getMask (30 bytes)   inline (hot)
                                     @ 66   java.util.stream.IntPipeline$StatelessOp::opIsStateful (2 bytes)   inline (hot)
                           @ 21   java.lang.invoke.Invokers$Holder::linkToTargetMethod (8 bytes)   force inline by annotation
                             @ 4   java.lang.invoke.LambdaForm$MH/0x0000000800060440::invoke (8 bytes)   force inline by annotation
                           @ 26   java.util.stream.IntPipeline::reduce (16 bytes)   inline (hot)
                            \-> TypeProfile (5263/5263 counts) = java/util/stream/ReferencePipeline$4
                             @ 3   java.util.stream.ReduceOps::makeInt (18 bytes)   inline (hot)
                               @ 1   java.util.Objects::requireNonNull (14 bytes)   inline (hot)
                               @ 14   java.util.stream.ReduceOps$6::<init> (16 bytes)   inline (hot)
                                 @ 12   java.util.stream.ReduceOps$ReduceOp::<init> (10 bytes)   inline (hot)
                                   @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
                             @ 6   java.util.stream.AbstractPipeline::evaluate (94 bytes)   inline (hot)
                               @ 50   java.util.stream.AbstractPipeline::isParallel (8 bytes)   inline (hot)
                               @ 80   java.util.stream.TerminalOp::getOpFlags (2 bytes)   inline (hot)
                                \-> TypeProfile (5362/5362 counts) = java/util/stream/ReduceOps$6
                               @ 85   java.util.stream.AbstractPipeline::sourceSpliterator (265 bytes)   inline (hot)
                                 @ 79   java.util.stream.AbstractPipeline::isParallel (8 bytes)   inline (hot)
                               @ 88   java.util.stream.ReduceOps$ReduceOp::evaluateSequential (18 bytes)   already compiled into a big method
                             @ 12   java.lang.Integer::intValue (5 bytes)   accessor
                           @ 34   org.openjdk.jmh.infra.Blackhole::consume (28 bytes)   disallowed by CompileCommand

The compilation tree is cut off much earlier, and for a different reason:

@ 88   java.util.stream.ReduceOps$ReduceOp::evaluateSequential (18 bytes)   already compiled into a big method

The default garbage collector in JDK 11 is G1 (the default changed in JDK 9). The compiled code appears larger because of G1 barriers, which is why the inlining heuristics prevented the hottest forEachRemaining loop from being inlined into the streamWithLambda method.
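
To see which collector a particular run uses, the JVM's garbage collector MXBeans can be listed. This is a minimal sketch using only the standard `java.lang.management` API; the class name `ActiveCollectors` is made up for illustration. On a default JDK 11 HotSpot configuration the reported names typically start with "G1". Re-running the benchmark on JDK 11 with -XX:+UseParallelGC would be one possible way to test the explanation above, though that experiment is not part of the original answer.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original answer.
final class ActiveCollectors {
    // Returns the names of the garbage collectors active in this JVM.
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // The output depends on the JDK version and on any GC flags passed to the JVM.
        System.out.println(names());
    }
}
```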

In fact, this is not an optimization in JDK 11 but rather the opposite. However, the overall performance of this particular benchmark turned out better, since the inlining tree was cut off outside the hottest loop.

Inlining tree

apangin
  • What do you mean saying 'hottest loop'? – Thomas Banderas Mar 06 '19 at 22:12
  • 3
    @ThomasBanderas The loop in the code where most of the CPU time is spent: the loop which [iterates](http://hg.openjdk.java.net/jdk/jdk/file/cd701366fcf8/src/java.base/share/classes/java/util/ArrayList.java#l1652) over the elements of the stream. – apangin Mar 07 '19 at 07:58
  • After reading some of the linked items, am I to understand (not made clear here) that the JIT has _already optimised_ the hot loop in both cases but Java 8 then de-optimised it by individually compiling _some_ of the hot methods leaving the (hottest) `Integer.max` now uncompiled? And that Java 11 _accidentally_ doesn't do this because of a G1 quirk means it leaves the original compilation untouched? – drekbour Nov 17 '19 at 18:47
  • @drekbour Deoptimization is irrelevant here. Also, everything is compiled here. `Integer.max` is not *inlined*, which means a different thing than *uncompiled*. – apangin Nov 17 '19 at 19:27
  • Understood s/compile/inline/g but what is the reason Java 11 remained fast in this case? (also: was this tiered compilation at work?) – drekbour Nov 17 '19 at 20:39
  • @drekbour Inlining. In JDK 11 the method `forEachRemaining` was compiled into a single unit. The loop compiled by JDK 8 has a method call inside (which implies overhead). – apangin Nov 17 '19 at 21:00