This is a combination of the effects described in this and this answer.
The different results are explained by a different inlining tree. A lambda has one more level of indirection compared to a method reference, so during JIT compilation the expression with the lambda may reach the inlining depth limit earlier. The default is -XX:MaxInlineLevel=9.
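To make the extra level of indirection concrete, here is a minimal sketch of the two variants. The original benchmark source is not shown here, so the class name FindMaxIntSketch, the method bodies and the sample data are assumptions reconstructed from the method names in the inlining log, not the actual benchmark.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for bench.FindMaxInt; not the original benchmark.
public class FindMaxIntSketch {

    // Lambda form: javac emits a synthetic static method (named along the
    // lines of lambda$streamWithLambda$0) whose body calls Integer.max,
    // so the call chain is one frame deeper.
    static int streamWithLambda(List<Integer> values) {
        return values.stream()
                .mapToInt(Integer::intValue)
                .reduce(Integer.MIN_VALUE, (a, b) -> Integer.max(a, b));
    }

    // Method-reference form: the generated functional object calls
    // Integer.max directly, without an intermediate synthetic method.
    static int streamWithMethodRef(List<Integer> values) {
        return values.stream()
                .mapToInt(Integer::intValue)
                .reduce(Integer.MIN_VALUE, Integer::max);
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(3, 17, 5, 11);
        System.out.println(streamWithLambda(values));    // prints 17
        System.out.println(streamWithMethodRef(values)); // prints 17
    }
}
```

Both methods compute the same result; the only difference is the one extra frame that the JIT compiler has to inline in the lambda variant.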
Run the benchmark with -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining to see the whole inlining tree. Here is what we get on JDK 8:
1563 560 4 bench.FindMaxInt::streamWithLambda (38 bytes)
@ 3 java.util.stream.IntPipeline::<init> (7 bytes) inline (hot)
@ 3 java.util.stream.AbstractPipeline::<init> (91 bytes) inline (hot)
@ 1 java.util.stream.PipelineHelper::<init> (5 bytes) inline (hot)
@ 1 java.lang.Object::<init> (1 bytes) inline (hot)
@ 51 java.util.stream.StreamOpFlag::combineOpFlags (9 bytes) inline (hot)
@ 2 java.util.stream.StreamOpFlag::getMask (30 bytes) inline (hot)
@ 66 java.util.stream.IntPipeline$StatelessOp::opIsStateful (2 bytes) inline (hot)
@ 4 java.util.Collection::stream (11 bytes) inline (hot)
\-> TypeProfile (5120/5120 counts) = java/util/ArrayList
@ 1 java.util.ArrayList::spliterator (12 bytes) inline (hot)
@ 8 java.util.ArrayList$ArrayListSpliterator::<init> (26 bytes) inline (hot)
@ 1 java.lang.Object::<init> (1 bytes) inline (hot)
@ 7 java.util.stream.StreamSupport::stream (19 bytes) inline (hot)
@ 1 java.util.Objects::requireNonNull (14 bytes) inline (hot)
@ 11 java.util.stream.StreamOpFlag::fromCharacteristics (37 bytes) inline (hot)
@ 1 java.util.ArrayList$ArrayListSpliterator::characteristics (4 bytes) inline (hot)
\-> TypeProfile (5124/5124 counts) = java/util/ArrayList$ArrayListSpliterator
@ 15 java.util.stream.ReferencePipeline$Head::<init> (8 bytes) inline (hot)
@ 4 java.util.stream.ReferencePipeline::<init> (8 bytes) inline (hot)
@ 4 java.util.stream.AbstractPipeline::<init> (55 bytes) inline (hot)
@ 1 java.util.stream.PipelineHelper::<init> (5 bytes) inline (hot)
@ 1 java.lang.Object::<init> (1 bytes) inline (hot)
@ 9 java.lang.invoke.LambdaForm$MH/883049899::linkToTargetMethod (8 bytes) force inline by annotation
@ 4 java.lang.invoke.LambdaForm$MH/1922154895::identity_L (8 bytes) force inline by annotation
@ 14 java.util.stream.ReferencePipeline::mapToInt (26 bytes) inline (hot)
\-> TypeProfile (5120/5120 counts) = java/util/stream/ReferencePipeline$Head
@ 1 java.util.Objects::requireNonNull (14 bytes) inline (hot)
@ 22 java.util.stream.ReferencePipeline$4::<init> (20 bytes) inline (hot)
@ 16 java.util.stream.IntPipeline$StatelessOp::<init> (29 bytes) inline (hot)
@ 3 java.util.stream.IntPipeline::<init> (7 bytes) inline (hot)
@ 3 java.util.stream.AbstractPipeline::<init> (91 bytes) inline (hot)
@ 1 java.util.stream.PipelineHelper::<init> (5 bytes) inline (hot)
@ 1 java.lang.Object::<init> (1 bytes) inline (hot)
@ 51 java.util.stream.StreamOpFlag::combineOpFlags (9 bytes) inline (hot)
@ 2 java.util.stream.StreamOpFlag::getMask (30 bytes) inline (hot)
@ 66 java.util.stream.IntPipeline$StatelessOp::opIsStateful (2 bytes) inline (hot)
@ 21 java.lang.invoke.LambdaForm$MH/883049899::linkToTargetMethod (8 bytes) force inline by annotation
@ 4 java.lang.invoke.LambdaForm$MH/1922154895::identity_L (8 bytes) force inline by annotation
@ 26 java.util.stream.IntPipeline::reduce (16 bytes) inline (hot)
\-> TypeProfile (5120/5120 counts) = java/util/stream/ReferencePipeline$4
@ 3 java.util.stream.ReduceOps::makeInt (18 bytes) inline (hot)
@ 1 java.util.Objects::requireNonNull (14 bytes) inline (hot)
@ 14 java.util.stream.ReduceOps$5::<init> (16 bytes) inline (hot)
@ 12 java.util.stream.ReduceOps$ReduceOp::<init> (10 bytes) inline (hot)
@ 1 java.lang.Object::<init> (1 bytes) inline (hot)
@ 6 java.util.stream.AbstractPipeline::evaluate (94 bytes) inline (hot)
@ 50 java.util.stream.AbstractPipeline::isParallel (8 bytes) inline (hot)
@ 80 java.util.stream.TerminalOp::getOpFlags (2 bytes) inline (hot)
\-> TypeProfile (5130/5130 counts) = java/util/stream/ReduceOps$5
@ 85 java.util.stream.AbstractPipeline::sourceSpliterator (265 bytes) inline (hot)
@ 79 java.util.stream.AbstractPipeline::isParallel (8 bytes) inline (hot)
@ 88 java.util.stream.ReduceOps$ReduceOp::evaluateSequential (18 bytes) inline (hot)
@ 2 java.util.stream.ReduceOps$5::makeSink (5 bytes) inline (hot)
@ 1 java.util.stream.ReduceOps$5::makeSink (16 bytes) inline (hot)
@ 12 java.util.stream.ReduceOps$5ReducingSink::<init> (15 bytes) inline (hot)
@ 11 java.lang.Object::<init> (1 bytes) inline (hot)
@ 6 java.util.stream.AbstractPipeline::wrapAndCopyInto (18 bytes) inline (hot)
@ 3 java.util.Objects::requireNonNull (14 bytes) inline (hot)
@ 9 java.util.stream.AbstractPipeline::wrapSink (37 bytes) inline (hot)
@ 1 java.util.Objects::requireNonNull (14 bytes) inline (hot)
@ 23 java.util.stream.ReferencePipeline$4::opWrapSink (10 bytes) inline (hot)
\-> TypeProfile (5081/5081 counts) = java/util/stream/ReferencePipeline$4
@ 6 java.util.stream.ReferencePipeline$4$1::<init> (11 bytes) inline (hot)
@ 7 java.util.stream.Sink$ChainedReference::<init> (16 bytes) inline (hot)
@ 1 java.lang.Object::<init> (1 bytes) inline (hot)
@ 6 java.util.Objects::requireNonNull (14 bytes) inline (hot)
@ 13 java.util.stream.AbstractPipeline::copyInto (53 bytes) inline (hot)
@ 1 java.util.Objects::requireNonNull (14 bytes) inline (hot)
@ 9 java.util.stream.AbstractPipeline::getStreamAndOpFlags (5 bytes) accessor
@ 12 java.util.stream.StreamOpFlag::isKnown (19 bytes) inline (hot)
@ 20 java.util.Spliterator::getExactSizeIfKnown (25 bytes) inline (hot)
\-> TypeProfile (5081/5081 counts) = java/util/ArrayList$ArrayListSpliterator
@ 1 java.util.ArrayList$ArrayListSpliterator::characteristics (4 bytes) inline (hot)
@ 19 java.util.ArrayList$ArrayListSpliterator::estimateSize (11 bytes) inline (hot)
@ 1 java.util.ArrayList$ArrayListSpliterator::getFence (48 bytes) inline (hot)
@ 38 java.util.ArrayList::access$000 (5 bytes) accessor
@ 25 java.util.stream.Sink$ChainedReference::begin (11 bytes) inline (hot)
\-> TypeProfile (5081/5081 counts) = java/util/stream/ReferencePipeline$4$1
@ 5 java.util.stream.ReduceOps$5ReducingSink::begin (9 bytes) inline (hot)
\-> TypeProfile (5079/5079 counts) = java/util/stream/ReduceOps$5ReducingSink
@ 32 java.util.ArrayList$ArrayListSpliterator::forEachRemaining (129 bytes) inline (hot)
@ 51 java.util.ArrayList::access$000 (5 bytes) accessor
@ 99 java.util.stream.ReferencePipeline$4$1::accept (23 bytes) inline (hot)
@ 12 bench.FindMaxInt$$Lambda$8/390011259::applyAsInt (8 bytes) inline (hot)
\-> TypeProfile (13752/13752 counts) = bench/FindMaxInt$$Lambda$8
@ 4 java.lang.Integer::intValue (5 bytes) accessor
@ 17 java.util.stream.ReduceOps$5ReducingSink::accept (19 bytes) inline (hot)
\-> TypeProfile (13752/13752 counts) = java/util/stream/ReduceOps$5ReducingSink
@ 10 bench.FindMaxInt$$Lambda$9/208515840::applyAsInt (6 bytes) inline (hot)
\-> TypeProfile (9107/9107 counts) = bench/FindMaxInt$$Lambda$9
@ 2 bench.FindMaxInt::lambda$streamWithLambda$0 (6 bytes) inline (hot)
@ 2 java.lang.Integer::max (6 bytes) inlining too deep
@ 38 java.util.stream.Sink$ChainedReference::end (10 bytes) inline (hot)
@ 4 java.util.stream.Sink::end (1 bytes) inline (hot)
\-> TypeProfile (5125/5125 counts) = java/util/stream/ReduceOps$5ReducingSink
@ 12 java.util.stream.ReduceOps$5ReducingSink::get (5 bytes) inline (hot)
@ 1 java.util.stream.ReduceOps$5ReducingSink::get (8 bytes) inline (hot)
@ 4 java.lang.Integer::valueOf (32 bytes) inline (hot)
@ 28 java.lang.Integer::<init> (10 bytes) inline (hot)
@ 1 java.lang.Number::<init> (5 bytes) inline (hot)
@ 1 java.lang.Object::<init> (1 bytes) inline (hot)
@ 12 java.lang.Integer::intValue (5 bytes) accessor
@ 34 org.openjdk.jmh.infra.Blackhole::consume (28 bytes) disallowed by CompilerOracle
The key lines are the following. They show that inlining stops exactly at the final call to Integer.max, because the default limit of 9 levels is reached.
@ 2 bench.FindMaxInt::lambda$streamWithLambda$0 (6 bytes) inline (hot)
@ 2 java.lang.Integer::max (6 bytes) inlining too deep
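If you want to confirm the limit that a particular JVM actually uses, one option is to read the flag at run time. This is a sketch that assumes a HotSpot-based JVM where the com.sun.management API is available; the class name InlineLimitSketch is made up for illustration, and the reported value may differ between JDK releases.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper; works only on HotSpot-based JVMs.
public class InlineLimitSketch {

    // Returns the current value of -XX:MaxInlineLevel as a string.
    static String maxInlineLevel() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption("MaxInlineLevel").getValue();
    }

    public static void main(String[] args) {
        System.out.println("MaxInlineLevel = " + maxInlineLevel());
    }
}
```

Running it with a different -XX:MaxInlineLevel value on the command line should print that value instead, which is a quick way to check that an override was picked up before rerunning the benchmark.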
The shape of the inlining tree is very different on JDK 11:
1588 705 4 bench.FindMaxInt::streamWithLambda (38 bytes)
@ 4 java.util.Collection::stream (11 bytes) inline (hot)
\-> TypeProfile (5263/5263 counts) = java/util/ArrayList
@ 1 java.util.ArrayList::spliterator (12 bytes) inline (hot)
@ 8 java.util.ArrayList$ArrayListSpliterator::<init> (26 bytes) inline (hot)
@ 6 java.lang.Object::<init> (1 bytes) inline (hot)
@ 7 java.util.stream.StreamSupport::stream (19 bytes) inline (hot)
@ 1 java.util.Objects::requireNonNull (14 bytes) inline (hot)
@ 11 java.util.stream.StreamOpFlag::fromCharacteristics (37 bytes) inline (hot)
@ 1 java.util.ArrayList$ArrayListSpliterator::characteristics (4 bytes) inline (hot)
\-> TypeProfile (5125/5125 counts) = java/util/ArrayList$ArrayListSpliterator
@ 15 java.util.stream.ReferencePipeline$Head::<init> (8 bytes) inline (hot)
@ 4 java.util.stream.ReferencePipeline::<init> (8 bytes) inline (hot)
@ 4 java.util.stream.AbstractPipeline::<init> (55 bytes) inline (hot)
@ 1 java.util.stream.PipelineHelper::<init> (5 bytes) inline (hot)
@ 1 java.lang.Object::<init> (1 bytes) inline (hot)
@ 9 java.lang.invoke.Invokers$Holder::linkToTargetMethod (8 bytes) force inline by annotation
@ 4 java.lang.invoke.LambdaForm$MH/0x0000000800060440::invoke (8 bytes) force inline by annotation
@ 14 java.util.stream.ReferencePipeline::mapToInt (26 bytes) inline (hot)
\-> TypeProfile (5263/5263 counts) = java/util/stream/ReferencePipeline$Head
@ 1 java.util.Objects::requireNonNull (14 bytes) inline (hot)
@ 22 java.util.stream.ReferencePipeline$4::<init> (20 bytes) inline (hot)
@ 16 java.util.stream.IntPipeline$StatelessOp::<init> (29 bytes) inline (hot)
@ 3 java.util.stream.IntPipeline::<init> (7 bytes) inline (hot)
@ 3 java.util.stream.AbstractPipeline::<init> (91 bytes) inline (hot)
@ 1 java.util.stream.PipelineHelper::<init> (5 bytes) inline (hot)
@ 1 java.lang.Object::<init> (1 bytes) inline (hot)
@ 51 java.util.stream.StreamOpFlag::combineOpFlags (9 bytes) inline (hot)
@ 2 java.util.stream.StreamOpFlag::getMask (30 bytes) inline (hot)
@ 66 java.util.stream.IntPipeline$StatelessOp::opIsStateful (2 bytes) inline (hot)
@ 21 java.lang.invoke.Invokers$Holder::linkToTargetMethod (8 bytes) force inline by annotation
@ 4 java.lang.invoke.LambdaForm$MH/0x0000000800060440::invoke (8 bytes) force inline by annotation
@ 26 java.util.stream.IntPipeline::reduce (16 bytes) inline (hot)
\-> TypeProfile (5263/5263 counts) = java/util/stream/ReferencePipeline$4
@ 3 java.util.stream.ReduceOps::makeInt (18 bytes) inline (hot)
@ 1 java.util.Objects::requireNonNull (14 bytes) inline (hot)
@ 14 java.util.stream.ReduceOps$6::<init> (16 bytes) inline (hot)
@ 12 java.util.stream.ReduceOps$ReduceOp::<init> (10 bytes) inline (hot)
@ 1 java.lang.Object::<init> (1 bytes) inline (hot)
@ 6 java.util.stream.AbstractPipeline::evaluate (94 bytes) inline (hot)
@ 50 java.util.stream.AbstractPipeline::isParallel (8 bytes) inline (hot)
@ 80 java.util.stream.TerminalOp::getOpFlags (2 bytes) inline (hot)
\-> TypeProfile (5362/5362 counts) = java/util/stream/ReduceOps$6
@ 85 java.util.stream.AbstractPipeline::sourceSpliterator (265 bytes) inline (hot)
@ 79 java.util.stream.AbstractPipeline::isParallel (8 bytes) inline (hot)
@ 88 java.util.stream.ReduceOps$ReduceOp::evaluateSequential (18 bytes) already compiled into a big method
@ 12 java.lang.Integer::intValue (5 bytes) accessor
@ 34 org.openjdk.jmh.infra.Blackhole::consume (28 bytes) disallowed by CompileCommand
Here the inlining tree is cut off much earlier, and for a different reason:
@ 88 java.util.stream.ReduceOps$ReduceOp::evaluateSequential (18 bytes) already compiled into a big method
The default garbage collector in JDK 11 is G1 (it replaced Parallel GC as the default in JDK 9). The compiled code is larger because of the G1 barriers, which is why the inlining heuristics prevented the hottest forEachRemaining loop from being inlined into the streamWithLambda method.
So this is not an optimization in JDK 11, but rather the opposite. Nevertheless, the overall performance of this particular benchmark turned out better, because the inlining tree was cut off outside the hottest loop.
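Since the explanation above depends on which garbage collector is active, it can help to check what a given run really used. The following is a small sketch built on the standard java.lang.management API; the class name ActiveGcSketch is hypothetical, and the collector names it reports depend on the JVM and its flags.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that reports the collectors of the running JVM.
public class ActiveGcSketch {

    // Collects the names of all garbage collector MXBeans.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(collectorNames());
    }
}
```

Rerunning the JDK 11 benchmark with -XX:+UseParallelGC and comparing the -XX:+PrintInlining output would be one way to test whether the G1 barriers are responsible for the different cutoff.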
