Consider the following code:
    import com.google.common.base.Stopwatch; // assumed: Guava's Stopwatch, which provides createStarted()

    public class Playground {
        private static final int MAX = 100_000_000;

        public static void main(String... args) {
            // Four separate lambda expressions, all passed to the same method.
            execute(() -> {});
            execute(() -> {});
            execute(() -> {});
            execute(() -> {});
        }

        public static void execute(Runnable task) {
            Stopwatch stopwatch = Stopwatch.createStarted();
            for (int i = 0; i < MAX; i++) {
                task.run();
            }
            System.out.println(stopwatch);
        }
    }
This currently prints the following on my Intel MBP on Temurin 17:
3.675 ms
1.948 ms
216.9 ms
243.3 ms
Notice the roughly 100x slowdown for the third (and any subsequent) execution. Now, obviously, this is NOT how to write benchmarks in Java. The loop body doesn't do anything, so I'd expect the loop to be eliminated for each and every repetition. Also, I could not reproduce this effect using JMH, which tells me the cause is tricky and fragile.
So, why does this happen? Why would there suddenly be such a catastrophic slowdown, and what's going on under the hood? My assumption is that C2 bails out on us, but which limitation are we bumping into?
Things that don't change the behavior:
- using anonymous inner classes instead of lambdas,
- using 3+ different nested classes instead of lambdas.
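For reference, the "3+ different nested classes" variant could look roughly like the sketch below. The class names (NestedClassPlayground, TaskA, TaskB, TaskC) are made up for illustration, and System.nanoTime() stands in for the Stopwatch so the snippet has no dependencies; execute() also returns the elapsed time instead of printing it. This is only an approximation of the variant described above, not the exact code that was measured.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical variant: three distinct nested Runnable classes instead of lambdas.
final class NestedClassPlayground {
    private static final int MAX = 100_000_000;

    // Each class is a different receiver type at the task.run() call site.
    static final class TaskA implements Runnable { @Override public void run() {} }
    static final class TaskB implements Runnable { @Override public void run() {} }
    static final class TaskC implements Runnable { @Override public void run() {} }

    public static void main(String... args) {
        System.out.println(execute(new TaskA()) + " ms");
        System.out.println(execute(new TaskB()) + " ms");
        System.out.println(execute(new TaskC()) + " ms");
        System.out.println(execute(new TaskA()) + " ms");
    }

    // Runs the task MAX times and returns the elapsed wall-clock time in milliseconds.
    // System.nanoTime() replaces the Stopwatch here (an assumption of this sketch).
    static long execute(Runnable task) {
        long start = System.nanoTime();
        for (int i = 0; i < MAX; i++) {
            task.run();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

The timings will differ between machines and JVM versions, so no expected output is given here.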
Things that "fix" the behavior (in fact, with these the third and all subsequent invocations appear to be much faster, hinting that compilation correctly eliminated the loops completely):
- using 1-2 nested classes instead of lambdas,
- using 1-2 lambda instances instead of 4 different ones,
- not calling the lambdas (task.run()) inside the loop,
- inlining the execute() method, still maintaining 4 different lambdas.
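As an illustration of the "1-2 lambda instances" case from the list above, here is a minimal sketch that reuses two lambdas across four calls. The names TwoLambdaPlayground, FIRST, SECOND and distinctReceiverClasses are hypothetical, and System.nanoTime() again replaces the Stopwatch. The helper only counts how many distinct runtime classes reach execute(); on HotSpot each lambda expression normally gets its own class, but that is an implementation detail rather than a language guarantee.

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

// Hypothetical variant: only two lambda instances, reused for four calls.
final class TwoLambdaPlayground {
    private static final int MAX = 100_000_000;

    static final Runnable FIRST = () -> {};
    static final Runnable SECOND = () -> {};

    public static void main(String... args) {
        Runnable[] tasks = {FIRST, SECOND, FIRST, SECOND};
        System.out.println("distinct receiver classes: " + distinctReceiverClasses(tasks));
        for (Runnable task : tasks) {
            System.out.println(execute(task) + " ms");
        }
    }

    // Counts the distinct runtime classes among the given tasks,
    // i.e. how many receiver types the task.run() call site would see.
    static long distinctReceiverClasses(Runnable... tasks) {
        return Arrays.stream(tasks).map(Object::getClass).distinct().count();
    }

    // Runs the task MAX times and returns the elapsed wall-clock time in milliseconds.
    static long execute(Runnable task) {
        long start = System.nanoTime();
        for (int i = 0; i < MAX; i++) {
            task.run();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

To look at what the JIT does with the call site, a variant like this could be run with -XX:+PrintCompilation, or with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining; that output is not included here.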