I believe you are not using a proper microbenchmark setup. You are comparing the warm-up of the bytecode generation framework (ASM, which is used to generate the lambda bytecode at runtime) plus the lambda execution time with the plain execution time of the loop.

Check this answer for performance-difference-between-java-8-lambdas-and-anonymous-inner-classes and the linked document. The linked document gives a deep insight into the processing under the hood.
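As a quick illustration of that runtime generation (this snippet is my own, not from the linked document): the class that backs a lambda does not exist among the compiled .class files; it is spun up on first use, which is where the one-time bootstrap cost comes from.

import java.util.function.Supplier;

public class LambdaClassDemo {
    public static void main(String[] args) {
        Supplier<String> s = () -> "hello";
        // typically prints a synthetic name such as LambdaClassDemo$$Lambda$1/0x...,
        // i.e. a class generated at runtime rather than one produced by javac
        System.out.println(s.getClass().getName());
    }
}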
Edit: a small snippet to demonstrate the timing effect described above:
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.stream.Stream;

public class Warmup {

    static int dummy;

    static void merge(String s) {
        dummy += s.length();
        dummy++;
        dummy -= s.length();
    }

    public static void main(String[] args) throws IOException {
        List<String> list1 = new ArrayList<>();
        Random rand = new Random(1);
        for (int i = 0; i < 100_000; i++) {
            list1.add(Long.toString(rand.nextLong()));
        }

        // this will bootstrap the runtime bytecode generation for lambdas
        // Stream.of("foo".toCharArray()).forEach(System.out::println);

        long start = System.nanoTime();
        list1.forEach(data -> merge(data));
        long end = System.nanoTime();
        System.out.printf("duration: %d%n", end - start);
        System.out.println(dummy);
    }
}
If you run the code as posted, the printed duration on my machine is
duration: 71694425
If you uncomment the Stream.of(...) line (which is only there to exercise the bytecode generation framework for the first time), the printed duration is
duration: 7516086
That is only around 10% of the initial run.
Note: just to be explicit, don't use benchmarks like the one above. Have a look at JMH for such a requirement.
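For completeness, here is a minimal JMH sketch of the same measurement (assuming the org.openjdk.jmh artifacts are on the classpath; the class and method names are my own). JMH runs warm-up iterations before measuring, so the one-time lambda bootstrap no longer distorts the result.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class ForEachBenchmark {

    private List<String> list;

    @Setup
    public void setup() {
        // same data set as in the snippet above
        list = new ArrayList<>();
        Random rand = new Random(1);
        for (int i = 0; i < 100_000; i++) {
            list.add(Long.toString(rand.nextLong()));
        }
    }

    @Benchmark
    public void lambdaForEach(Blackhole bh) {
        // lambda + forEach, measured after JMH's warm-up iterations
        list.forEach(s -> bh.consume(s.length()));
    }

    @Benchmark
    public void plainLoop(Blackhole bh) {
        // classic indexed loop over the same data, for comparison
        for (int i = 0; i < list.size(); i++) {
            bh.consume(list.get(i).length());
        }
    }
}

Such a benchmark is typically run via the benchmarks jar produced by the JMH Maven archetype rather than a plain main method.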