
I'm running some very basic tests to check the performance of Java 8 streams and lambdas. Using an ArrayList of 10 million POJOs, all I want to do is get the average value of a BigDecimal field. To take more than one sample, I run the process five times, and to my surprise the first of those five runs is much slower than the rest. I'm getting values like 0.38 seconds the first time and 0.04 seconds on the other four. That is roughly 10x faster! I ran the same test using an old-school for (Pojo p : pojos) loop, with similar results. Why is this happening, and how can I take advantage of it? The code I'm using is:

for (int i = 0; i < 5; i++) {
    long init = System.nanoTime();
    BigDecimal sum = lista.parallelStream().map(x -> x.getCosto()).reduce(BigDecimal.ZERO, BigDecimal::add);
    BigDecimal avg = sum.divide(BigDecimal.valueOf(registros));
    long end = System.nanoTime();
    System.out.println("End of processing: " + avg + " in "
            + ((end - init) / 1000000000.0) + " seconds.");
}
Tagir Valeev
Emilio
    Unfortunately, making performance tests is not that easy, especially when combining with lambda expressions and method references. You need to use the proper tools for that, like the JMH framework. – Tunaki Mar 02 '16 at 22:11
    I disagree with the duplicate mark. The OP asks why the first run is slower, and it indeed is. Even if the OP rewrites the benchmark using JMH, the first iteration will be much slower than the subsequent ones. The OP's measurement is not so bad methodologically, given that we are talking about dozens of milliseconds, not microseconds or nanoseconds. – Tagir Valeev Mar 03 '16 at 04:18

1 Answer


There is a fixed, one-time cost to initialize the Stream API when you call it for the first time. It includes the following steps:

  • Loading many helper classes from the java.util.stream package.
  • Loading the lambda-generating classes from the java.lang.invoke package (like LambdaMetafactory).
  • Generating the runtime representation of the lambdas and method references involved in the stream pipeline (including lambdas used internally in the Stream API).
  • Tiered compilation of all this bytecode (Interpreter -> C1 JIT -> C2 JIT). C2 JIT compilation (which generates the fastest code) is triggered only after a certain number of method invocations (like 5000) or a certain number of backedges (loop iterations, if the method has a big loop inside; like 40000). While most of the code is not yet C2-compiled, it runs much slower. The JIT compiler thread also takes some CPU time that could otherwise be spent on the actual computation.
  • For parallel streams: initializing the common ForkJoinPool and creating new threads.

All of these steps are performed only once. When you use the Stream API again, most of this work is already done, so subsequent runs are much faster.
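The effect of these one-time costs can be made visible with a small experiment. The following is a minimal sketch, not the asker's code: the class name WarmupDemo, the helpers average and timeOnce, the list size and the warm-up count of 20 are all illustrative assumptions. It times one cold call, runs a few untimed warm-up calls, and then times a warm call.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.List;

public class WarmupDemo {

    // Same kind of reduction as in the question, applied to plain BigDecimal values.
    static BigDecimal average(List<BigDecimal> values) {
        BigDecimal sum = values.parallelStream().reduce(BigDecimal.ZERO, BigDecimal::add);
        // Explicit scale and rounding mode, so a non-terminating quotient cannot throw.
        return sum.divide(BigDecimal.valueOf(values.size()), 10, RoundingMode.HALF_UP);
    }

    // Runs the computation once and returns the elapsed time in nanoseconds.
    static long timeOnce(List<BigDecimal> values) {
        long start = System.nanoTime();
        average(values);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Illustrative data set; smaller than the 10 million POJOs in the question.
        List<BigDecimal> values = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            values.add(BigDecimal.valueOf(i % 100));
        }

        long cold = timeOnce(values);      // pays for class loading, lambda setup, pool start
        for (int i = 0; i < 20; i++) {
            timeOnce(values);              // untimed warm-up so the JIT can compile hot code
        }
        long warm = timeOnce(values);

        System.out.printf("cold: %.3f s, warm: %.3f s%n", cold / 1e9, warm / 1e9);
    }
}
```

The exact numbers depend on the machine and the JVM, so no particular ratio is promised. For rigorous measurements, a harness such as JMH, mentioned in the comments, takes care of warm-up iterations.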

In your particular case you are using the heap intensively, so heap enlargement could also contribute to the slowness. If your default -Xms value is too small, the garbage collector performs several full GC cycles until it enlarges the heap to a comfortable size. You can run your test with -Xms equal to -Xmx (e.g. -Xmx1G -Xms1G), which may improve the speed of the first iteration.
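To check whether heap growth plays a role on a given setup, the JVM can report its committed heap size and its garbage collection counts. This is a rough sketch using only standard java.lang.Runtime and java.lang.management calls. The class name HeapCheck and its helper names are hypothetical, and the allocation in main is only a stand-in for the real workload.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

public class HeapCheck {

    // Sum of the collection counts reported by all garbage collectors so far.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Heap memory currently committed to the JVM, in bytes.
    static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    // Upper limit the JVM will try to use, in bytes (governed by -Xmx).
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    static void report(String label) {
        long mb = 1024L * 1024L;
        System.out.println(label + ": committed=" + committedHeapBytes() / mb
                + " MB, max=" + maxHeapBytes() / mb + " MB, collections=" + totalGcCount());
    }

    public static void main(String[] args) {
        report("before");
        // Stand-in workload: allocate many small objects, as building the POJO list would.
        List<BigDecimal> data = new ArrayList<>();
        for (int i = 0; i < 2_000_000; i++) {
            data.add(BigDecimal.valueOf(i));
        }
        report("after (" + data.size() + " objects)");
    }
}
```

Running it once with default settings and once with, for example, java -Xms1G -Xmx1G HeapCheck shows whether the committed heap had to grow and how many collections happened along the way.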

Tagir Valeev
    Maybe it’s worth adding the general answer to the “how can I take advantage of it?” part of the question: generally, avoid code duplication, create reusable classes, etc. – Holger Mar 03 '16 at 08:59