
Using JMH 1.20, I wrote the following benchmark class to measure the performance differences between three ways of summing integers:

  • traditional for loop
  • IntStream
  • IntStream parallel

Here is the Java class:

```java
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.stream.IntStream;

@Fork(1)
@Warmup(iterations = 10)
@Measurement(iterations = 10)
@State(Scope.Benchmark)
public class SumBenchmark {

    @Param({"10000", "100000", "1000000", "10000000"})
    private int n;

    @Benchmark
    public int sumForLoop() {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    @Benchmark
    public int sumIntStream() {
        return IntStream.range(0, n).sum();
    }

    @Benchmark
    public int sumIntStreamParallel() {
        return IntStream.range(0, n).parallel().sum();
    }

    public static void main(String[] args) throws Exception {
        Options options = new OptionsBuilder()
                .include(SumBenchmark.class.getName())
                .build();
        new Runner(options).run();
    }
}
```
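Before comparing timings, it is worth confirming that the three variants compute the same value. The sketch below (the class `SumCheck` and its helpers are hypothetical, not part of the benchmark) checks the loop, the sequential `IntStream`, and the parallel `IntStream` against the closed form. Note that `IntStream.range(0, n)` excludes `n`, so the closed form here is n(n-1)/2 rather than n(n+1)/2. Because `int` addition wraps modulo 2^32 and that wrapping addition is associative, all variants should agree even once the sum overflows, which it does from n = 100000 on.

```java
import java.util.stream.IntStream;

public class SumCheck {

    // Same body as sumForLoop: int addition silently wraps on overflow.
    static int sumForLoop(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // Closed form for 0 + 1 + ... + (n - 1) = n(n-1)/2, computed exactly in long
    // (no overflow for these n), then truncated to int, i.e. reduced mod 2^32.
    static int closedForm(int n) {
        long exact = (long) n * (n - 1) / 2;
        return (int) exact;
    }

    public static void main(String[] args) {
        int[] sizes = {10_000, 100_000, 1_000_000, 10_000_000};
        for (int n : sizes) {
            int loop = sumForLoop(n);
            int stream = IntStream.range(0, n).sum();
            int parallel = IntStream.range(0, n).parallel().sum();
            int closed = closedForm(n);
            System.out.printf("n=%,d loop=%d stream=%d parallel=%d closed=%d%n",
                    n, loop, stream, parallel, closed);
            if (loop != stream || loop != parallel || loop != closed) {
                throw new AssertionError("mismatch for n=" + n);
            }
        }
    }
}
```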

Here are the results (rearranged for readability) from running on JDK 1.8.0_131, macOS 10.13.3, 2.4 GHz Intel Core 2 Duo:

```
Benchmark                               (n)   Mode  Cnt       Score      Error  Units

SumBenchmark.sumForLoop               10000  thrpt   10  160635.169 ± 9716.597  ops/s
SumBenchmark.sumIntStream             10000  thrpt   10   23270.515 ±  388.520  ops/s
SumBenchmark.sumIntStreamParallel     10000  thrpt   10   64018.729 ± 3424.532  ops/s

SumBenchmark.sumForLoop              100000  thrpt   10   17314.287 ±  238.582  ops/s
SumBenchmark.sumIntStream            100000  thrpt   10    1035.085 ±   16.062  ops/s
SumBenchmark.sumIntStreamParallel    100000  thrpt   10   24411.996 ±  255.924  ops/s

SumBenchmark.sumForLoop             1000000  thrpt   10    1846.943 ±   60.891  ops/s
SumBenchmark.sumIntStream           1000000  thrpt   10     104.376 ±    3.396  ops/s
SumBenchmark.sumIntStreamParallel   1000000  thrpt   10     323.331 ±  691.844  ops/s

SumBenchmark.sumForLoop            10000000  thrpt   10     184.656 ±    6.381  ops/s
SumBenchmark.sumIntStream          10000000  thrpt   10     189.411 ±    3.718  ops/s
SumBenchmark.sumIntStreamParallel  10000000  thrpt   10      18.006 ±    0.503  ops/s
```
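Throughput in ops/s is hard to compare across rows because each op sums a different number of ints. One way to line the rows up is to convert each score into nanoseconds per summed element, 1e9 / (score × n). The sketch below does that for the scores in the table above (the class `NsPerElement` is hypothetical). It ignores the Error column; note that for `sumIntStreamParallel` at n = 1000000 the error (691.844) is larger than the score itself (323.331), so that row should not be read too closely.

```java
public class NsPerElement {

    // Throughput scores (ops/s) copied from the results table, one row per n;
    // columns: sumForLoop, sumIntStream, sumIntStreamParallel.
    static final int[] SIZES = {10_000, 100_000, 1_000_000, 10_000_000};
    static final double[][] OPS_PER_SEC = {
            {160635.169, 23270.515, 64018.729},
            {17314.287, 1035.085, 24411.996},
            {1846.943, 104.376, 323.331},
            {184.656, 189.411, 18.006},
    };

    // One op sums n ints, so time per element is 1e9 / (ops/s * n) nanoseconds.
    static double nsPerElement(double opsPerSec, int n) {
        return 1e9 / (opsPerSec * n);
    }

    public static void main(String[] args) {
        String[] names = {"sumForLoop", "sumIntStream", "sumIntStreamParallel"};
        for (int row = 0; row < SIZES.length; row++) {
            for (int col = 0; col < names.length; col++) {
                System.out.printf("n=%,11d %-22s %7.3f ns/element%n",
                        SIZES[row], names[col],
                        nsPerElement(OPS_PER_SEC[row][col], SIZES[row]));
            }
        }
    }
}
```

Normalized this way, the for loop stays at roughly 0.5 to 0.6 ns per element at every size, while `sumIntStream` costs about 4 to 10 ns per element until n = 10000000, where it drops to about 0.53 ns, close to the loop.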

I have some questions regarding this:

  • Why is sumForLoop so much faster than sumIntStream when n <= 1000000?
  • Why is sumIntStreamParallel so slow when n == 10000000?
  • Most importantly, can you point me to information or resources that would help me learn to analyze this myself and fully understand the output?

Thanks.

  • A good compiler will optimize sum(i, i=0..n) to the closed form `n*(n+1)/2`. (Some C compilers know that trick; see Matt Godbolt's CppCon 2017 talk ["What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid"](https://youtu.be/bSkpMdDe4g4).) Perhaps the JIT compiler is doing that for the loop, or it's doing something slow involving memory for the other ones. Actually, we can rule out the closed form, because the time does scale with `n`. So presumably the range creation doesn't optimize away entirely for the others? Your data is really noisy, though. Garbage collection? – Peter Cordes Feb 09 '18 at 09:33
  • This is the same problem as [here](https://stackoverflow.com/questions/25847397/erratic-performance-of-arrays-stream-map-sum). Rerun the benchmark with `-XX:MaxInlineLevel=20`. – apangin Feb 09 '18 at 23:27
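JMH runs each benchmark in a separate forked JVM, so when trying a flag such as `-XX:MaxInlineLevel=20` it is worth confirming that the flag actually reached that JVM. One way to pass it explicitly is JMH's `@Fork(value = 1, jvmArgsAppend = "-XX:MaxInlineLevel=20")`. A minimal sketch for reading the effective value back at runtime, assuming a HotSpot-based JDK where `com.sun.management.HotSpotDiagnosticMXBean` is available (the class `InlineLevelCheck` is hypothetical; the same lookup could be called from inside the benchmark JVM):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

public class InlineLevelCheck {

    // Looks up a HotSpot -XX flag in the running JVM.
    // getVMOption throws IllegalArgumentException if this VM has no such flag.
    static VMOption vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name);
    }

    public static void main(String[] args) {
        // Try: java -XX:MaxInlineLevel=20 InlineLevelCheck
        VMOption option = vmOption("MaxInlineLevel");
        // getOrigin() reports where the value came from (e.g. default vs. command line).
        System.out.println("MaxInlineLevel = " + option.getValue()
                + " (origin: " + option.getOrigin() + ")");
    }
}
```

The default value of this flag has changed between JDK releases, so printing it is also a quick way to see what a given JDK was using when no flag was passed.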
