The curious case of the Java Integer Array streaming performance

Question

Here is a new attempt to present that curious difference in performance in the use of IntStream and reduce to do a computation on an integer array.

My question wasn't understood as legitimate. I was asked to do proper benchmark using JMH. So here it is.

In the following program, 2 methods do almost exactly the same processing. In the first version It does 3 identical computations 2 times, in the second version It does 2 identical computation 3 times.
In effect, both versions repeat the same computation 6 times.

package perf;

import org.openjdk.jmh.annotations.*;

import java.util.Random;
import java.util.stream.IntStream;

public class IntStreamBenchmark {

    final static int ARRAY_LENGTH = 10_000_000;

    @State(Scope.Benchmark)
    public static class TestBuddy {

        int[] as;
        int xor;

        @Setup(Level.Trial)
        public void init(){
            as = new int[ARRAY_LENGTH];
            xor = 0;
            Random random = new Random(0);
            for (int i = 0; i < ARRAY_LENGTH; i++) {
                as[i] = random.nextInt(1_000_000);
                xor ^= as[i];
            }
        }

        public int[] getArray(){
            return as;
        }

        public boolean checkResult(int result) {
            return result == xor;
        }
        
    }

    /** Run 2 reduce expresssions 3 times each */
    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    @Fork(2)
    public void benchmark_2_3(TestBuddy testBuddy) {

        // repeat 3 times
        for (int k = 0; k < 3; k++) {
            int[] as = testBuddy.getArray();
            int result;
            // compute 2 stream reduce expressions
            result = IntStream.of(as).reduce(0, (x, y) -> x^y);
            result |= IntStream.of(as).reduce(0, (x, y) -> x^y);
            if( !testBuddy.checkResult(result) ){
                throw new RuntimeException("Calculation error.");
            }
        }
        
    }
    
    /** Run 3 reduce expresssions 2 times each */
    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    @Fork(2)
    public void benchmark_3_2(TestBuddy testBuddy) {

        // repeat 2 times
        for (int k = 0; k < 2; k++) {
            int[] as = testBuddy.getArray();
            int result;
            // compute 3 stream reduce expressions
            result = IntStream.of(as).reduce(0, (x, y) -> x^y);
            result |= IntStream.of(as).reduce(0, (x, y) -> x^y);
            result |= IntStream.of(as).reduce(0, (x, y) -> x^y);
            if( !testBuddy.checkResult(result) ){
                throw new RuntimeException("Calculation error.");
            }
        }
        
    }
    
}

And here is the verdict from JMH:

# Run complete. Total time: 00:06:51

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                          Mode  Cnt   Score   Error  Units
IntStreamBenchmark.benchmark_2_3  thrpt   10  16.010 ± 0.196  ops/s
IntStreamBenchmark.benchmark_3_2  thrpt   10   1.305 ± 0.279  ops/s

You immediately see that the throughput went from 16 to 1.3. That is huge. I don't see an obvious reason for that.

My question is: what is the reason, in the working of hotspot or anywhere, that can create such a difference?

About the original program

Below you can find the program I posted initially. I was told it is a microbenchmark and that it was doing it "wrong". I don't agree with that.

That program is the last in a series of tests I did to understand the problem and characterize the circunstances when the effect happens.

Initilaly, I did fill the array with non-zero data, I did proper computations, I did check the result to make sure that it wasn't optimized out, I did run the code for a prolonged time to see if there is a warmup effect. All these tests convinced me that there is something wrong in the way the JVM optimizes this code.

Then, in an effort to identify the cause of the problem, I stripped the program down to its bare minimum. I removed all code that I could remove and still observe the drop in performance I had identified. I understand the resulting code is suspicious as a benchmark. But it was not intended to prove the effect, it was there to show the circumstances when the problem occurs.

I also wanted people to try the code on different machines, to see whether it is related to some peculiarity of my hardware, or whether it is more systematic. So far I have seen it happen everywhere I tried, on 2 computers and 4 versions of the JVM, up to openjdk 18.

Looking at he output of the original code, which is not visible in the benchmark above, you see more precisely what happens. One statement first runs perfectly well, in ~150 ms, but later consistently needs over 1000 ms to execute. I can imagine that hotspot identifies some repetition in the code and suddenly decides to optimize some code out. But I cannot see any explanation why hotspot would decide to switch to a more laborious way to execute the code. That would be a bug.

Original post

I was doing a benchmark on various methods to compute the max of an integer array. Some methods involved streams. But I couldn't get any meaningful result because the run times can change a lot for no logical reason.

Here is a stripped down version of my program.

package perf;

import java.util.Arrays;
import java.util.Random;
import java.util.function.Supplier;
import java.util.stream.IntStream;

public class MaxOfInts3 {

    public static void measure(String name, Supplier<Integer> computation, int samples){
        System.gc();
        for( int i = 0 ; i < samples ; i++ ){
            long t0 = System.currentTimeMillis();
            int result = computation.get();
            long t1 = System.currentTimeMillis();
            long time = t1 - t0;
            System.out.format("%-20s %d: %d, %d ms\n", name, i, result, time);
        }
        System.out.println();
    }

    public static void benchmark(int len){

        System.out.println("Create array of size " + len + "...");
        final int[] as = new int[len];
        Random rnd = new Random();
        for( int i = 0 ; i < len ; i++ ) {
            as[i] = rnd.nextInt(1_000_000);
        }

//        final int[] bs = new int[len];
//        Supplier<Integer> comp0 = () -> IntStream.of(bs).reduce((x,y) -> 0).orElse(0);
//        Supplier<Integer> comp1 = () -> Arrays.stream(bs).reduce((x,y) -> 1).orElse(1);
//        comp0.get();
//        comp1.get();

        measure("Compute max", () -> Arrays.stream(as).reduce((x,y) -> x > y ? x : y).getAsInt(), 3);
        measure("Compute min", () -> Arrays.stream(as).reduce((x,y) -> x < y ? x : y).getAsInt(), 3);
        measure("Compute max again", () -> Arrays.stream(as).reduce((x,y) -> x > y ? x : y).getAsInt(), 3);
        measure("Compute min again", () -> Arrays.stream(as).reduce((x,y) -> x < y ? x : y).getAsInt(), 3);

    }

    public static void main(String[] args) {
        benchmark(300_000_000);
    }
}

When I run the program as it is, I get the following output.

Create array of size 300000000...
Compute max          0: 999999, 173 ms
Compute max          1: 999999, 143 ms
Compute max          2: 999999, 124 ms

Compute min          0: 0, 176 ms
Compute min          1: 0, 156 ms
Compute min          2: 0, 125 ms

Compute max again    0: 999999, 1013 ms
Compute max again    1: 999999, 1232 ms
Compute max again    2: 999999, 1139 ms

Compute min again    0: 0, 1283 ms
Compute min again    1: 0, 1165 ms
Compute min again    2: 0, 1066 ms

As you can see, the same method called twice returns very different times between the first and the second call.

Further investigation shows that it doesn't matter which computations I run, the first 2 runs are fast, the subsequent runs are slow. Interestingly, I can call my Supplier many times, it doesn't change the result.

It seems to depend on the number of times I stream an integer array inside of a Supplier. If I define and use more than two Suppliers that work by streaming and reducing an integer array then everything becomes slow. For instance, If I uncomment the commented lines, all 4 benchmarks are slow.

For reference, I tried it on a MacBook Pro with Java 13.0.3 and on Windows 10 with Java 11.0.2. In both cases I see the same thing.

Do you people also see this behaviour? And is there a good explanation why this happens?

PS: Here is a different version of the benchmark method that is maybe more to the point.

public static void benchmark(int len){

    final int[] as = new int[len];

    Supplier<Integer> comp0 = () -> IntStream.of(as).reduce((x,y) -> 0).orElse(0);
    Supplier<Integer> comp1 = () -> IntStream.of(as).reduce((x,y) -> 1).orElse(1);
    Supplier<Integer> comp2 = () -> IntStream.of(as).reduce((x,y) -> 2).orElse(2);

    measure("comp0", comp0, 3);
    measure("comp1", comp1, 3);
    measure("comp0", comp0, 3);
    measure("comp1", comp1, 3);
    measure("comp2", comp2, 3);
    measure("comp0", comp0, 3);
    measure("comp1", comp1, 3);
    measure("comp2", comp2, 3);
}

Both of these benchmarks have systemic problems that mean that the results are unreliable. Please read [How do I write a **correct** micro-benchmark in Java?](https://stackoverflow.com/questions/504103) — Stephen C, Apr 18 '22 at 00:51
Closing the question prevents any kind of discussion. Maybe I shouldn't have started by described it as a benchmark. It is a peculiar behaviour of the JVM. My question was wether other people see the same thing and what in the JVM could explain this behaviour. It looks wrong that a fast method can suddenly become slow. I'd like to understand what makes it slow. — Florian F, Apr 18 '22 at 08:39
Same effect is on my PC. Too bad it was closed as this is really interesting. It doesnt look like any testing bias effect. Maybe you can come back with this question and more canonical benchmarking? — mpdgr, Apr 19 '22 at 00:40
@mpdgr. Thanks. I edited the question. See what happens. Maybe I'll have to repost it as a fresh question. — Florian F, Apr 22 '22 at 23:05
I filed a bug with Oracle. https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8286275 — Florian F, May 06 '22 at 15:13

score 0 · Answer 1 · answered Apr 18 '22 at 00:31

0

This isn't how to microbenchmark java code - The JVM runs code very slowly and does all sorts of bookkeeping to boot... until it realizes some method is being run a lot and will then recompile it to finely tuned machine code, using that bookkeeping. Also, your CPU is doing other stuff (such as switching that song you're playing or whatever) - that needs to be isolated. As a consequence, you need to do a lot of very tricky work to 'microbenchmark' code.

Fortunately, it's been done for you. Use Java Microbenchmark Harness.

answered Apr 18 '22 at 00:31

rzwitserloot

85,357
5
51
72

This wasn't intended to be a benchmark measuring subtle difference in performance. It is about a glaring drop in performance I wanted to understand. – Florian F Apr 18 '22 at 19:04
.. no, you're microbenchmarking, and doing that wrong. Rewrite this test using JMH, reading the docs to know exactly what you can and cannot do (example: You have to do something with the calculation result, otherwise the JVM might optimize the entire algorithm out. The JMH manual covers all this). If JMH shows you a precipitous drop in performance, then ask another SO question with your JMH code. With your attempt, any weirdness you observe is likely explained by 'well, its the JVM, hotspot is involved, you can't draw any further conclusions than that'. – rzwitserloot Apr 18 '22 at 20:32
To be clear, just hotspot alone can cause _massive_ performance differences. Hence, you _must_ isolate it out in order to say anything useful. – rzwitserloot Apr 18 '22 at 20:32

The curious case of the Java Integer Array streaming performance

1 Answers1