Performance difference between Java direct array index access vs. for loop access

Question

I was experimenting with predicates. I tried to implement the predicate for serializing issues in distributed systems. I wrote a simple example where the test function just returns true. I was measuring the overhead, and I stumbled upon this interesting problem. Accessing array in for loop is 10 times slower compared to direct access.

class Test {
    public boolean test(Object o) { return true; }
}

long count = 1000000000l;
Test[] test = new Test[3];
test[0] = new Test();
test[1] = new Test();
test[2] = new Test();
long milliseconds = System.currentTimeMillis();
for(int i = 0; i < count; i++){
    boolean result = true;
    Object object = new Object();
    for(int j = 0; j < test.length; j++){
        result = result && test[j].test(object);
    }
}
System.out.println((System.currentTimeMillis() - milliseconds));

However, the following code is almost 10 times faster. What can be the reason?

milliseconds = System.currentTimeMillis();
for(int i=0 ; i < count; i++) {
    Object object = new Object();
    boolean result = test[0].test(object) && test[1].test(object) && test[2].test(object);
}
System.out.println((System.currentTimeMillis() - milliseconds));

Benchmark results on my i5.

4567 msec for for loop access
297 msec for direct access

Well, in the first piece of code you have an inner loop, which is always iterating 3 times per iteration of the outer loop. In the second piece of code, no such loop exists and the whole thing can short-circuit right away while the loop itself can't. — Crusha K. Rool, Nov 28 '16 at 18:49
Just to make things a little more comparable, how about changing the condition of the inner loop in the first piece of code to `!result && j < test.length` and initializing `boolean result = false` to have the same chance at short-circuiting? — Crusha K. Rool, Nov 28 '16 at 18:52
It looks to me like you've just got a bad microbenchmark and the optimizer is doing something to eliminate most of the work in the second snippet. — user2357112, Nov 28 '16 at 18:54
Place one approach into a method called `foo()`, and the other one into a method called `bar()`, and then call these methods, maybe 20 types, *alternatingly*, from your `main` method. For me, after a few calls, the methods will take **exactly the same amount of time**. (Starting with `java -server ...` might help). The first method may be a tag more complex, which may be a reason for the JIT to kick in a bit later, but in the end, there is no difference. — Marco13, Nov 28 '16 at 19:36
A side note: **This is not a proper microbenchmark**. In both cases, the `result` variaible is not used, and the compiler **could** basically eliminate the whole program. And the fact that the `result` variable is **read and written** in the first approach, but **only written** in the second one may be another reason of why it's easier for the optimizer to figure out that the whole code is effectively useless... — Marco13, Nov 28 '16 at 19:39

Trinimon · Answer 1 · 2016-11-28T20:22:44.153

Due to the predictable result of test(Object o) the compiler is able to optimize the second piece of code quite effectively. The second loop in the first piece of code makes this optimization impossible.

Compare the result with the following Test class:

static class Test {
    public boolean test(Object o) {
        return Math.random() > 0.5;
    }
}

... and the loops:

    long count = 100000000l;
    Test[] test = new Test[3];
    test[0] = new Test();
    test[1] = new Test();
    test[2] = new Test();

    long milliseconds = System.currentTimeMillis();

    for(int i = 0; i < count; i++){
        boolean result = true;
        Object object = new Object();
        for(int j = 0; j < test.length; j++){
            result = result && test[j].test(object);
        }
    }

    System.out.println((System.currentTimeMillis() - milliseconds));
    milliseconds = System.currentTimeMillis();

    for(int i=0 ; i < count; i++) {
        Object object = new Object();
        boolean result = test[0].test(object) && test[1].test(object) && test[2].test(object);
    }

    System.out.println((System.currentTimeMillis() - milliseconds));

Now both loops require almost the same time:

run:
3759
3368
BUILD SUCCESSFUL (total time: 7 seconds)

p.s.: check out this article for more about JIT compiler optimizations.

While this is true, I'm not sure whether it is +1worthy: The crucial question is (or might be) why it manages to optimize the original code better in one case than in the other. (I made some guesses in the comments, no time for a proper answer right now) — Marco13, Nov 28 '16 at 19:41
I've to admit that it was rather my gut feeling leading me to the answer above. Compare for instance this post: http://stackoverflow.com/questions/7772864/would-java-inline-methods-during-optimization. The author of the accepted answer supposes that the (similar) optimization is done by the JIT compiler. Root cause is definitely _not_ the additional loop. — Trinimon, Nov 28 '16 at 20:04

Durandal · Answer 2 · 2016-11-28T21:24:47.443

You are committing almost every basic mistake you can make with a microbenchmark.

You don't ensure code cannot be optimized away by making sure to actually use the calculations result.
Your two code branches have subtly but decidedly different logic ~~(as pointed out variant two will always short-circuit)~~. The second case is easier to optimize for the JIT due to test() returning a constant.
You did not warm up the code, inviting JIT optimization time being included somewhere into the execution time
Your testing code is not accounting for execution order of test cases exerting an influence on the test results. Its not fair to run case 1, then case 2 with the same data and objects. The JIT will by the time case 2 runs have optimized the test method and collected runtime statistics about its behavior (at the expense of case 1's execution time).

Why does variant 2 short circuit? All of them are true and needs to be evaulated? Am I missing something? — Mustafa, Nov 28 '16 at 21:19
@Mustafa My statement is incorrect (must have gotten the idea from a comment), but the effect is real: since you return a constant from test(), the entire line expression becomes a constant after inlining by the JIT. — Durandal, Nov 28 '16 at 21:23

score 0 · Answer 3 · edited Nov 28 '16 at 19:16

If loop header takes one unit time to execute the in first solution loop header evaluations takes 3N units of time. While in direct access it takes N.

Other than loop header overhead in first solution 3 && conditions per iteration to evaluate while in second there are only 2.

And last but not the least Boolean short-circuit evaluation which causes your second, faster example, to stop testing the condition "prematurely", i.e. the entire result evaluates to false if first && condition results false.

Performance difference between Java direct array index access vs. for loop access

3 Answers3