Code faster with unnecessary conditional?

Question

I have the following java function, to count common elements in two sorted arrays. Note the line with the question mark.

public static int overlap(int[] a, int[] b)
{
    int i = 0, j = 0;

    int res = 0;

    while(i < a.length && j < b.length)
    {
        if(a[i] > b[j])
            j ++;
        else if(a[i] < b[j])
            i ++;
        else if(a[i] == b[j])  // ?
        {
            res ++;
            i ++;
            j ++;
        }
    }

    return res;
}

Clearly, that last if-statement doesn't need to be there: at that point we know that the two values are equal. However, when I test the speed (I was checking whether the order of the checks made any difference), the method with the superfluous check is invariably faster than the one without (sometimes by a factor of two).

What is going on here? Some mysterious optimization? Am I overlooking something obvious? I'm using the stand compiler, version 1.8. I can't read bytecode, so I don't know what's going on under the hood.

Here is a full test class: https://gist.github.com/pbloem/1523283211454ec58ce9c5b45204eebd

Bytecode: https://gist.github.com/pbloem/ce4f6758f0bb1424c155c26e83ca88a1

I think it would be helpful if you tagged the language. It looks like Java to me. — , Sep 23 '16 at 15:40
_Possibly_ related: [Branch Prediction](http://stackoverflow.com/questions/11227809/why-is-it-faster-to-process-a-sorted-array-than-an-unsorted-array) — noahnu, Sep 23 '16 at 15:42
Be very, very, very wary of micro benchmarks in Java: http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java See point (8) in particular. Use something like JMH to get a useful benchmark and see if the differences persist. — ipsi, Sep 23 '16 at 16:32

huseyin tugrul buyukisik · Answer 1 · 2016-09-23T16:01:13.650

2

Possibly JIT swapping order of "if"s to get best performance but cannot swap order of just "else"(without an "if") with another "if" at the beginning, so when you added "if" after "else", it tried it as a first check, and if array overlapping is like %90, then it could keep that last "if" at the first place.

    if(a[i] == b[j])  // say %70 probability after N iterations
    {                 // or less randomized branching
        res ++;
        i ++;
        j ++;
    }
    else if(a[i] > b[j]) // %20 probability, medium branching
        j ++;
    else if(a[i] < b[j]) // %10, totally random branching, not predictable
        i ++;

is possible when arrays ordering

a > b or b < a

are more randomized than arrays overlapping.

When there is if+if+else, JIT may not predict what you mean in that. By adding an equalty, you are elliminating many cases except the equation.

Could you please try with an ordered array and totally randomized array?

Or it is just helping cpu's branch predictor as @noahnu said in comments.

edited Sep 23 '16 at 16:01

answered Sep 23 '16 at 15:45

huseyin tugrul buyukisik

11,469
4
45
97

I don't think the JIT can re-order this due to the function parameters being accessible to other threads. – Sep 23 '16 at 15:53
But there isn't any explicit synchronization in any case. Two codes should act same for all i and j values. – huseyin tugrul buyukisik Sep 23 '16 at 15:56
Hogwash!! What if the caller to this function had just launched a thread that changed `a` and `b`? – Sep 23 '16 at 15:57
@HorseFace I don't see anything related to multithreading in the example he gave. – huseyin tugrul buyukisik Sep 23 '16 at 15:59
Neither do I. But neither will the compiler when it sees that function. But it doesn't mean that it's not lurking about elsewhere. – Sep 23 '16 at 16:01
1

@HorseFace Unless there is `volatile` or something like this, the compiler is free to ignore other threads. It may fetch the values just once and put them into registers, just like there were local variables. – maaartinus Sep 23 '16 at 16:05
Might I wish you luck in putting a whole array into a register! If `a` and `b` were regular value copy POD types then I'd agree with you. – Sep 23 '16 at 16:07
16x YMM registers could hold 128 elements but I may be wrong. Also L1 is enough for that array. – huseyin tugrul buyukisik Sep 23 '16 at 16:09
It's a bit difficult to see whether non-sorted arrays make a difference, since the method behaves differently if the arrays aren't sorted. Still, it looks like the version with the superfluous check is still faster. – Peter Sep 23 '16 at 16:11
@Peter let two arrays be totally random themselves but their i-th elements same. This should increase the gap between superfluous check and the other. – huseyin tugrul buyukisik Sep 23 '16 at 16:16

score -1 · Answer 2 · answered Sep 23 '16 at 16:12

By using System.currentTimeMillis() you cant get exact elapsed time of program as sometimes it can be wrong due to JIT optimization. Try to use System.nanoTime().

Also make the variables used in time calculation as global variable for proper micro benchmark.

double sumWith = 0.0; double sumWithout = 0.0;

Code faster with unnecessary conditional?

2 Answers2