Is there a performance difference between cascading if-else statements and nested if statements

Question

Is there a performance difference between cascading if-else statements like

if (i > c20) {
// ...
} else if (i > c19) {
// ...
} else if (i > c18) {
// ...
} else if (i > c17) {
// ...
} else if (i > c16) {
// ...
} else if (i > c15) {
// ...
} else if (i > c14) {
// ...
} else if (i > c13) {
// ...
} else if (i > c12) {
// ...
} else if (i > c11) {
// ...
} else if (i > c10) {
// ...
} else if (i > c9) {
// ...
} else if (i > c8) {
// ...
} else if (i > c7) {
// ...
} else if (i > c6) {
// ...
} else if (i > c5) {
// ...
} else if (i > c4) {
// ...
} else if (i > c3) {
// ...
} else if (i > c2) {
// ...
} else if (i > c1) {
// ...
} else if (i > c0) {
// ...
} else {
// ...
}

and nested if statements like:

if (i > c10) {
    if (i > c15) {
        if (i > c18) {
            if (i > c19) {
                if (i > c20) {
                    // ...
                } else {
                    // ...
                }
            } else {
                //...
            }
        } else {
            if (i > c17) {
                // ...
            } else {
                // ...
            }
        }
    } else {
        if (i > c13) {
            if (i > c14) {
                // ...
            } else {
                // ...
            }
        } else {
            if (i > c12) {
                // ...
            } else {
                // ...
            }
        }
    }
} else {
    if (i > c5) {
        if (i > c8) {
            if (i > c9) {
                //...
            } else {
                //...
            }
        } else {
            if (i > c7) {
                // ...
            } else {
                // ...
            }
        }
    } else {
        if (i > c3) {
            if (i > c4) {
                // ...
            } else {
                // ...
            }
        } else {
            if (i > c2) {
                // ...
            } else {
                if (i > c0) {
                    if (i > c1) {
                        // ...
                    }
                } else {
                    // ...
                }
            }
        }
    }
}

If there is a difference what is the reason one is faster than the other? Can one form result in: better JIT compilation, better cache strategy, better branch prediction, better compiler optimisation, etc.? I am particularly interested in the performance in Java but would be interested in knowing who it might be similar or different in other languages like C/C++, C#, etc.

How would different distributions of i, the ranges checked and/or a different number of if statements affect the results?

Here values c0 to c20 are stricly increasing order, hence creating rages. E.g.:

c0 = 0;
c1 = 10;
c2 = 20;
c3 = 30;
c4 = 40;
c5 = 50;
c6 = 60;
c7 = 70;
c8 = 80;
c9 = 90;
c10 = 100;
c11 = 110;
c12 = 120;
c13 = 130;
c14 = 140;
c15 = 150;
c16 = 160;
c17 = 170;
c18 = 180;
c19 = 190;
c20 = 200;

or

c0 = 0;
c1 = 1;
c2 = 2;
c3 = 3;
c4 = 4;
c5 = 5;
c6 = 6;
c7 = 7;
c8 = 8;
c9 = 9;
c10 = 10;
c11 = 11;
c12 = 12;
c13 = 13;
c14 = 14;
c15 = 15;
c16 = 16;
c17 = 17;
c18 = 18;
c19 = 19;
c20 = 20;

Well the first one is a linear-time `O(n)` check (you run through the if statements up to the `n`th result). The second one is actually more akin to an `O(log n)` algorithm, as you're essentially splitting up the range of possible values to check at each if-branch, thus meaning the second would be faster. That all said, an array index or hashmap would still outpace both of these solutions (nearly `O(1)`), and be significantly shorter to write in the process — Rogue, Nov 11 '20 at 17:10
Branch prediction, caching, speculative execution etc. make it basically impossible and unreasonable to predict anything here. — luk2302, Nov 11 '20 at 17:11
I think your example is broken: `if(i>0)` being false makes all the rest of the `i>1..n` conditions false, too. And if it's true, then the `else` conditions don't get checked at all. So your first example is exactly equivalent to `if(i>0) ...;` with no else conditions, because they're all subsets of the first condition (can only be true if it was true). — Peter Cordes, Nov 12 '20 at 01:27
@PeterCordes thanks for pointing out the mistake. I corrected it. — Suminda Sirinath S. Dharmasena, Nov 12 '20 at 03:42

score 1 · Answer 1 · answered Nov 11 '20 at 17:19

First, let's analyse your first code snippet

if (i > 0) {
// ...
} else if (i > 1) {
// ...
} else if (i > 2) {
//...

None of these else if conditions makes sense as,

if i is not greater than 0, it is either equal to or less than 0 and therefore your else if conditions should have these values (i.e. equal to or less than 0).

Now, let's analyse your second code snippet:

if (i > 10) {
    if (i > 15) {
        if (i > 18) {
           //...

As you can see if i is greater than 10, the second if condition will definitely be evaluated and if the second one is true, the third one will definitely be evaluated.

Thus, you are comparing apples and oranges.

If the branching is the same, there should not be any difference in performance.

There was a mistake in the 1st example which is now corrected. — Suminda Sirinath S. Dharmasena, Nov 12 '20 at 03:45

score 0 · Answer 2 · answered Nov 11 '20 at 17:30

There are two different factors that play a major role in performance.

Algorithmic

Forget about the vagaries of how CPUs work for a moment. Imagine this code:

for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        for (int k = 0; k < n; k++) {
            doSomething();
        }
    }
}

Performance of doSomething can and probably will vary wildly even if it is a very simple job (hey, hotspot compilation is a thing!), but even so, if n is 10, this runs doSomething a thousand times. If n is 11, then doSomething is run 1331 times. If we make a chart with n on the x axis and on the y axis 'how often is doSomething invoked', it'll look like y = x^3.

And that is a chart that grows very fast. Even if doSomething is fast, or has highly irregular performance, for some sufficiently large value of n, performance is going to be horrible simply because of the sheer amount of looping we do here.

The key words are for some sufficiently large value. It means that eventually the core performance characteristic of this code (O(n^3)) will be the only relevant factor.

But a CPU can loop 1331 times in an instant, and the fact that it's 3 loops deep is going to pale in comparison to what doSomething does if n is small. When does 'n' get so large, the sheer amount of loops you do ensures this is going to be slow no matter what happens in doSomething'? That depends on your CPU. But it will eventually happen.

Let me put it differently: If we make a different chart, which charts x-axis = n, and y-axis = milliseconds taken by dedicated CPU to chug through the calculation, then the chat will look like a mess initially, as a random interruption by your winamp or whatnot causes weird peaks, and a cache miss causes another peak somewhere else, but as the chart goes to the right, it starts turning into y= x^3 if only you go far enough to the right.

CPUs also aren't magical voodoo machines that can invent more efficient algorithms. If you write an algorithm that has the O(n^3) characteristic, the compiler, hotspot, and CPU pipeliner are highly unlikely to just make that 'long tail of performance horror' go away except in trivial cases (where doSomething does literal nothing and hotspot knows it, so optimizes by removing the entire call and thereby the entire loop structure).

Local

But what happens when we don't look at the tail, but look at the start? Then it's anyone's guess. CPUs have pipeliners which make it far harder to predict. The days where you can say: "This instruction takes 8 CPU cycles, it's a 4GHz CPU, so that should cost 2 billionth of a second to run it, on the dot" are long gone - CPUs haven't worked that way for decades by now: They are parsing one opcode whilst running another, and there are multiple cores involved.

Furthermore, these days, for the CPU to access main memory it takes hundreds of cycles, and in practice they can't even do it anymore: They can only operate on cached pages, and if you try to read or write to memory, the instruction is translated to the right entry in the cache. If the page isn't cached, the CPU will freeze whilst the infrastructure around the CPU finds a page to clear out, and loads in the entire page from main memory into the CPU cache (which have a hierarchy of layers to complicate matters), and the CPU is waiting for ages.

Given that operations done in completely different code and maybe even different threads can cause your cache pages to get flushed, it is obviously completely impossible to try to guess at how many nanoseconds something takes.

It also means if you take more memory than is needed, your code might not run any slower, but because it causes another page to get evicted from cache, code elsewhere now does. Is it even fair to 'blame' this code for a performance hit if trying to time this code never seems to result in slower execution?

Complicated. Very complicated.

You can use JMH to benchmark code. It tries to smear out the random noise by running the code very very often, and long enough that hotspot has kicked in (hotspot will notice a method is run a lot, and will analyse it and rewrite it to highly optimized machine code. Because usually VMs spend 99% of the time in less than 1% of the loaded methods, and that analysis and rewrite is a very pricey process, java doesn't do it for all code, only for code that is run a lot, so you have to wait for that to kick in for a true measurement).

Your code

Given that we're talking, in the example, about 20 if statements, you're never going to notice. We're waaaaaayy on the 'left' side of that performance curve, where the algorithmic performance is irrelevant and it's all about cache misses, the vagaries of the CPU pipeline, etc. Generally, you can consider any code that is less than 1000 instructions and which causes no cache misses to be instantaneous. CPUs are very fast, the bottleneck is more that they have to wait for other equipment in your hardware to catch up, and for those dreaded cache misses.

Nevertheless, as a principle? The first example needs to perform n checks, whereas the second only performs log(n) checks. log(n) is generally known as 'as good as instant', but because even if n is a million, log(n) would only be 20.

In other words, none of this stuff matters unless you have literally thousands of if statements, and then it might, maybe, possibly be noticable, but only if you run that method a heck of a lot.

More generally, you don't even need log(n) performance, you can go with log(1) by jumping straight to the right location. THis is exactly what switch statements do.

Thus, to conclude:

No, it doesn't matter. You'll never know, and probably run into the hard 65536 max bytecode per method limit than that you're going to see a performance difference.
If you want the theoretical fastest way to do such a decision tree, use a switch statement.
Because CPUs and hotspot is so complicated, it is completely impossible to 'make your program run faster' by guessing at such small fry code changes. That's just not how it works: Write clean, flexible code (make it as readable as you can). Then once your app is done, run a profiler which will tell you which 1% of your codebase takes 99% of the resources and how it does so, and if you feel your app is too slow, use this to make your code better. If you wrote clean code, it'll be easy. If you wrote ugly code in a vain attempt to make it faster, it'll be hard, and unless you have infinite development hours, you end up with a slower mess. So don't ever prematurely optimize.
That's all out the window if you know the algorithmic complexity is bad (O(x^2) or worse, generally), and you expect that you'll end up with sufficiently large inputs that it will matter. Then, yeah, prematurely optimize and invent better algorithms. If you are trying to decision-tree your way through a million different options, definitely use a hashmap, and don't loop through a million-large list.

*They are parsing one opcode whilst running another* - Each core is decoding 4 or 5 opcodes per cycle. Not only are they pipelined, but also superscalar. :P http://www.lighterra.com/papers/modernmicroprocessors/. But yes, the key point is that you can't add up a single "cost" number for separate instructions to find overall performance. [The 3 main bottlenecks](https://stackoverflow.com/q/51607391/224132) are front-end, back-end latency, and back-end throughput of a specific port, for code that's not memory-bound or stalling on branch misspredicts. — Peter Cordes, Nov 12 '20 at 03:04
*In other words, none of this stuff matters* - I'd hardly say that. This linear sequence vs. log n branch code could be inside the inner-most loop, e.g. in an interpreter that was running byte-code. Although in that case (dense cases), normally you'd want O(1) dispatch via a `switch` (which might encourage a compiler to make a jump table) or manually using a table of function pointers. (Or whatever the Java equivalent is, I forget.) — Peter Cordes, Nov 12 '20 at 03:07
... it could be, and then it still wouldn't matter. Unless it does. You can make a claim of 'hey, it _could_ be the crucial path!' about any line of code, and you'd be right. But if that's the reason you're carefully reviewing and tweaking every line you write for 'performance', you're doing it very very wrong, which is why I'm very hesitant to put that out there without a ton of caveats. If that level of performance squeeze is needed, then OP should run a profiler, figure out which methods to focus on. If this one comes up, by all means. Then it matters. — rzwitserloot, Nov 12 '20 at 13:08
The OP asked about performance of 2 different strategies. It's pointless to include a long diatribe about performance not usually mattering in every performance answer. (A quick caveat is fine). It can be interesting to know about the tradeoffs; sometimes it's neutral for readability and sometimes it matters. Of course, branch prediction is tricky, and tradeoffs between linear with a few common cases first vs. binary search may well depend on surrounding code, not just context. So it's hard to say much useful here beyond the obviously bad average / worst case performance of a linear chain. — Peter Cordes, Nov 12 '20 at 19:39

JineshEP · Answer 3 · 2020-11-11T22:24:59.877

For the cascading if-else, all the expressions gets evaluated from top to bottom in the order as they appear, up to the if-else condition that evaluates to true.

And the performance depends on number of expressions to evaluate before a true for a condition is obtained.

Taking the example here in your case:

if (i > 0) {
// ...
} else if (i > 1) { 
// ...
} else if (i > 2) {
// ...
} else if (i > 3) {
// ...
} else if (i > 4) {

For any integer input i, if its value is less than or equal to 0, the whole block has to execute, whereas i greater than 0 will exit after evaluation of the first if itself.

For the next example, it is equivalent and more readable if written using short circuit operator.

if (i > 10) {
    if (i > 15) {
        if (i > 18) {
            if (i > 19) {
                if (i > 20) {

Which is better written using && as :

 if (i>10 && i>15 && i>18 && i>19 && i>20){ // note i >10 && i< upperbound instead
 }

Collapsible if statements should be merged

Here i > 10, will result in true and will result in evaluations of the rest of && conditions to the right.

So this would be better performing for inputs that results in the first expression itself evaluating to false, that is when input i is an integer satisfying the condition i <= 10.

Also like other answers suggest try benchmarking the code, using jmh for instance, which will give better insights on how the code perform after optimizations.

Yes the OP's example is broken (only the `i>0` check is needed, the others are skipped if it's true, and also false if it's false). But those ifs *aren't* collapsible into `&&`: each one has a separate `else` block. You'd have to repeat the conditions (in negative form) if you did the outer ifs that way, and hope the compiler was able to [CSE](https://en.wikipedia.org/wiki/Common_subexpression_elimination) and factor back into nested branches. — Peter Cordes, Nov 12 '20 at 03:10
@SumindaSirinathS.Dharmasena the performance depends on number of if-else expression evaluations to make before a true result satisfy an if block in sequence. So for this reason, if most of i in the input satsifies the condition i > c20 (the first if in cascading-if-else), the cascading-if-else will perform better. if inside if needs more evaluations to make in comparison. — JineshEP, Nov 12 '20 at 18:15

Is there a performance difference between cascading if-else statements and nested if statements

3 Answers3

Algorithmic

Local

Your code