Performance impact of using 'break' inside 'for-loop'

Question

I have done my best and read a lot of Q&As on SO.SE, but I haven't found an answer to my particular question. Most for-loop and break related question refer to nested loops, while I am concerned with performance.

I want to know if using a break inside a for-loop has an impact on the performance of my C++ code (assuming the break gets almost never called). And if it has, I would also like to know tentatively how big the penalization is.

I am quite suspicions that it does indeed impact performance (although I do not know how much). So I wanted to ask you. My reasoning goes as follows:

Independently of the extra code for the conditional statements that trigger the break (like an if), it necessarily ads additional instructions to my loop.

Further, it probably also messes around when my compiler tries to unfold the for-loop, as it no longer knows the number of iterations that will run at compile time, effectively rendering it into a while-loop.

Therefore, I suspect it does have a performance impact, which could be considerable for very fast and tight loops.

So this takes me to a follow-up question. Is a for-loop & break performance-wise equal to a while-loop? Like in the following snippet, where we assume that checkCondition() evaluates 99.9% of the time as true. Do I loose the performance advantage of the for-loop?

// USING WHILE
int i = 100;
while( i-- && checkCondition())
{
    // do stuff
}


// USING FOR
for(int i=100; i; --i)
{
    if(checkCondition()) {
        // do stuff
    } else {
        break;
    }
}

I have tried it on my computer, but I get the same execution time. And being wary of the compiler and its optimization voodoo, I wanted to know the conceptual answer.

EDIT:

Note that I have measured the execution time of both versions in my complete code, without any real difference. Also, I do not trust compiling with -s (which I usually do) for this matter, as I am not interested in the particular result of my compiler. I am rather interested in the concept itself (in an academic sense) as I am not sure if I got this completely right :)

No, the difference is 0. Especially with optimizing compilers. Don't try to micro optimize your code, let your compiler do that for you. — Hatted Rooster, Oct 06 '16 at 13:46
@GillBates But it does loose performance, right? Meaning that its efficiency drops from being a pure `for-loop` to that of a `while-loop`. Please correct me if I'm wrong — andresgongora, Oct 06 '16 at 13:49
@barry I have just fixed the code (probably while you were reading) as I had the `if` statement negated in the `for-loop`. Are they still not equivalent? — andresgongora, Oct 06 '16 at 13:50
@andresgongora _"But it does loose performance, right?"_ What makes you think so actually? — πάντα ῥεῖ, Oct 06 '16 at 13:50
@ Barry could you please point me where I got them wrong? I'll fix it real quick :) — andresgongora, Oct 06 '16 at 13:54
@πάνταῥεῖ In the sense that, altought my CPU has an awesome prediction circuitry, the loop can not be unrolled. I am not sure about this, as I am only following my intuition — andresgongora, Oct 06 '16 at 13:56
About unrollment: I don't see any reason why the compiler can't unroll loop with break, it will simply include the test several times. About executing break itself penalty: almost zero. About loops with break performance: you can also look from other point of view. Without the break you would process the whole loop, while with break you may sometimes exit really early, so in such case the break is actually improving the performance a lot. If you can write branch-less variant of algorithm in elegant way, certainly do it, but in many cases the lean elegant algorithm with branch is still best. — Ped7g, Oct 06 '16 at 13:58
@andresgongora there have been studies about how "intuition" is usually pretty wrong concerning optimization (when you do not have a lot of practice in that field) best is always to measure. You also can check the resulting assembler code to learn what happens. But don't trust you intuitions on that one. Sometimes you can make things even worse. — Hayt, Oct 06 '16 at 13:58
@andresgongora What's the first value of `i` in `// do stuff` in both cases? — Barry, Oct 06 '16 at 14:01
@Hayt Indeed. Intuition is a bad advisor in this matter. Therefore I measured the execution time of my code, and found no difference. Because I can't be sure if it is due to my compiler's optimization, I wanted to ask the community :) — andresgongora, Oct 06 '16 at 14:05
@barry Thank you :) I think I saw the problem. One got executed one time less than the other. — andresgongora, Oct 06 '16 at 14:07
@andresgongora turn optimization off and look at it then (also for serious measurements you need quite a few runs and iterate this more time. when the code "around" this or just the loading takes up 99% of the executed time your measurements can be faulty. — Hayt, Oct 06 '16 at 14:08

score 7 · Answer 1 · answered Oct 06 '16 at 13:49

7

The principal answer is to avoid spending time on similar micro optimizations until you have verified that such condition evaluation is a bottleneck.

The real answer is that CPU have powerful branch prediction circuits which empirically work really well.

What will happen is that your CPU will choose if the branch is going to be taken or not and execute the code as if the if condition is not even present. Of course this relies on multiple assumptions, like not having side effects on the condition calculation (so that part of the body loop depends on it) and that that condition will always evaluate to false up to a certain point in which it will become true and stop the loop.

Some compilers also allow you to specify the likeliness of an evaluation as a hint the branch predictor.

If you want to see the semantic difference between the two code versions just compile them with -S and examinate the generated asm code, there's no other magic way to do it.

answered Oct 06 '16 at 13:49

Jack

131,802
30
241
343

So, does the break inside the loop indeed have a performance impact? – andresgongora Oct 06 '16 at 13:58
1

You can read a little bit this very voted answer: http://stackoverflow.com/a/11227902/5520058 By reading this, you will understand better @Jack's answer. – mtb Oct 06 '16 at 14:04
@mtb thank you. Maybe I did not understand Jack's answer. – andresgongora Oct 06 '16 at 14:14
No, I'm seriously. It is the most voted answer to the top voted question on SO. It's about the branch prediction capability. – mtb Oct 06 '16 at 14:17

score 3 · Answer 2 · answered Oct 06 '16 at 13:53

The only sensible answer to "what is the performance impact of ...", is "measure it". There are very few generic answers.

In the particular case you show, it would be rather surprising if an optimising compiler generated significantly different code for the two examples. On the other hand, I can believe that a loop like:

unsigned sum = 0;
unsigned stop = -1;
for (int i = 0; i<32; i++)
{
    stop &= checkcondition();  // returns 0 or all-bits-set;
    sum += (stop & x[i]);
}

might be faster than:

unsigned sum = 0;
for (int i = 0; i<32; i++)
{
    if (!checkcondition())
        break;
    sum += x[i];
}

for a particular compiler, for a particular platform, with the right optimization levels set, and for a particular pattern of "checkcondition" results.

... but the only way to tell would be to measure.

Performance impact of using 'break' inside 'for-loop'

2 Answers2