Is decremental for loop more efficient than incremental?

Question

Possible Duplicate:
Is it faster to count down than it is to count up?

For example,

for (int i = 0; i < max; i++)
{
    ...
}

and

for (int i = max-1; i >= 0; i--)
{
    ...
}

These two loops are essentially the same, and assuming that the loop doesn't contain any array operation. However, for the first case, every iteration will require loading max into a register in the processor and then compare between i and max.. on the other hand, the latter case doesn't require loading 0 into a register for that 0 is already there in a register, so that there is only a comparison for the latter loop. Please correct me if I am wrong, and elaborate if I am right. Thanks.

I'd be highly surprised if there was any difference at all. Please don't go spending your time looking for little things like this to speed up your application by a couple milliseconds, though. — chris, Jul 18 '12 at 20:56
`for (int i = max; i--;)` would be more idiomatic than `for (int i = max-1; i >= 0; i--)`, since it will work with unsigned index types. — ildjarn, Jul 18 '12 at 20:58
Yes, it might be a nanosecond faster. Is that important? Modern CPUs perform billions of instructions per second, and we might possibly save one. — Bo Persson, Jul 18 '12 at 20:58
Benchmark it and tell us. Oh, and there's a decent chance the compiler can manage not to spill `max`, if it has enough registers and can be sure it never changes. — Useless, Jul 18 '12 at 20:59
They're each one assembly instruction on x86 so there's no difference. — Seth Carnegie, Jul 18 '12 at 21:01
@chris: I would be surprised if this change would yield a couple of *microseconds* unless this is the hottest loop and the application runs for a very very long time — David Rodríguez - dribeas, Jul 18 '12 at 21:13
As shown in [an answer yesterday](http://stackoverflow.com/a/11530028/179910), a modern compiler can convert what you write as a loop counting up, into a loop that counts down anyway. — Jerry Coffin, Jul 18 '12 at 21:15
@SethCarnegie: That's not actually true. `jns` can be used to test against `0`, so no `cmp` instruction is required (unless I'm missing something more subtle, in which case please let me know). — Ed S., Jul 18 '12 at 21:23
Do you do any memory access in the loop? If you do the cost of memory access wil overwhelm the cost register of comparison. Will [hardware prefetch](http://www.futurechips.org/chip-design-for-all/prefetching.html) kick in *both* directions on the platform you're targeting? — Remus Rusanu, Jul 18 '12 at 21:27
@EdS.: The code I posted that converted it to downward counting was compiled with VS2010. It will depend on what you have in the loop though. — Jerry Coffin, Jul 18 '12 at 21:30
You could worry about such stuff, or just spec up your minimum hardware requirements. Editing 'minimum 2GHz processor' to 'minimum 2.4GHz processor', 'minumum 1GB RAM' to 'minimum 2GB RAM' and, '20 MB hard disk' to '20M SSD space' is just so much easier than fiddling with loop directions and vastly more influential in 99.99% of all household apps. — Martin James, Jul 18 '12 at 21:33
@MartinJames: That kind of attitude is why most of the applications on my (beast of a) PC run slow as a dog (though in this case I can't imagine that a hardware upgrade would be warranted because every loop were written counting upward). — Ed S., Jul 19 '12 at 17:45

Ed S. · Accepted Answer · 2012-07-18T21:28:33.140

The code represented by the ellipsis will almost certainly relegate any actual performance difference to mere noise. However, you're not correct in all of your assumptions.

every iteration will require loading max into a register in the processor and then compare between i and max

Maybe, but probably not. This depends on your code, but any sane optimizing compiler will be able to detect if the counter is changing between iterations.

I'm not sure where you got some of your ideas, but they are a bit misguided and don't take into account how an optimizing compiler works. Look at your disassembly and see what the real difference is yourself. Oh what the hell, I'll do it (it's fun anyway):

The program is:

int main(int argc, char *argv[]){   
    int max = 10;
    for (int i = max-1; i >= 0; i--)
    {
        cout << i;
    }
    return 0;
}

The generated assembly (VS2010 release, comments my own) is:

int main(int argc, char *argv[]){   
00341000  push        esi  
    int max = 10;
    for (int i = max-1; i >= 0; i--)
00341001  mov         esi,9               ; move a static 9 into esi
00341006  jmp         main+10h (341010h)  
00341008  lea         esp,[esp]           ; load the address of whatever
0034100F  nop                             ; esp points to in memory 
    {                                     ; (not a memory fetch, just address calculation)
        cout << i;
00341010  mov         ecx,dword ptr [__imp_std::cout (342048h)]  
00341016  push        esi  
00341017  call        dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (342044h)]  
0034101D  dec         esi                 ; decrement counter
0034101E  jns         main+10h (341010h)  ; jump if not signed
    }

And for the more idiomatic version...

int main(int argc, char *argv[]){   
00AC1000  push        esi  
    int max = 10;
    for (int i = 0; i < max; i++)
00AC1001  xor         esi,esi  
    {
        cout << i;
00AC1003  mov         ecx,dword ptr [__imp_std::cout (0AC2048h)]  
00AC1009  push        esi  
00AC100A  call        dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (0AC2044h)]  
00AC1010  inc         esi               ; increment esi
00AC1011  cmp         esi,0Ah           ; compare to 10 (0Ah)
00AC1014  jl          main+3 (0AC1003h) ; if less, jump to top 
    }

So yes, the first version uses a jns instruction (jump if not signed), so the comparison is simplified a bit (comparing to 0). It also contains a few more instructions, but no comparison.

However, notice that the comparison made in version two is also static. It knows that max doesn't change throughout the loop, so it can optimize that bit accordingly.

But I would reiterate strongly that this is not likely to ever produce an appreciable performance benefit. Even the high performance timer on my Windows PC couldn't give me a good statistical difference between the two because the call to cout takes soooo much longer than the loop instructions.

score 2 · Answer 2 · answered Jul 18 '12 at 21:00

Compiler-level optimization is going to vary enough depending on the contents of the loop that this is probably a moot point. For example, the compiler may be able to figure out that the value of Max won't change during the loop, and thus only load it into memory once.

If you're really worried about this level of micro-optimization, you need to know exactly what compiler and compiler settings you're planning to use, and then do timing tests on the target hardware to compare different options. Alternatively, you can look at the compiler output directly and compare the actual assembly or machine-level instructions to see if one version uses more instructions than another.

AJG85 · Answer 3 · 2012-07-18T21:11:20.843

1

The choice to increment or decrement is generally not based on anything performance related. It's usually based on the logical flow for whatever algorithm or bit of code that makes the most sense for the context of the loop.

Depending on implementation pre-increment (++i) can sometimes be faster than post-increment (i++) but the compiler will optimize most loops or even completely unroll them if the amount of iterations is constant. It's not usually worth optimizing any code until you've identified a performance bottle neck in your testing.

In short, don't sweat the small stuff.

edited Jul 18 '12 at 21:11

answered Jul 18 '12 at 21:03

AJG85

15,849
13
42
50

2

I don't think the question is about pre-increment vs. post-increment, it's about counting through the loop forwards vs. backwards. – ildjarn Jul 18 '12 at 21:05
1

True, it doesn't add to the answer. I just added it due to OP's interest in efficiency. – AJG85 Jul 18 '12 at 21:07

score 0 · Answer 4 · answered Jul 18 '12 at 21:04

Yes, it may be better, because comparison to 0 is better then comparison to non-zero. But the modern compilers are usually doing a good job in the optimizing code, so there will not be big difference.

The last point - it is a micro optimization. I'd avoid it, unless it makes the code more readable.

Is decremental for loop more efficient than incremental?

4 Answers4