The code represented by the ellipsis will almost certainly relegate any actual performance difference to mere noise. However, you're not correct in all of your assumptions.
"every iteration will require loading max into a register in the processor and then compare between i and max"
Maybe, but probably not. This depends on your code, but any sane optimizing compiler will be able to detect whether max changes between iterations.
I'm not sure where you got some of your ideas, but they are a bit misguided and don't take into account how an optimizing compiler works. Look at your disassembly and see what the real difference is yourself. Oh what the hell, I'll do it (it's fun anyway):
The program is:
#include <iostream>
using std::cout;

int main(int argc, char *argv[]){
    int max = 10;
    for (int i = max-1; i >= 0; i--)
    {
        cout << i;
    }
    return 0;
}
The generated assembly (VS2010 release, comments my own) is:
int main(int argc, char *argv[]){
00341000 push esi
int max = 10;
for (int i = max-1; i >= 0; i--)
00341001 mov esi,9 ; move a static 9 into esi
00341006 jmp main+10h (341010h)
00341008 lea esp,[esp] ; padding: changes nothing, effectively a multi-byte no-op
0034100F nop ; more padding, so the loop top lands on 341010h
{ ; (the jmp above skips both, so they never execute)
cout << i;
00341010 mov ecx,dword ptr [__imp_std::cout (342048h)]
00341016 push esi
00341017 call dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (342044h)]
0034101D dec esi ; decrement counter
0034101E jns main+10h (341010h) ; jump if not signed
}
And for the more idiomatic version...
int main(int argc, char *argv[]){
00AC1000 push esi
int max = 10;
for (int i = 0; i < max; i++)
00AC1001 xor esi,esi
{
cout << i;
00AC1003 mov ecx,dword ptr [__imp_std::cout (0AC2048h)]
00AC1009 push esi
00AC100A call dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (0AC2044h)]
00AC1010 inc esi ; increment esi
00AC1011 cmp esi,0Ah ; compare to 10 (0Ah)
00AC1014 jl main+3 (0AC1003h) ; if less, jump to top
}
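So yes, the first version uses a jns instruction (jump if not signed). It tests the sign flag that dec has already set, so no separate comparison against 0 is needed. The listing also contains a few more instructions (the jmp and the lea/nop pair), but they sit before the loop body; inside the loop there is no cmp at all.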
However, notice that the comparison made in version two is also against a constant: cmp esi,0Ah. The compiler knows that max doesn't change throughout the loop, so it folds the value into the instruction as an immediate instead of loading it on every iteration.
But I would reiterate strongly that this is not likely to ever produce an appreciable performance benefit. Even the high performance timer on my Windows PC couldn't show a statistically significant difference between the two, because the call to cout takes soooo much longer than the loop instructions.
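If you want to repeat the timing experiment, here is a minimal sketch. It assumes std::chrono::steady_clock as a portable stand-in for the Windows high performance timer, and it writes to a std::ostringstream instead of cout so the runs don't depend on the console. The names time_ns, format_up and format_down are made up for this example.

```cpp
#include <chrono>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: runs fn() once and returns the elapsed time in
// nanoseconds, measured with a monotonic clock.
template <typename F>
std::int64_t time_ns(F fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Counts up; the stream insertion stands in for the `cout << i` call.
std::string format_up(int max) {
    std::ostringstream out;
    for (int i = 0; i < max; i++) {
        out << i;
    }
    return out.str();
}

// Counts down; same body, only the loop direction differs.
std::string format_down(int max) {
    std::ostringstream out;
    for (int i = max - 1; i >= 0; i--) {
        out << i;
    }
    return out.str();
}
```

A call such as time_ns([] { format_up(1000000); }) gives one sample. Take many samples of each direction and compare the spread rather than a single pair of numbers, since one-off timings are mostly noise.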