This is the typical question to be DV, so I've hesitated (a lot) before posting it...
I know this question was marked as duplicate, but my tests (if they are good: are they good? this is part of the question) tends to show this is not the case.
At the beginning, I did some tests comparing a for loop to a while loop.
That shows that for loop was better.
But going further, for or while was not the point: the difference is related to:
for (int l = 0; l < loops;l++) {
or
for (int l = 0; l != loops;l++) {
And if you run that (under Windows 10, Visual studio 2017, release), you see that the first one is more than twice faster than the second one.
It is hard (for a novice as I am) to understand if the compiler for some reasons is able to optimize more one or another. But...
- Short question
Why?
- Longer question
The complete code is the following:
For the '<' loop:
int forloop_inf(int loops, int iterations)
{
int n = 0;
int x = n;
for (int l = 0; l < loops;l++) {
for (int i = 0; i < iterations;i++) {
n++;
x += n;
}
}
return x;
}
For the '!=' loop:
int forloop_diff(int loops, int iterations)
{
int n = 0;
int x = n;
for (int l = 0; l != loops;l++) {
for (int i = 0; i != iterations;i++) {
n++;
x += n;
}
}
return x;
}
In both cases, the inner calculation is just here in order to avoid the compiler skipping all the loops.
Respectively this is called by:
printf("for loop inf %f\n", monitor_int(loops, iterations, forloop_inf, &result));
printf("%d\n", result);
and
printf("for loop diff %f\n", monitor_int(loops, iterations, forloop_diff, &result));
printf("%d\n", result);
where loops = 10 * 1000 and iterations = 1000 * 1000.
And where monitor_int is:
double monitor_int(int loops, int iterations, int(*func)(int, int), int *result)
{
clock_t start = clock();
*result = func(loops, iterations);
clock_t stop = clock();
return (double)(stop - start) / CLOCKS_PER_SEC;
}
The result in seconds is:
for loop inf 2.227 seconds
for loop diff 4.558 seconds
So, even if the interest of all that is relative to the weight of what is done inside the loop compared to the loop itself, why such a difference?
Edit:
You can find here the full source code reviewed so that functions are called in a random order several times.
The corresponding disassembly is here (obtained with dumpbin /DISASM CPerf2.exe).
Running it, I now obtain:
- '!=' 0.045231 (average on 493 runs)
- '<' 0.031010 (average on 507 runs)
I do not know how to set O3 in Visual Studio, the compile command line is the following:
/permissive- /Yu"stdafx.h" /GS /GL /W3 /Gy /Zc:wchar_t /Zi /Gm- /O2 /sdl /Fd"x64\Release\vc141.pdb" /Zc:inline /fp:precise /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /errorReport:prompt /WX- /Zc:forScope /Gd /Oi /MD /FC /Fa"x64\Release\" /EHsc /nologo /Fo"x64\Release\" /Ot /Fp"x64\Release\CPerf2.pch" /diagnostics:classic
The code for the loops is above, here is the random way to run it:
typedef int(loop_signature)(int, int);
void loops_compare()
{
int loops = 1 * 100;
int iterations = 1000 * 1000;
int result;
loop_signature *functions[2] = {
forloop_diff,
forloop_inf
};
int n_rand = 1000;
int n[2] = { 0, 0 };
double cum[2] = { 0.0, 0.0 };
for (int i = 0; i < n_rand;i++) {
int pick = rand() % 2;
loop_signature *fun = functions[pick];
double time = monitor(loops, iterations, fun, &result);
n[pick]++;
cum[pick] += time;
}
printf("'!=' %f (%d) / '<' %f (%d)\n", cum[0] / (double)n[0], n[0], cum[1] / (double)n[1], n[1]);
}
and the disassembly (the loop functions only, but not sure it is the good extract of the link above):
?forloop_inf@@YAHHH@Z:
0000000140001000: 48 83 EC 08 sub rsp,8
0000000140001004: 45 33 C0 xor r8d,r8d
0000000140001007: 45 33 D2 xor r10d,r10d
000000014000100A: 44 8B DA mov r11d,edx
000000014000100D: 85 C9 test ecx,ecx
000000014000100F: 7E 6F jle 0000000140001080
0000000140001011: 48 89 1C 24 mov qword ptr [rsp],rbx
0000000140001015: 8B D9 mov ebx,ecx
0000000140001017: 66 0F 1F 84 00 00 nop word ptr [rax+rax]
00 00 00
0000000140001020: 45 33 C9 xor r9d,r9d
0000000140001023: 33 D2 xor edx,edx
0000000140001025: 33 C0 xor eax,eax
0000000140001027: 41 83 FB 02 cmp r11d,2
000000014000102B: 7C 29 jl 0000000140001056
000000014000102D: 41 8D 43 FE lea eax,[r11-2]
0000000140001031: D1 E8 shr eax,1
0000000140001033: FF C0 inc eax
0000000140001035: 8B C8 mov ecx,eax
0000000140001037: 03 C0 add eax,eax
0000000140001039: 0F 1F 80 00 00 00 nop dword ptr [rax]
00
0000000140001040: 41 FF C1 inc r9d
0000000140001043: 83 C2 02 add edx,2
0000000140001046: 45 03 C8 add r9d,r8d
0000000140001049: 41 03 D0 add edx,r8d
000000014000104C: 41 83 C0 02 add r8d,2
0000000140001050: 48 83 E9 01 sub rcx,1
0000000140001054: 75 EA jne 0000000140001040
0000000140001056: 41 3B C3 cmp eax,r11d
0000000140001059: 7D 06 jge 0000000140001061
000000014000105B: 41 FF C2 inc r10d
000000014000105E: 45 03 D0 add r10d,r8d
0000000140001061: 42 8D 0C 0A lea ecx,[rdx+r9]
0000000140001065: 44 03 D1 add r10d,ecx
0000000140001068: 41 8D 48 01 lea ecx,[r8+1]
000000014000106C: 41 3B C3 cmp eax,r11d
000000014000106F: 41 0F 4D C8 cmovge ecx,r8d
0000000140001073: 44 8B C1 mov r8d,ecx
0000000140001076: 48 83 EB 01 sub rbx,1
000000014000107A: 75 A4 jne 0000000140001020
000000014000107C: 48 8B 1C 24 mov rbx,qword ptr [rsp]
0000000140001080: 41 8B C2 mov eax,r10d
0000000140001083: 48 83 C4 08 add rsp,8
0000000140001087: C3 ret
0000000140001088: CC CC CC CC CC CC CC CC ÌÌÌÌÌÌÌÌ
?forloop_diff@@YAHHH@Z:
0000000140001090: 45 33 C0 xor r8d,r8d
0000000140001093: 41 8B C0 mov eax,r8d
0000000140001096: 85 C9 test ecx,ecx
0000000140001098: 74 28 je 00000001400010C2
000000014000109A: 44 8B C9 mov r9d,ecx
000000014000109D: 0F 1F 00 nop dword ptr [rax]
00000001400010A0: 85 D2 test edx,edx
00000001400010A2: 74 18 je 00000001400010BC
00000001400010A4: 8B CA mov ecx,edx
00000001400010A6: 66 66 0F 1F 84 00 nop word ptr [rax+rax]
00 00 00 00
00000001400010B0: 41 FF C0 inc r8d
00000001400010B3: 41 03 C0 add eax,r8d
00000001400010B6: 48 83 E9 01 sub rcx,1
00000001400010BA: 75 F4 jne 00000001400010B0
00000001400010BC: 49 83 E9 01 sub r9,1
00000001400010C0: 75 DE jne 00000001400010A0
00000001400010C2: C3 ret
00000001400010C3: CC CC CC CC CC CC CC CC CC CC CC CC CC ÌÌÌÌÌÌÌÌÌÌÌÌÌ
Edit again:
What I feel surprising is also the following:
- In debug more the performance is the same (and the assembly code too)
- So how to be confident about what you're coding if such differences appear after that? (considering I've done no mistake somewhere)