I have two pieces of code which produced the following assembly instructions, taken from a gdb dump.
# faster on my CPU
# Dump of assembler code for function main():
# This was produced when I declared increment inside the loop
# <snipped> I can put back the removed portions if requested.
0x00000000004007ee <+17>: movq $0x0,-0x8(%rbp)
0x00000000004007f6 <+25>: movl $0x0,-0xc(%rbp)
0x00000000004007fd <+32>: jmp 0x400813 <main()+54>
0x00000000004007ff <+34>: movl $0xa,-0x1c(%rbp)
0x0000000000400806 <+41>: mov -0x1c(%rbp),%eax
0x0000000000400809 <+44>: cltq
0x000000000040080b <+46>: add %rax,-0x8(%rbp)
0x000000000040080f <+50>: addl $0x1,-0xc(%rbp)
0x0000000000400813 <+54>: cmpl $0x773593ff,-0xc(%rbp)
0x000000000040081a <+61>: jle 0x4007ff <main()+34>
# <snipped>
# End of assembler dump.
And then this is the dump for the second piece of code.
# slower on my CPU
# Dump of assembler code for function main():
# This was produced when I declared increment outside the loop.
# <snipped>
0x00000000004007ee <+17>: movq $0x0,-0x8(%rbp)
0x00000000004007f6 <+25>: movl $0xa,-0x1c(%rbp)
0x00000000004007fd <+32>: movl $0x0,-0xc(%rbp)
0x0000000000400804 <+39>: jmp 0x400813 <main()+54>
0x0000000000400806 <+41>: mov -0x1c(%rbp),%eax
0x0000000000400809 <+44>: cltq
0x000000000040080b <+46>: add %rax,-0x8(%rbp)
0x000000000040080f <+50>: addl $0x1,-0xc(%rbp)
0x0000000000400813 <+54>: cmpl $0x773593ff,-0xc(%rbp)
0x000000000040081a <+61>: jle 0x400806 <main()+41>
# <snipped>
# End of assembler dump.
As can be seen, the only difference is the position of this line:
0x00000000004007f6 <+25>: movl $0xa,-0x1c(%rbp)
In the faster version it is inside the loop; in the slower version it is outside it. I would expect the version with fewer instructions inside the loop to run faster, yet it runs slower.
Why is this?
Extra Info
If relevant, here are the details of my own experiments and the C++ code that produced the assembly above.
I tested this across multiple computers running either Red Hat Enterprise Linux Workstation (Version 7.5) or Windows 10. The computers had either a Xeon processor (Linux) or an i7-4510U (Windows 10). I compiled with either g++ (without any flags) or Visual Studio Community edition 2017. All the results agreed: declaring the variable inside the loop resulted in a speedup.
Multiple runs had a runtime of ~5.00s (very little variance) when increment was declared inside the loop on a 64-bit Linux machine.
Multiple runs had a runtime of ~5.40s (again, very little variance) when increment was declared outside the loop on the same machine.
Declaring the variable inside the loop:
#include <ctime>
#include <iostream>
using namespace std;
int main()
{
    clock_t begin, end;
    begin = clock();
    long int sum = 0;
    for(int i = 0; i < 2000000000; i++)
    {
        // increment is declared and initialised on every iteration
        int increment = 10;
        sum += increment;
    }
    end = clock();
    // CPU time used by the loop, in seconds
    double elapsed = double(end - begin) / CLOCKS_PER_SEC;
    cout << elapsed << endl;
}
Declaring the variable outside the loop:
#include <ctime>
#include <iostream>
using namespace std;
int main()
{
    clock_t begin, end;
    begin = clock();
    long int sum = 0;
    // increment is declared and initialised once, before the loop
    int increment = 10;
    for(int i = 0; i < 2000000000; i++)
    {
        sum += increment;
    }
    end = clock();
    // CPU time used by the loop, in seconds
    double elapsed = double(end - begin) / CLOCKS_PER_SEC;
    cout << elapsed << endl;
}
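For anyone who wants to reproduce the comparison, here is a rough sketch that puts both loops behind functions so they can be timed from a single program. It is not the code I measured: the names sum_inside, sum_outside, and time_seconds are made up for illustration, it uses std::chrono::steady_clock (wall-clock time) instead of clock(), and it uses long long so the sum does not overflow on platforms where long is 32 bits. Wrapping the loops in functions may also change the stack layout and code alignment, so the timings need not match the numbers above.

```cpp
#include <chrono>
#include <iostream>

// Variant 1: increment is declared and initialised inside the loop body.
long long sum_inside(int n)
{
    long long sum = 0;
    for (int i = 0; i < n; i++)
    {
        int increment = 10;
        sum += increment;
    }
    return sum;
}

// Variant 2: increment is declared and initialised once, before the loop.
long long sum_outside(int n)
{
    long long sum = 0;
    int increment = 10;
    for (int i = 0; i < n; i++)
    {
        sum += increment;
    }
    return sum;
}

// Hypothetical helper: runs one variant for n iterations and returns the
// elapsed wall-clock time in seconds. The sum is printed so the result is used.
double time_seconds(long long (*variant)(int), int n)
{
    auto begin = std::chrono::steady_clock::now();
    long long result = variant(n);
    auto end = std::chrono::steady_clock::now();
    std::cout << "sum = " << result << '\n';
    return std::chrono::duration<double>(end - begin).count();
}
```

Calling time_seconds(sum_inside, 2000000000) and time_seconds(sum_outside, 2000000000) from main, several times each and in alternating order, should give a comparison similar to the two separate programs, assuming the build has optimisation disabled as in my tests.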
I have heavily edited this question because of feedback from the comments. It is much better now; thank you to those who helped refine it! My apologies to those who already put in the effort of answering my unclear question. If the answers and comments seem irrelevant, it is because of my inability to communicate.