The answer to your question is: If the compiler can determine that every time you enter the loop, it will execute only once, then it can eliminate the loop.
Normally, the optimization phase unrolls the loops, based on how the iterations relate to one another, this makes your (e.g. indefinite) loop to get several times bigger, in exchange to avoid the back loops (that normally result in a bubble in the pipeline, depending on the cpu type) but not too much to lose cache hits.... so it is a bit complicate... but the earnings are huge. But if your loop executes only once, and always, is normally because the test you wrote is always true (a tautology) or always false (impossible fact) and can be eliminated, this makes the jump back unnecessary, and so, there's no loop anymore.
int blockSize = 1;
for (volatile int i=0; i<blockSize; i++)
{
foo(); // you missed a semicolon here.
}
In your case, the variable is assigned a value, that is never touched anymore, so the first thing the compiler is going to do is to replace all expressions of your variable by the literal you assigned to it. (lacking context I assume blocsize
is a local automatic variable that is not changed anywhere else) Your code changes into:
for (volatile int i=0; i<1; i++)
{
foo();
}
the next is that volatile
is not necessary, as its scope is the block body of the loop, where it is not used, so it can be replaced by a sequence of code like the following:
do {
foo();
} while (0);
hmmm.... this code can be replaced by this code:
foo();
The compiler analyses each data set analising the graph of dependencies between data and variables.... when a variable is not needed anymore, assigning a value to it is not necessary (if it is not used later in the program or goes out of life), so that code is eliminated. If you make your compiler to compile a for loop frrom 1 to 2^64, and then stop. and you optimize the compilation of that,, you will see you loop being trashed up and will get the false idea that your processor is capable of counting from 1 to 2^64 in less than a second.... but that is not true, 2^64 is still very big number to be counted in less than a second. And that is not a one fixed pass loop like yours.... but the data calculations done in the program are of no use, so the compiler eliminates it.
Just test the following program (in this case it is not a test of a just one pass loop, but 2^64-1 executions):
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
int main()
{
uint64_t low = 0UL;
uint64_t high = ~0UL;
uint64_t data = 0; // this data is updated in the loop body.
printf("counting from %lu to %lu\n", low, high);
alarm(10); /* security break after 10 seconds */
for (uint64_t i = low; i < high; i++) {
#if 0
printf("data = $lu\n", data = i ); // either here...
#else
data = i; // or here...
#endif
}
return 0;
}
(You can change the #if 0
to #if 1
to see how the optimizer doesn't eliminate the loop when you need to print the results, but you see that the program is essentially the same, except for the call to printf with the result of the assignment)
Just compile it with/without optimization:
$ cc -O0 pru.c -o pru_noopt
$ cc -O2 pru.c -o pru_optim
and then run it under time:
$ time pru_noopt
counting from 0 to 18446744073709551615
Alarm clock
real 0m10,005s
user 0m9,848s
sys 0m0,000s
while running the optimized version gives:
$ time pru_optim
counting from 0 to 18446744073709551615
real 0m0,002s
user 0m0,002s
sys 0m0,002s
(impossible, neither the best computer can count one after the other, upto that number in less than 2 milliseconds) so the loop must have gone somewhere else. You can check from the assembler code. As the updated value of data
is not used after assignment, the loop body can be eliminated, so the 2^64-1 executions of it can also be eliminated.
Now add the following line after the loop:
printf("data = %lu\n", data);
You will see that then, even with the -O3
option, will get the loop untouched, because the value after all the assignments is used after the loop.
(I preferred not to show the assembler code, and remain in high level, but you can have a look at the assembler code and see the actual generated code)