This behavior seems to be correct if you dig a bit into the standard.
The first hint is in the note in section 3.3.1/4, which says:
Local extern declarations (3.5) may introduce a name into the declarative region where the declaration appears and also introduce a (possibly not visible) name into an enclosing namespace;
This is a little vague, but it seems to imply that the compiler is not required to introduce the name upper_bound into the global scope when it processes bar(). Therefore, when upper_bound appears in foo(), no connection is made between the two extern variables. As far as the compiler knows, bar() has no side effect on the variable tested in foo(), and so the optimized loop becomes an infinite loop (unless upper_bound is zero or negative to begin with).
But this vague language is not enough, and it is only a cautionary note, not a formal requirement.
Fortunately, section 3.5/7 is more precise. It reads as follows:
When a block scope declaration of an entity with linkage is not found to refer to some other declaration, then that entity is a member of the innermost enclosing namespace. However such a declaration does not introduce the member name in its namespace scope.
And they even provide an example:
namespace X {
    void p() {
        q(); // error: q not yet declared
        extern void q(); // q is a member of namespace X
    }
    void middle() {
        q(); // error: q not yet declared
    }
}
which is directly applicable to the example you gave.
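As a minimal sketch of the same rule applied to a variable rather than a function: the names demo, shared_value and read_via_local_extern below are hypothetical and chosen only for illustration. The block-scope declaration makes the entity a member of the enclosing namespace, but the name itself is only usable inside the block that declares it, until a namespace-scope declaration appears.

```cpp
#include <cassert>

namespace demo {
    int read_via_local_extern() {
        // Block-scope declaration with linkage: the entity is a member of
        // namespace demo, but the name is only visible inside this block.
        extern int shared_value;
        return shared_value;
    }

    // At this point, name lookup for `shared_value` at namespace scope finds
    // nothing, so a function like the following would not compile here:
    //     int broken() { return shared_value; }

    // Namespace-scope definition of the same entity (hypothetical value).
    int shared_value = 42;
}
```

Calling demo::read_via_local_extern() reads the variable defined at the end of the namespace, even though the name was not visible by lookup between the two declarations.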
So, the core of the issue is that the compiler is required not to make the association between the first upper_bound declaration (in bar) and the second one (in foo).
So, let's examine the implications for optimization if the two upper_bound declarations are assumed to be unconnected. The compiler understands the code like this:
void bar()
{
    extern int upper_bound_1;
    upper_bound_1--;
}
void foo()
{
    extern int upper_bound_2;
    for (int i = 0; i < upper_bound_2; ) {
        bar();
    }
}
Which becomes as follows, due to function inlining of bar:
void foo()
{
    extern int upper_bound_1;
    extern int upper_bound_2;
    while( 0 < upper_bound_2 ) {
        upper_bound_1--;
    }
}
Which is clearly an infinite loop (as far as the compiler knows), and even if upper_bound was declared volatile, it would just have an indeterminate termination point (whenever upper_bound happens to be set externally to 0 or less). And decrementing a variable (upper_bound_1) an infinite (or indefinite) number of times has undefined behavior, because of signed integer overflow. Therefore, the compiler can choose to do nothing, which is allowed when the behavior is undefined. And so, the code becomes:
void foo()
{
    extern int upper_bound_2;
    while( 0 < upper_bound_2 ) { }
}
Which is exactly what you see in the assembly listing that GCC 4.8.2 produces for the function (with -O3):
.globl _Z3foov
.type _Z3foov, @function
_Z3foov:
.LFB1:
.cfi_startproc
movl upper_bound(%rip), %eax
testl %eax, %eax
jle .L6
.L5:
jmp .L5
.p2align 4,,10
.p2align 3
.L6:
rep ret
.cfi_endproc
.LFE1:
.size _Z3foov, .-_Z3foov
Which can be fixed by adding a global-scope declaration of the extern variable, as such:
extern int upper_bound;
void bar()
{
    extern int upper_bound;
    upper_bound--;
}
void foo()
{
    extern int upper_bound;
    for (int i = 0; i < upper_bound; ) {
        bar();
    }
}
Which produces this assembly:
_Z3foov:
.LFB1:
.cfi_startproc
movl upper_bound(%rip), %eax
testl %eax, %eax
jle .L2
movl $0, upper_bound(%rip)
.L2:
rep ret
.cfi_endproc
.LFE1:
.size _Z3foov, .-_Z3foov
Which is the intended behavior, i.e., when upper_bound starts out positive, the observable behavior of foo() is equivalent to:
void foo()
{
    extern int upper_bound;
    upper_bound = 0;
}
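A possible variation on the fix, sketched under the assumption that the block-scope declarations are not otherwise needed: declare upper_bound once at namespace scope (typically in a shared header) and drop the local extern declarations entirely, so that both functions find the same name by ordinary lookup. The definition with the value 5 is an assumption added here to make the sketch self-contained; in the original code the definition presumably lives in another translation unit.

```cpp
#include <cassert>

// Single namespace-scope declaration, visible to both functions.
extern int upper_bound;

void bar()
{
    upper_bound--;   // refers to ::upper_bound through ordinary name lookup
}

void foo()
{
    // Same loop as in the question: runs until upper_bound drops to 0.
    for (int i = 0; i < upper_bound; ) {
        bar();
    }
}

// Assumed definition, included only so that this sketch links on its own.
int upper_bound = 5;
```

With this layout, calling foo() with a positive upper_bound should leave it at 0, and calling it with a value of 0 or less should leave it unchanged.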