Cyclic dependency of global variables with extern specifier

Question

A global variable can be declared without being defined by using the extern storage class specifier. So I believe a circular dependency can be introduced between global variables, just as classes/modules can be made mutually dependent using forward declarations. How does a linker handle such dependencies among variable definitions? Does this practice produce undefined behavior?

//source2.cpp

extern int b;
int a = b + 1;

//source1.cpp

#include<iostream>

extern int a;
int b = a + 1;

int main() {
    std::cout << a << " " << b <<std::endl;
}

or even,

#include<iostream>

extern int a;
int b = a + 1;
int a = b + 1;

int main() {
    std::cout << a << " " << b <<std::endl;
}

Both print 2 1. What is happening? I guess the linker resolved the external symbol int a to have the value 0. But how did it even decide that external symbol resolution was finished, instead of being stuck forever in a recursive search for the variables' definitions?

Possible duplicate of [Static variables initialisation order](https://stackoverflow.com/questions/211237/static-variables-initialisation-order) Also: https://isocpp.org/wiki/faq/ctors#static-init-order — Richard Critten, Sep 06 '19 at 17:03
Note: Since C++17 you can have [inline variables](https://en.cppreference.com/w/cpp/language/inline). — Jesper Juhl, Sep 06 '19 at 17:35

score 2 · Accepted Answer · answered Sep 06 '19 at 17:33

This is what the standard has to say:

Variables with static storage duration are initialized as a consequence of program initiation. Variables with thread storage duration are initialized as a consequence of thread execution. Within each of these phases of initiation, initialization occurs as follows.

[...] Constant initialization is performed if a variable or temporary object with static or thread storage duration is initialized by a constant initializer for the entity. If constant initialization is not performed, a variable with static storage duration (6.7.1) or thread storage duration (6.7.2) is zero-initialized (11.6). Together, zero-initialization and constant initialization are called static initialization; all other initialization is dynamic initialization. All static initialization strongly happens before (4.7.1) any dynamic initialization. [ Note: The dynamic initialization of non-local variables is described in 6.6.3; that of local static variables is described in 9.7. —end note ]

An implementation is permitted to perform the initialization of a variable with static or thread storage duration as a static initialization even if such initialization is not required to be done statically, provided that

the dynamic version of the initialization does not change the value of any other object of static or thread storage duration prior to its initialization, and

the static version of the initialization produces the same value in the initialized variable as would be produced by the dynamic initialization if all variables not required to be initialized statically were initialized dynamically.

[ Note: As a consequence, if the initialization of an object obj1 refers to an object obj2 of namespace scope potentially requiring dynamic initialization and defined later in the same translation unit, it is unspecified whether the value of obj2 used will be the value of the fully initialized obj2 (because obj2 was statically initialized) or will be the value of obj2 merely zero-initialized. For example,
inline double fd() { return 1.0; }
extern double d1;
double d2 = d1;    // unspecified:
                   // may be statically initialized to 0.0 or
                   // dynamically initialized to 0.0 if d1 is
                   // dynamically initialized, or 1.0 otherwise
double d1 = fd();  // may be initialized statically or dynamically to 1.0
—end note ]

[...]

If [some conditions] V is defined before W within a single translation unit, the [dynamic] initialization of V is sequenced before the initialization of W.

Conceptually, static initialization is performed at translation time: the compiler emits a symbol whose value is the already-initialized value. In some cases this will be 0; in some cases, it will be the result of evaluating a constant expression initializer and/or calling a constexpr constructor for the variable. If any dynamic initialization needs to be done (because the actual initialization of the variable does not satisfy the conditions for constant initialization), then the compiler emits a piece of code that initializes the variables in that translation unit in definition order. The linker takes all these pieces of code that perform dynamic initialization and combines them in some order (possibly interleaved).

There is no infinite recursion, because the dynamic initialization of a does not kick off the dynamic initialization of b; it simply uses whatever value b already has, either because b was already dynamically initialized, or because it still has its value from static initialization. And vice versa. If b is dynamically initialized before a (and you have no guarantee of this, since the two variables are defined in different translation units), then at the time of b's dynamic initialization, a has the value 0, so b becomes 1; then when a is dynamically initialized, its value becomes 2, so you see the result 2 1. But if a is dynamically initialized before b, you see 1 2.

In the case where there is only one translation unit, b's dynamic initialization must occur before a's because dynamic initializations within a single translation unit occur in definition order (not declaration order). That explains the result 2 1 that you are seeing. However, this result of 2 1 is still not guaranteed because of the provision allowing dynamic initialization to be done statically. The compiler may choose to statically give a the value of 2 because that is the value that it would have if it were dynamically initialized. If the compiler chose to make a's initialization completely static but did not do the same for b, then the dynamic initialization of b would give it the value 3.
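To make the single-translation-unit case concrete, here is a minimal sketch. The helper runtime_one and the variable runtime_one_source are hypothetical names that are not part of the question; they read a volatile object so that, in practice, the compiler cannot turn either initializer into a static one. Under that assumption, the initializers run in definition order and simply read whatever value the other variable holds at that moment.

```cpp
// Hypothetical helper (an assumption for illustration): reading a volatile
// object cannot be folded away at compile time, so the two initializers
// below are performed dynamically, at program start.
volatile int runtime_one_source = 1;  // constant-initialized to 1
int runtime_one() { return runtime_one_source; }

extern int a;               // declaration only; a is defined further down

// Defined first, so initialized first: a is still zero-initialized here,
// which makes b equal to 0 + 1.
int b = a + runtime_one();

// Defined second, so initialized second: b already holds 1,
// which makes a equal to 1 + 1.
int a = b + runtime_one();
```

Printing a and b from main should then give 2 1, matching the result in the question. Neither initializer triggers the other, so there is nothing to recurse into.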

What about the case with two different translation units? Here the standard's wording is not clear, but my interpretation is that the implementation is allowed to fully statically initialize either or both of a and b to any valid value that the variable could have under any valid order of dynamic initialization! If only a is fully statically initialized, it could be statically initialized to either 1 or 2, causing b to become 2 or 3, respectively, during dynamic initialization. Likewise, if only b is fully statically initialized, it could be statically initialized to either 1 or 2, causing a to become 2 or 3, respectively. So:

For the first program, the possible results are 1 2, 2 1, 2 3, or 3 2.
For the second program, the possible results are 2 1 and 2 3.

I think that in practice, a compiler that gave either variable the value of 3 would make some users very angry and would probably stop doing this. Still, the theoretical possibility exists.

A way to avoid the issue of unpredictable initialization order is to forbid non-constant initializers for non-local static variables. In that case, there is no possibility of dynamic initialization occurring, so all initialization of non-local static variables happens in a well-defined order and results in a well-defined value, and in fact will most likely be evaluated at compile time.
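As a rough illustration of that rule, the sketch below gives every non-local variable a constant initializer. The names kBase, a, and b are chosen for the example only. Because each initializer is a constant expression, both variables are constant-initialized and no dynamic initialization code is needed.

```cpp
// Illustrative names only; the point is the shape of the initializers.
constexpr int kBase = 1;   // a compile-time constant

int a = kBase;             // constant initializer: a starts out as 1
int b = kBase + 1;         // constant initializer: b starts out as 2

// The values the initializers rely on can be checked at compile time.
static_assert(kBase + 1 == 2, "b's initializer is a constant expression");
```

A cyclic pair such as int b = a + 1; int a = b + 1; does not fit this pattern, because the value of a non-const int is not usable in a constant expression. In C++20, declaring such a variable with constinit should make the compiler reject the non-constant initializer instead of silently falling back to dynamic initialization.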

score 1 · Answer 2 · answered Sep 06 '19 at 17:36

I think you are picturing as one step what is actually multiple steps. Let's take a look at what happens, starting with compilation. I'll focus on the definition of b; the handling of a is similar.

Compiling
Loosely speaking, when the compiler sees "int b = a + 1;", it does two things. First, it sets aside enough memory to store an int. This memory location is annotated "Note to linker: here is the memory location called b". Second, the compiler generates annotated instructions similar to the following, which are to be executed when global variables are initialized.
1) Read the value stored in <Note to linker: insert the address of a here>.
2) Add 1.
3) Write the result to b.

Linking
The linker sees the two annotations produced by the compiler. From the first, it is able to compute the address of b, which gets added to the linker's internal list of resolved symbol names. Once this list is complete (across all translation units), the linker handles the second annotation by placing the address of a where it was requested. Finding this address need not be more than a standard binary search of the linker's list. (Recursion is not warranted.)
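The two passes described above can be sketched with a toy model. This is not how any real linker or object-file format is laid out; Definition, Relocation, ObjectFile, and link_objects are hypothetical names invented for the illustration. The first pass records where every defined symbol lives, and the second pass fills in each requested address with a plain lookup.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model only: all types and names here are hypothetical.
struct Definition { std::string name; std::uint64_t address; };
struct Relocation { std::string needs; std::uint64_t patched_address; };

struct ObjectFile {
    std::vector<Definition> definitions;   // "here is the location called b"
    std::vector<Relocation> relocations;   // "insert the address of a here"
};

void link_objects(std::vector<ObjectFile>& objects) {
    // Pass 1: collect every defined symbol across all translation units.
    std::map<std::string, std::uint64_t> symbols;
    for (const ObjectFile& obj : objects) {
        for (const Definition& def : obj.definitions) {
            if (!symbols.emplace(def.name, def.address).second) {
                throw std::runtime_error("duplicate symbol: " + def.name);
            }
        }
    }
    // Pass 2: resolve each reference with one ordered-map lookup.
    // No initializer is inspected, so a cycle between a and b is invisible here.
    for (ObjectFile& obj : objects) {
        for (Relocation& rel : obj.relocations) {
            auto it = symbols.find(rel.needs);
            if (it == symbols.end()) {
                throw std::runtime_error("undefined symbol: " + rel.needs);
            }
            rel.patched_address = it->second;
        }
    }
}
```

In this model the linker only ever deals with addresses, never with values, which is why mutually dependent initializers cannot send it into a loop.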

Execution
When the program runs, it follows the instructions generated by the compiler, as modified by the linker. First memory is set aside for all the global and static variables. Then that memory is initialized. When it comes time for b to be initialized, the computer will read whatever value is in the location for a, add 1, and write the result in the location for b. Whether or not a has been initialized yet is not necessarily determined. (See also static-order-fiasco.)
