Initialization of variables

Question

In the following code, are the variables x, y, and z initialized at run time, when memory is allocated for them, or at compile time?

int main(void)
{
    int x = 5;
    static int y=5;
    int z[] ={1,2,3,4};
    return 0;
}

Depends on where they appear. [Your book may tell you more](https://stackoverflow.com/questions/562303/the-definitive-c-book-guide-and-list). — StoryTeller - Unslander Monica, Dec 17 '17 at 05:42
when are they initialized if they appear inside main function? — Ramesh, Dec 17 '17 at 05:44
Who knows, who cares. The C programming language requires only that the side effects of the program are **as if** it was executed in the abstract machine according to the rules... If you never use these anywhere, there might not be any variables that are being initialized. — Antti Haapala -- Слава Україні, Dec 17 '17 at 05:46
Only one thing is certain. A variable of automatic storage duration that is not elided and is initialized from another variable whose value cannot be determined at compile time, cannot be initialized at compile time. — Antti Haapala -- Слава Україні, Dec 17 '17 at 05:58
Can you please give me some reference regarding this? I am reading K&R book and I didn't find an explanation to this question in that book. — Ramesh, Dec 17 '17 at 06:08
It doesn't really matter. All of the three cases you have described can be used in different scopes (e.g. outside a function, within a function, within a block within a function). The rules for when they are initialised vary with scope but, in each case, they are guaranteed to be initialised before first usage in a code statement, and (possibly) may be initialised earlier than that. You will not be able to DETECT (within bounds of standard C) if initialisation happens at compile time or run time. — Peter, Dec 17 '17 at 06:20
@RajeshR the reason why the book doesn't state this is because it is not relevant. It is just that the program must behave **as if** the variable was initialized when its lifetime starts. What really happens is mostly irrelevant. — Antti Haapala -- Слава Україні, Dec 17 '17 at 06:20
i.e. if you think `int x = 5;` means *allocate memory for a variable of type `int` and initialize it with value `5`*, well, fine, but that is not how the C language works. In C `int x = 5;` means *"suppose that in abstract machine, there were a variable named `x` of type `int` and initialized with value `5`; now the side-effects of the compiled program should be as if it were run in that abstract machine."* — Antti Haapala -- Слава Україні, Dec 17 '17 at 06:26
This is an implementation detail. Neither K&R nor the C standard mentions how it is to be implemented. If these variables are not used further, they might even be optimized away. Read your compiler / architecture guide for more information. — babon, Dec 17 '17 at 06:55
@AnttiHaapala: I know, and I care, and so do many of my colleagues and other software engineers. Furthermore, our customers around the world rely on the results. The C standard does not require that side effects occur as if the program were executed in the abstract model. It requires only that side effects **it defines as observable** occur as if in the model. But, here in the real world outside the model, the other side effects have serious consequences. Run-time initialization of programs was taking too long until we put some engineering effort into fixing it. These are real concerns. — Eric Postpischil, Dec 17 '17 at 11:03
@EricPostpischil well that's true as well, I remember one time when I was using DJGPP. An executable with a static array of 10M zero-initialized structs was 50M when compiled as C++, 500k when the same code was compiled as C... — Antti Haapala -- Слава Україні, Dec 17 '17 at 12:20
@AnttiHaapala: So we should not brush off people's questions as not defined by the C standard. It is valuable to understand how compilers work, to understand what language constructs facilitate efficient compilation and execution, to have some insight into how optimization works, to be aware of differences in quality between different compilers, and so on. These questions ought to be answered, not dismissed. There is no real-world principle that things outside the C standard are not useful or knowable. — Eric Postpischil, Dec 17 '17 at 12:31
@K&R might not say much about the abstract machine used to specify the C language. For that, the authoritative document is [the C standard](https://stackoverflow.com/questions/81656/where-do-i-find-the-current-c-or-c-standard-documents). — Eric Postpischil, Dec 17 '17 at 15:55

Yunnosch · Accepted Answer · 2017-12-17T07:46:25.363

4

As the comments have stated, it is not defined whether the variables are initialised at runtime, even with the information you gave in the comments that they are inside the scope of a function. The standard only requires the program to behave as if an abstract machine performed the initialisation at one time or another.

It is however possible to be sure that they are not initialised at compile time.
Specified by standard or not, there is no way any real machine could do that.
(I will happily edit my answer and remove this statement if anybody comes up with an example of how this would be possible. Software emulators and anything not based on a compiler, like a C-interpreter, do not count.)
At compile time, only the value can be determined with which these variables will be initialised. For the cases in your example that value determination does happen, because the value is not available from anywhere else but the digits in program text.

The initialisation (i.e. the "arrival" of those values in the piece of memory represented by the identifiers) can happen at different times, but never at compile time, because at compile time no such piece of memory has been selected yet. In the case of cross-compilation it might not even exist yet (the chip to run it on not having been produced yet).

The following initialisation times are possible
(not saying "usually done" for all of them...):

- runtime:
  necessary for non-static variables in local scope of a function which gets executed for the second time and has changed non-static variables in the first run
- load time (in case of dynamic execution contexts, e.g. Windows PCs etc.) or memory setup time (in case of static execution contexts, e.g. embedded execute-in-place environments):
  for global variables, static local variables, non-static const local variables (possible but unlikely for stack using concepts), non-static non-const local variables which happen to never get changed (unlikely for stacks, too)
- never:
  for anything which gets optimised away completely because it is not used anyway
- interpretation time:
  picking up a valid input from Antti Haapala, C can be interpreted
  (this is however still not compile time, at least if you allow me to make a difference there and consider compilers the applicable scope for this question ;-)

For your examples, inside the scope of main(), any of those options might apply.
So again, it is not known.
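The difference between the runtime case and the load/setup case shows up in behaviour when a function runs more than once. The following is only a minimal sketch; the function names `bump_automatic` and `bump_static` are made up for illustration and do not come from the question. It shows what the program must behave like, not where or when an implementation actually writes the value 5.

```c
/* Hypothetical helper: a non-static local that is changed on every call. */
int bump_automatic(void)
{
    int x = 5;          /* must behave as if x is 5 again on every call */
    x += 1;
    return x;
}

/* Hypothetical helper: a static local that keeps its value between calls. */
int bump_static(void)
{
    static int y = 5;   /* must behave as if y was set to 5 once, before first use */
    y += 1;
    return y;
}
```

Calling `bump_automatic()` twice gives 6 both times, so the initial value has to be made available again for the second call. Calling `bump_static()` twice gives 6 and then 7, which is compatible with the value having been put in place once at load or setup time.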

Notes:

I make a difference between load time and memory setup time here, because memory setup time is usually associated with power-on or reset situations, which are somewhat exceptional, while loading a program to execute is perceived as a normal situation.
Both could however be considered the same thing from point of view of the C-program being executed, as being "before main()".

I also make a difference between runtime (as being controlled by the program as defined by C-code you wrote) and load/setup time, which are part of the C runtime environment preparation code. C-runtime might be prepared via OS or (e.g. for embedded environments) via machine code (not based on C) linked from reset vector.

edited Dec 17 '17 at 07:46

answered Dec 17 '17 at 07:25

Yunnosch

I must disagree with one thing though - C being a single-pass language, it is possible to write a compiling interpreter that would compile into memory... so they could be initialized at compile time too. Though I read the "compile-time initialization" as that they're in the initialized data section. – Antti Haapala -- Слава Україні Dec 17 '17 at 07:33
@AnttiHaapala I agree. Actually I agreed already while writing the answer and have therefore excluded interpreters from my offer to change the "no way" statement. I think it is fair to assume compilers and the load-or-setup concept. – Yunnosch Dec 17 '17 at 07:35
@AnttiHaapala added your valid input nevertheless – Yunnosch Dec 17 '17 at 07:46
I would not agree the initialization cannot be done at compile time. The statement is not really meaningful because there is no clear definition for what it means for an object to exist outside of the C abstract model. Consider `int x = 5; x *= y;`. The compiler might implement this as `load r0, #5; load r1, y; mul r0, r0, r1; store x, r0`. Clearly after the `load r0, #5`, `r0` contains the value of `x`. Is it `x`? There is no storage for `x` yet, but we have a thing we are working with as if it were `x`. It is `x` as much as anything can be. And it is 5. What about right before that?… – Eric Postpischil Dec 17 '17 at 12:55
The initial value of `x` is hard-coded as the immediate value in the `load r0, #5`. If `x` can exist in a register before it exists in storage, it can also exist in the immediate value of an instruction before it is in a register. And the compiler initialized this at compile time; the 5 has been there since the object module was created. Even with a small array, the compiler can build it into the object file, even as immediates in instructions. – Eric Postpischil Dec 17 '17 at 12:57
You write that objects with automatic storage duration (“non-static variables in local scope of a function”) have to be initialized at run-time because (interpreting your words) the function might initialize `x`, change it, and then run again (or even recursively) and need to initialize `x` again. Thus, the initial value has to be written to `x` to overwrite the changed value (or, if recursive, to put it into a new place). But this presumes there is only one place that `x` can be and that one place can hold only one `x`. But this presumes things that are not clear.… – Eric Postpischil Dec 17 '17 at 13:00
… Considering `int x = 5; x *= y;` again, the compiler might implement this as `load r0, #5; store 1000, r0; load r0, 1000; load r1, 2000; mul r2, r0, r1; store 1004, r2`. (Suppose there are some intervening statements or other complications so that this is not as inefficient as it seems.) Here, the compiler has stored `x` in two different places at different times. Is one of these memory locations the real `x` and the other not? So, unlike C’s abstract model, a real implementation does not necessarily have one memory location where an object is stored.… – Eric Postpischil Dec 17 '17 at 13:04
… Next, if I write `double x = 3.47923` one place in my code and `double y = 3.47923` in another place in my code and never take their addresses, the compiler might decide to use the same memory for both of them. Although in the C abstract model, they are different objects, the compiler may implement them using the same storage. Similarly, the compiler might use the same memory for `x` while it has its initial value and other memory for `x` when it is changed.… – Eric Postpischil Dec 17 '17 at 13:05
@Eric, Are the values of variables stored in symbol table during compilation which is used by Target code generation phase of compiler ? – Ramesh Dec 17 '17 at 13:06
… E.g., suppose, when the program starts, the compiler prepares 3.47923, or even a more complicated expression, and stores it in memory location 1000. Then, whenever a function uses `x` while it has its initial value, the code loads it from 1000. But, later in the function, after `x` has changed, the compiler stores the changed value in 1004 and subsequently loads it from 1004. Then, while multiple instances of the function exist simultaneously due to recursion, some of them may have values of `x` in 1000 simultaneously. This `x` has been initialized before the function runs. – Eric Postpischil Dec 17 '17 at 13:06
@Eric, you can post your thoughts as a separate answer. – Ramesh Dec 17 '17 at 13:07
@RajeshR: For objects that are stored in memory, symbol tables record the addresses of objects (usually by offset from some reference point, not an absolute memory address), not their values. – Eric Postpischil Dec 17 '17 at 13:07
@EricPostpischil If a compiler generates code which stores a value in a variable, even if that value is determined by the compiler, the actual arrival of that value in the variable (or in any instance of a variable) is when that code is executed, which will not happen at compile time. You might consider it runtime, load time or setup time. – Yunnosch Dec 17 '17 at 13:08
@RajeshR: I realize this is quite long for comments, but I am not really answering the question, just addressing Yunnosch’s statement about compile-time initialization. – Eric Postpischil Dec 17 '17 at 13:09
Then how does the run time fetch the values during computation? Like c=a*b. How will run time know value of a = 5 if it is not in symbol table? – Ramesh Dec 17 '17 at 13:10
@EricPostpischil Let's meet in a chat. Do you know how to create one? – Yunnosch Dec 17 '17 at 13:10
@Yunnosch: There is no such thing as a “variable” in an executing program. “Variables” are mathematical concepts. The things you work with in C include identifiers (names of objects), objects (a thing that stores some sort of value), and values. A “variable” is, at best, the combination of an identifier, the object it identifies, and the value(s) it holds at various times. But the identifier exists at compile time and is not part of the running program, and the storage used for the object is a concept in the C abstract model that does not necessarily exist in the running program.… – Eric Postpischil Dec 17 '17 at 13:13
… So, because the storage that the C abstract model has for an object does not necessarily exist at execution time, or that may not exist as a single location used for the object, you cannot always say a value for the object has arrived in the object. Or, if you do, you have to be flexible about what is storage for the object. – Eric Postpischil Dec 17 '17 at 13:15
@Yunnosch: Sorry, I would be willing to, but I have to fold laundry now. – Eric Postpischil Dec 17 '17 at 13:15
@EricPostpischil What a pity. Let's hope we find time together. I would like to put the answer on clean definitions of terms and incorporate your input, maybe add a valuable additional angle. I would also be honored if you would make an answer yourself, in the StackOverflow tradition of providing partial insights to a problem, even if they are not cover-all aspects of the question. It seems that your input could benefit from the space and advanced formatting offered by an answer. – Yunnosch Dec 17 '17 at 13:19
@RajeshR: The symbol table provides the **address** of `a`, not its **value**. When the program is linked and loaded, the places in the program that refer to the location of `a` will be changed to have the actual address in memory. Where the program uses the value of `a`, it will have a load instruction. Because of the symbol table and linking, that load instruction will have the address of `a`, so it will load the value from that address. – Eric Postpischil Dec 17 '17 at 18:32

score 2 · Answer 2 · answered Dec 17 '17 at 16:21

The question does not have a clear answer. Initialization is something that happens in the abstract machine the C standard uses to describe how a program must behave, and the only requirements in implementing that machine are that the observable behavior described by the standard must be produced. The initial value of an object may be produced at compile time (even if calculating that initial value requires a complex calculation) or at run time (even if the initial value is a simple constant) or not at all (as long as the observable behavior of the program is produced in some way). Furthermore, the initial value might appear in immediate values encoded into instructions, in memory described by the object file, in memory when written during program start-up, in memory when written while executing a function, in a register, or elsewhere. An object might be held in different places at different times, and it might share storage with other objects or with other instances of itself (as when a function executes recursively), even simultaneously.

A more useful question to ask is: What are the costs and effects of various object initializations? Again, this is not defined by the C standard, but it has real-world consequences. Initialization of many objects or large objects can have a noticeable drag on performance and can affect product quality.

The rest of this answer assumes one is using a compiler of good quality and is requesting optimization (as with the -O3 or -Os switch using Clang or GCC).

A declaration such as `int x = 5` at block scope is likely to result in 5 being hard-coded into an instruction, if it is needed at all. The initial value might not be needed because the compiler optimizes further work with `x` so that the initial value is incorporated into the first expression(s) that use `x`. (In the code shown in the question, `x` is not needed at all since it is never used, so the compiler will remove it from the program entirely.)
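As a small illustration of the initial value being absorbed into the first expression that uses it, consider the sketch below. The function names `scale` and `scale_folded` are hypothetical, and what a given compiler actually emits is an assumption that can only be confirmed by inspecting its output.

```c
/* Hypothetical function built around the block-scope declaration. */
int scale(int y)
{
    int x = 5;   /* abstract machine: x holds 5 from here on */
    x *= y;      /* an optimising compiler may fold the 5 into this multiply */
    return x;
}

/* Same observable behaviour, written the way the optimiser might see it:
   no separate object for x, only the constant 5 inside the expression. */
int scale_folded(int y)
{
    return 5 * y;
}
```

Both functions return the same result for every argument that does not overflow, so a compiler is free to generate identical code for them. In that case the 5 lives only as an immediate operand and no storage is ever set aside for `x`.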

A declaration such as `static int y = 5` is likely to result in 5 being written into a data section in the object file (and, when the object file is linked to make an executable file, the data section will be copied or merged into the executable file). Again, this can be subject to optimization.

If `int x = 5` were declared at file scope, it would be an external definition, and the compiler would have to make it available to other translation units (compilations of other source files), so it would have to be stored in memory (unless the compiler and other developer tools are capable of optimization across translation units). In this case, the compiler is likely to write 5 into a data section of the object file.

With small objects and simple initial values such as these, there is almost never any reason to be concerned with when or how they are initialized, because the costs are so low and because no improvements are available.

With `int z[] = { 1, 2, 3, 4 };` at block scope, the array is small enough that the compiler might handle it in the same way as the single `int x` above, depending on circumstances. If the array is larger, it might be handled like this:

- The values 1, 2, 3, and 4 are written to a constant data section in the object file.
- When execution effectively reaches the declaration, the generated code allocates storage for `z`, likely on the stack, and copies the data from the constant data section into `z`. (It may be hard to say when execution reaches the declaration, since the abstract machine model means the C implementation is free to rearrange many things. Execution of statements before and after the declaration may be rearranged and mixed together.)

However, suppose you have code like this:

int z[] = { 1, 2, 3, 4 };

for (int i = 0; i < 4; ++i)
    z[i] = f(z[i]);

In this case, the compiler can see that no member of `z` is changed before its initial value is passed to `f`. When implementing this, the compiler might skip the step above of copying 1, 2, 3, and 4 into the storage allocated for `z`. Instead, it may implement the loop by reading from the constant data, passing each value to `f`, and writing the result to the storage allocated for `z`. In this case, was `z` ever initialized? The program had the initial values stored in the constant data section, but it never wrote them to the storage particularly allocated for `z`. It only wrote derived values later on.
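The two strategies can be written out by hand to show that they are indistinguishable from the outside. This is only a sketch: `f` here is a hypothetical stand-in (multiply by 10), `z_init` plays the role of the constant data section, and `fill_literal` and `fill_fused` are made-up names. A real compiler would perform this transformation internally, if at all.

```c
#define N 4

/* Hypothetical stand-in for f; any function of one int would do. */
static int f(int v)
{
    return 10 * v;
}

/* Plays the role of the constant data section holding the initial values. */
static const int z_init[N] = { 1, 2, 3, 4 };

/* Literal reading of the source: initialise z, then transform it in place. */
void fill_literal(int out[N])
{
    int z[N] = { 1, 2, 3, 4 };
    for (int i = 0; i < N; ++i)
        z[i] = f(z[i]);
    for (int i = 0; i < N; ++i)
        out[i] = z[i];
}

/* What an optimiser might effectively do: read each initial value from the
   constant data and write only the derived value, never copying 1, 2, 3, 4
   into the destination storage. */
void fill_fused(int out[N])
{
    for (int i = 0; i < N; ++i)
        out[i] = f(z_init[i]);
}
```

Both functions leave the same four values in `out`, which is all the C standard asks for. Only the second one avoids the copy of the initial values.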

However, again, the real question is what are the costs of this? We had to store data in the constant data section, and we had to call f repeatedly. Suppose z had thousands of elements instead of four. Could we reduce those costs?

If `f` is a pure function (depends only on its arguments, not any global state), then there is no point in storing 1, 2, 3, and 4. We ought to store `f(1)`, `f(2)`, `f(3)`, and `f(4)` and skip the execution of `f` at run-time. If you have a good compiler and the source code of `f` is visible while the compiler is compiling this code, the compiler might do that: it might store the results of calling `f` rather than just the explicit initial values. (Even if `f` is not visible, the compiler might do this for known functions, such as `cos` or `log`.)

If you are working on a project for which this is important, you should get to know your compiler and how well it does with things like this. There is no uniform answer, particularly as technology continues to develop.

If an initialization like this is important for your application, an alternative is to write a program that computes the initial values (`f(1)`, `f(2)`, and so on) and writes those into a new source file. At compile time, you would execute that program and then compile the resulting source code.
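The core of such a generator might look like the sketch below. Everything named here is an assumption for illustration: `f` is a hypothetical pure function (v * v + 1), `write_table` is a made-up helper, and `z_table` is an arbitrary name for the generated array.

```c
#include <stdio.h>

/* Hypothetical pure function standing in for f. */
static int f(int v)
{
    return v * v + 1;
}

/* Writes a C definition of an array holding f(1) .. f(n) to `out`.
   n must be at least 1, since C has no zero-length arrays.
   Returns 0 on success, -1 if a write fails. */
int write_table(FILE *out, int n)
{
    if (fprintf(out, "const int z_table[%d] = {", n) < 0)
        return -1;
    for (int i = 1; i <= n; ++i) {
        /* First element is preceded by a space, later ones by ", ". */
        if (fprintf(out, "%s%d", i == 1 ? " " : ", ", f(i)) < 0)
            return -1;
    }
    if (fprintf(out, " };\n") < 0)
        return -1;
    return 0;
}
```

A small `main` that calls `write_table(stdout, 4)` would print `const int z_table[4] = { 2, 5, 10, 17 };`. The build would redirect that output into a source file and compile it along with the rest of the program, so that no call to `f` remains at run time.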

In C, initialization costs are not often a problem except when you have a lot of data that can be preprocessed with an auxiliary program as described above. It is more of a problem in C++ where constructors for classes can be quite extensive.

score 1 · Answer 3 · answered Dec 20 '17 at 08:27

The compiler's job is to do what it's told. You have given it the program

int main(void)
{
    int x = 5;
    static int y=5;
    int z[] ={1,2,3,4};
    return 0;
}

The C standard permits a compiler to produce compiled code that replicates exactly the observable behaviour of your program.

To that end, any decent C compiler with optimisations turned up to maximum will produce compiled code that is equivalent to

int main()
{
}

If you want to know how your specific compiler has dealt with your source code, then check the generated output.
