3

Assuming a purely non-optimizing compiler, is there any difference in machine code between initializing a variable and assigning it a value after declaration?

Initialization method:

int x = 2;

Assignment method:

int x;
x = 2;

I used GCC to output the assembly generated for these two different methods and both resulted in a single machine instruction:

movl    $2, 12(%esp)

This instruction just sets the memory held by the x variable to the value 2. GCC may be optimizing this because it can recognize the end result of the operations, but I think this is the only way to interpret the two versions. My reasoning is that both versions do the same thing: set a part of memory to a specific value.

Why is it then that a distinction is often made between the terms "initialization" and "assignment" if the resulting machine code is the same?

Is the term "initialization" used purely to differentiate variables which have a specific value assigned over those (non-initialized) variables which have whatever garbage value was left in memory?

Vilhelm Gray

4 Answers

7

Assuming a purely non-optimizing compiler, is there any difference in machine code between initializing a variable and assigning it a value after declaration?

Sure.

  • char fubar[] = "hello world"; is valid.
  • char fubar[]; fubar = "hello world"; is not.

More?

  • int fubar[128] = { [60] = 42 }; is valid.
  • int fubar[128]; fubar = { [60] = 42 }; is not.

More?

  • struct foo bar = { .foo = 13, .bar = 42 }; is valid.
  • struct foo bar; bar = { .foo = 13, .bar = 42 }; is not.

More?

  • const int fubar = 0; is valid.
  • const int fubar; fubar = 0; is not.

I could go on and on... Hence, machine code might exist for one while it most likely won't for the other. On that note, have you ever heard of an implementation of C that isn't a compiler?

Why is it then that a distinction is often made between initialization and assignment if the resulting machine code is the same?

The concept of variables in the C programming language is too high-level for the low-level machine code representation. In machine code, registers don't have scope. C added scope, not to mention type fusion and many of the other variable-related aspects, along with initialisation (which, as you can see from the previous examples, is unfortunately not the same as assignment).

Is the term "initialization" used purely to differentiate variables which have a specific value assigned over those (non-initialized) variables which have whatever garbage value was left in memory?

Though a variable that is "initialized" won't contain any "garbage value" (or trap representations), this is not the only effect it has.

In my first example, the initialization will provide the size of the otherwise incomplete array. The equivalent using the assignment operator would require explicitly providing the length of the array and using strcpy, which turns out to be quite tedious.
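
To make that contrast concrete, here is a minimal sketch. The function names `fubar_init_size` and `fubar_copy` are hypothetical, chosen only for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Initialization: the compiler infers the array length from the string
   literal (11 characters plus the terminating '\0'). */
size_t fubar_init_size(void)
{
    char fubar[] = "hello world";
    return sizeof fubar;
}

/* Without an initializer the length must be spelled out by the caller and
   the contents copied in, because arrays are not assignable.
   Returns 1 on success, 0 if the destination is too small. */
int fubar_copy(char *dst, size_t dst_size)
{
    static const char text[] = "hello world";
    if (dst_size < sizeof text) {
        return 0;
    }
    strcpy(dst, text);
    return 1;
}
```

A caller would declare something like `char fubar[12];` and then call `fubar_copy(fubar, sizeof fubar)`, which is exactly the bookkeeping the initializer does for you.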

In my second example, the int at index 60 will be initialized to 42 while the remaining, otherwise uninitialized items will be initialized to 0. The equivalent using the assignment operator would also be fairly tedious.
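
As a rough illustration (the function names `sum_initialized` and `sum_assigned` are hypothetical), both functions below end up with the same array contents, but only the first gets the zeroing for free:

```c
#include <string.h>

/* Initialization: element 60 is set to 42 and every element not named in
   the initializer is implicitly set to 0. Returns the sum of all elements. */
int sum_initialized(void)
{
    int fubar[128] = { [60] = 42 };
    int sum = 0;
    for (int i = 0; i < 128; i++) {
        sum += fubar[i];
    }
    return sum;
}

/* Without an initializer: the array starts out indeterminate, so it has to
   be cleared by hand before the single element is assigned. */
int sum_assigned(void)
{
    int fubar[128];
    memset(fubar, 0, sizeof fubar);
    fubar[60] = 42;
    int sum = 0;
    for (int i = 0; i < 128; i++) {
        sum += fubar[i];
    }
    return sum;
}
```

Leaving out the `memset` in the second version would mean reading indeterminate values, which is the "garbage" the question refers to.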

In my third example, the members foo and bar will be initialized to 13 and 42 while the remaining, otherwise uninitialized members will be initialized to 0. The equivalent using the assignment operator would be quite tedious, though I occasionally use a compound literal to achieve a similar result.
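
The compound literal approach might look like the sketch below. The layout of `struct foo` is an assumption (the extra member `baz` is invented so there is something left unnamed), and the function names are hypothetical:

```c
/* Assumed layout: foo and bar come from the example above, baz is a
   hypothetical extra member that no initializer mentions. */
struct foo {
    int foo;
    int bar;
    int baz;
};

/* Initialization: baz is not named, so it is implicitly set to 0. */
struct foo make_by_initialization(void)
{
    struct foo bar = { .foo = 13, .bar = 42 };
    return bar;
}

/* Assignment: a brace list alone is not an expression, but a C99 compound
   literal is, and its unnamed members are zeroed the same way. */
struct foo make_by_assignment(void)
{
    struct foo bar;
    bar = (struct foo){ .foo = 13, .bar = 42 };
    return bar;
}
```

Both functions return a struct with `foo == 13`, `bar == 42` and `baz == 0`; the second simply builds a temporary object and copies it.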

In my fourth example, the initialization sets the value that the variable will contain for its entire lifetime. No assignment is possible to this variable.

autistic
5

An important distinction comes into play when you add a const qualifier:

int const x = 2;

is valid C

int const x;
x = 2;

isn't. Another important difference is for static variables:

static int x = f();

is invalid C

static int x;
x = f();

is valid.
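
The two forms also behave differently at run time for static variables, even when the initializer is a constant. A small sketch, with hypothetical function names:

```c
/* The constant initializer is applied once, before the program starts,
   so the counter keeps its value across calls. */
int count_with_initializer(void)
{
    static int calls = 0;
    return ++calls;
}

/* The assignment is an ordinary statement and runs on every call,
   so the counter is reset each time. */
int count_with_assignment(void)
{
    static int calls;
    calls = 0;
    return ++calls;
}
```

Calling the first function three times yields 1, 2, 3, while the second returns 1 every time.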

Jens Gustedt
  • (And in C++ they mean something different. `static int x = f();` only runs once, the first time execution reaches the function (so it needs a guard variable to check every time, atomic in case multiple threads reach it before the first f() finishes). But `x = f();` runs in every call to the containing function, like in C.) – Peter Cordes Jun 30 '18 at 22:45
4

The behavior must be identical, but any differences in the generated code really depend on the compiler.

For example, the compiler could generate this for the initialized variable:

somefunction:
pushl    %ebp
movl     %esp, %ebp
pushl    $2 # allocate space for x and store 2 in it
...

and this for the uninitialized, but later assigned variable:

somefunction:
pushl   %ebp
movl    %esp, %ebp
subl    $4, %esp # allocate space for x
...
movl    $2, -4(%ebp) # assign 2 to x
...

The C standard does not mandate the generated code to be identical or non-identical in these cases. It only mandates identical behavior of the program in these two cases. And that identical behavior does not necessarily imply identical machine code.

Alexey Frunze
  • 1
    That's really interesting since it means that by the C standard, a compiler could theoretically execute meaningless code (e.g. loop for 300 iterations) before finally performing the assignment. Though it'd fail miserably in terms of efficiency, it'd technically still be compliant with the C standard. – Vilhelm Gray Apr 25 '13 at 15:23
  • 3
    Exactly. There is not a single word about performance (as in guarantees or requirements) in the C standard. But there's no surprise here. You don't get guaranteed performance from the CPU itself (because of the caches, because of the scheduling in the OS, because of interrupts, because of the disk/network speed, because of buffering, etc). – Alexey Frunze Apr 25 '13 at 15:28
  • @VilhelmGray: performance and sanity are quality-of-implementation issues. Only `volatile` and (if supported) `_Atomic` (especially for lock-free atomic objects) really constrain the code-gen choices. It's easily possible to make a standard-conforming but useless or nearly-unusable C implementation. And not just from performance: weird choices of type widths, trap representations, etc. – Peter Cordes Jun 30 '18 at 22:42
  • @PeterCordes: A really obtuse implementation could decide that given `int i;`, a statement like `i = 5;` modifies the stored value of object `i` using an expression (`i = 5;`) that isn't an lvalue of compatible type (an assignment expression isn't an lvalue of any type, since it isn't an lvalue at all) and thus invokes UB. The assignment expression would of course *contain* an lvalue of type `int` as its left operand, but that lvalue would not, by itself, modify the value of `i`. I think one could create a function like... – supercat Jul 05 '18 at 21:01
  • `struct wrappedInt {int x[1];} wrapInt(int x) { struct wrappedInt result; memcpy(&result, &x, sizeof x); return result;}` and then use `memcpy(&i, wrapInt(5).x, sizeof i);` since there is no moment in time when function arguments exist without holding a value, and `memcpy` presumably has some magical exemption from the normal 6.5p7 rules. Of course, no quality compiler should require such shenanigans. – supercat Jul 05 '18 at 21:04
0
int x = 2;

The computer will create the variable x and assign the value 2 to it at almost the same moment.


int x;
x = 2;

The computer will create the variable x and then assign the value 2 to it. It seems that there is no difference, but...

...let's suppose that your code is like this:

int x; 
{some operators};
x = 2; 

the computer may have to access the variable x in order to assign the value 2 to it. This means that, while running the program, the computer may spend more time accessing x to assign a value to it than it would if it created the variable and assigned the value at the same moment.

Anyway, Deitel HM, Deitel PJ describe this in C How to Program.

yulian
  • 3
    `int x;` does not *create* a variable, it *declares* one. The computer does not create variables: variables have a *scope* that is determined by their declaration. Execution enters and exits the scope of variable. There is usually no discernible individual cost (in terms of time) for execution entering or exiting the scope of a variable, even for a non-optimizing compiler. Initialization is costly, like assignment, but compilers have no difficulty moving either initialization or assignment closer to first use if needs be. – Pascal Cuoq Apr 25 '13 at 19:30