I'm afraid the accepted answer does not touch the main point of the question:
why
int a;
std :: cout << a << endl; // prints 0
always prints 0
, as if a
was initialized to its default value, whereas in
int a;
std :: cout << &a << " " << a << endl; // 0x7ffc057370f4 32764
the compiler produces some junk value for a
.
Yes, in both cases we have an example of undefined behavior and ANY value for a
is possible, so why is there always 0 in Case 1?
First of all, remember that a C/C++ compiler is free to transform your program in any way it likes as long as the observable behavior of a well-defined program remains the same (the "as-if" rule). So, if you write
int a;
std :: cout << a << endl; // prints 0
the compiler is free to assume that a
need not be associated with any real RAM cells. You never ask for the address of a
. So the compiler is free to keep a
in a CPU register, or not to store it anywhere at all. In such a case a
has no address and is functionally equivalent to something as weird as a "named, addressless temporary". However, in Case 2 you ask the compiler to print the address of a
. In such a case the compiler cannot ignore the request and generates the code for the memory where a
would be allocated even though the value of a
can be junk.
The next factor is optimization. You can switch it off completely in Debug compilation mode or turn on aggressive optimization in Release mode, so you can expect that your simple code will behave differently depending on whether you compile it as Debug or Release. Moreover, since it is undefined behavior, your code may behave differently if compiled with different compilers or even different versions of the same compiler.
I prepared a version of your program that is a bit easier to analyze:
#include <iostream>

int f()
{
    int a;
    return a; // undefined behavior: a is read uninitialized
}

int g()
{
    int a;
    // undefined behavior as well, but here the address of a is also used
    return reinterpret_cast<long long int>(&a) + a;
}

int main() { std::cout << f() << " " << g() << "\n"; }
Function g
differs from f
in that it uses the address of the uninitialized variable a
. I tested it in Godbolt Compiler Explorer: https://godbolt.org/z/os8b583ss There you can switch between various compilers and optimization options. Please do experiment yourself. For Debug with gcc or clang, use -O0
(this is also the default; -g
only adds debug information and does not change the optimization level), for Release use -O3
.
For the newest (trunk) gcc in Release (-O3), we have the following assembly:
f():
xorl %eax, %eax
ret
g():
leaq -4(%rsp), %rax
addl -4(%rsp), %eax
ret
main:
subq $24, %rsp
xorl %esi, %esi
movl $_ZSt4cout, %edi
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
leaq 12(%rsp), %rsi
movl $_ZSt4cout, %edi
addl 12(%rsp), %esi
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
xorl %eax, %eax
addq $24, %rsp
ret
Please notice that f()
was reduced to simply setting the eax
register to zero (for any value of integer a
, a xor a
equals 0, which is why xorl %eax, %eax is the usual way to clear a register). eax
is the register in which this function returns its value. Hence 0 in Release. Well, actually, no, the compiler is even smarter: it never calls f()
! Instead, it zeroes the esi register, which holds the argument passed to operator<<
. Similarly, the call to g
is replaced by inlined code that uses 12(%rsp)
twice: once to read the value stored there and once to compute its address. This generates a random value for a
and rather similar values for &a
. AFAIK, stack addresses are slightly randomized between runs to make the life of hackers attacking our code harder.
Now the same code in Debug:
f():
pushq %rbp
movq %rsp, %rbp
movl -4(%rbp), %eax
popq %rbp
ret
g():
pushq %rbp
movq %rsp, %rbp
leaq -4(%rbp), %rax
movl %eax, %edx
movl -4(%rbp), %eax
addl %edx, %eax
popq %rbp
ret
main:
pushq %rbp
movq %rsp, %rbp
call f()
movl %eax, %esi
movl $_ZSt4cout, %edi
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
call g()
movl %eax, %esi
movl $_ZSt4cout, %edi
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
movl $0, %eax
popq %rbp
ret
You can now clearly see, even without knowing the x86-64 assembly (I don't know it either), that in Debug mode (-O0
) the compiler performs no optimization at all. In f()
it reads a
(4 bytes below the frame pointer register value, -4(%rbp)
) and moves it to the "result register" eax
. In g()
, the same is done, but a
is read once as a value and once as an address. Moreover, both f()
and g()
are called in main()
. In this compiler mode, the program produces "random" results for a
(try it yourself!).
To make things even more interesting, here are f()
and g()
as compiled by clang (trunk) in Release:
f(): # @f()
retq
g(): # @g()
retq
Can you see? These functions are so trivial to clang that it generated no code for them beyond the bare return instruction. Moreover, it did not zero the registers corresponding to a
, so, unlike g++, clang produces a random value for a
(in both Release and Debug).
You can take your experiments even further and find that what clang produces for f
depends on whether f
or g
is called first in main.
Now you should have a better understanding of what Undefined Behavior is.