During a discussion I had with a couple of colleagues the other day I threw together a piece of code in C++ to illustrate a memory access violation.
I am currently in the process of slowly returning to C++ after a long spell of almost exclusively using languages with garbage collection and, I guess, my loss of touch shows, since I've been quite puzzled by the behaviour my short program exhibited.
The code in question is as such:
#include <iostream>
using std::cout;
using std::endl;
struct A
{
int value;
};
void f()
{
A* pa; // Uninitialized pointer
cout<< pa << endl;
pa->value = 42; // Writing via an uninitialized pointer
}
int main(int argc, char** argv)
{
f();
cout<< "Returned to main()" << endl;
return 0;
}
I compiled it with GCC 4.9.2 on Ubuntu 15.04 with -O2
compiler flag set. My expectations when running it were that it would crash when the line, denoted by my comment as "writing via an uninitialized pointer", got executed.
Contrary to my expectations, however, the program ran successfully to the end, producing the following output:
0
Returned to main()
I recompiled the code with a -O0
flag (to disable all optimizations) and ran the program again. This time, the behaviour was as I expected:
0
Segmentation fault
(Well, almost: I didn't expect a pointer to be initialized to 0.) Based on this observation, I presume that when compiling with -O2
set, the fatal instruction got optimized away. This makes sense, since no further code accesses the pa->value
after it's set by the offending line, so, presumably, the compiler determined that its removal would not modify the observable behaviour of the program.
I reproduced this several times and every time the program would crash when compiled without optimization and miraculously work, when compiled with -O2
.
My hypothesis was further confirmed when I added a line, which outputs the pa->value
, to the end of f()
's body:
cout<< pa->value << endl;
Just as expected, with this line in place, the program consistently crashes, regardless of the optimization level, with which it was compiled.
This all makes sense, if my assumptions so far are correct.
However, where my understanding breaks somewhat is in case where I move the code from the body of f()
directly to main()
, like so:
int main(int argc, char** argv)
{
A* pa;
cout<< pa << endl;
pa->value = 42;
cout<< pa->value << endl;
return 0;
}
With optimizations disabled, this program crashes, just as expected. With -O2
, however, the program successfully runs to the end and produces the following output:
0
42
And this makes no sense to me.
This answer mentions "dereferencing a pointer that has not yet been definitely initialized", which is exactly what I'm doing, as one of the sources of undefined behaviour in C++.
So, is this difference in the way optimization affects the code in main()
, compared to the code in f()
, entirely explained by the fact that my program contains UB, and thus compiler is technically free to "go nuts", or is there some fundamental difference, which I don't know of, between the way code in main()
is optimized, compared to code in other routines?