TL;DR
As everyone already noted, accessing out-of-bounds memory is undefined behavior. However, something very interesting is happening in this particular case: your program never accesses that memory at all. The comparison got removed at compile time!
It's not guaranteed, but most good-quality compilers will simplify if(1) { ... } or if(0) { ... } even at -O0 (which is precisely the case with gcc). Check this answer and this answer.
Logic Reasoning
Your compiler is "optimizing" that if condition based on simple logic, which is why it always works, even with the -O0 flag. The memory access will never happen. When your compiler finds a[1000] == a[1000], or really a[n] == a[n] for an int array and an index expression without side effects, it knows that this is essentially the same as saying VAR == VAR, which is always true for any integer variable. This comes from formal logic and is called the Principle of Identity, which states that any element A is equal to itself. I don't know if there's a specific optimization flag for that, but I don't think there is (especially because it happens at -O0). If anyone knows about one, please let me know in the comments.
In other words, your compiler swaps your if(a[1000] == a[1000]) for if(1), which is always true, so it removes the if altogether.
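To illustrate why the identity argument is safe for the int elements discussed here, the following is a minimal sketch contrasting an integer self-comparison with a floating-point one. The function names same_int, same_double and nan_equals_itself are hypothetical and not part of the original code. The point is only that an int is always equal to itself, while a NaN double is not, so a compiler cannot apply the same shortcut to floating-point operands under IEEE 754 rules.

```c
#include <math.h>

/* Integer self-comparison: no int value is unequal to itself, so a
   compiler is free to treat the whole expression as 1 and skip the load. */
int same_int(const int *p)
{
    return p[0] == p[0];
}

/* Floating-point self-comparison: NaN compares unequal to itself, so this
   expression cannot simply be replaced by 1 under IEEE 754 semantics. */
int same_double(const double *p)
{
    return p[0] == p[0];
}

/* Hypothetical helper: feeds a NaN into same_double. */
int nan_equals_itself(void)
{
    double x = NAN;
    return same_double(&x);
}
```

Calling same_int on any valid int yields 1, while nan_equals_itself yields 0 on an IEEE 754 platform.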
It is very important to note that accessing out-of-bounds memory is always undefined behavior. HOWEVER, in this case, the translated code never accesses any memory. To prove it, here is some disassembled code.
The code you provided, compiled with gcc -O0 -o foo foo.c, produces the following foo function:
(gdb) disass foo
Dump of assembler code for function foo:
0x000000000040052d <+0>: push %rbp
0x000000000040052e <+1>: mov %rsp,%rbp
0x0000000000400531 <+4>: sub $0x10,%rsp
0x0000000000400535 <+8>: mov %rdi,-0x8(%rbp)
0x0000000000400539 <+12>:mov $0x4005f4,%edi
0x000000000040053e <+17>:mov $0x0,%eax
0x0000000000400543 <+22>:callq 0x400410 <printf@plt>
0x0000000000400548 <+27>:leaveq
0x0000000000400549 <+28>:retq
End of assembler dump.
Notice the instruction mov %rdi,-0x8(%rbp). This saves the function argument onto the stack. That's your pointer. Right after that, the code stores $0x4005f4 into edi (which is probably the address of your "Hello" string in the data segment), sets eax to zero, and then calls printf. Let's check:
(gdb) print (char*)0x4005f4
$3 = 0x4005f4 "Hello"
Bullseye! Well, wait! Where's that if? I don't see any cmp instructions here, or any other kind of branch. That if got "optimized" away. It's not really an optimization you enable with a GCC option; rather, it is a logical simplification. Comparing a value with itself always yields true, just as 1 is always equal to 1. The compiler knows that before outputting machine code, so your if never made it into the binary and no memory access was performed.
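As a rough picture of what the first listing corresponds to, here is a hypothetical reconstruction of the function next to its folded equivalent. The question's exact source is not reproduced in this answer, so the signature, the int return value (added only so the result can be checked) and the name foo_folded are all assumptions.

```c
#include <stdio.h>

/* Hypothetical reconstruction of the function from the question.
   The condition compares an element with itself. */
int foo(const int *a)
{
    if (a[1000] == a[1000]) {
        printf("Hello");
        return 1;
    }
    return 0;
}

/* Roughly what the first disassembly shows: the pointer argument is
   stored but never dereferenced, and printf is called unconditionally. */
int foo_folded(const int *a)
{
    (void)a;            /* argument is kept but not read through */
    printf("Hello");
    return 1;
}
```

Both functions take the printing path for any argument. Passing an invalid pointer to foo is still undefined behavior as far as the language is concerned, even though this particular build never reads through it.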
However, if you were to write if(a[1000] == a[1001]) and compile with the same gcc -O0 -o foo foo.c, you'd get this foo:
(gdb) disass foo
Dump of assembler code for function foo:
0x000000000040052d <+0>: push %rbp
0x000000000040052e <+1>: mov %rsp,%rbp
0x0000000000400531 <+4>: sub $0x10,%rsp
0x0000000000400535 <+8>: mov %rdi,-0x8(%rbp)
0x0000000000400539 <+12>:mov -0x8(%rbp),%rax
0x000000000040053d <+16>:add $0xfa0,%rax
0x0000000000400543 <+22>:mov (%rax),%edx
0x0000000000400545 <+24>:mov -0x8(%rbp),%rax
0x0000000000400549 <+28>:add $0xfa4,%rax
0x000000000040054f <+34>:mov (%rax),%eax
0x0000000000400551 <+36>:cmp %eax,%edx
0x0000000000400553 <+38>:jne 0x400564 <foo+55>
0x0000000000400555 <+40>:mov $0x400614,%edi
0x000000000040055a <+45>:mov $0x0,%eax
0x000000000040055f <+50>:callq 0x400410 <printf@plt>
0x0000000000400564 <+55>:leaveq
0x0000000000400565 <+56>:retq
End of assembler dump.
Wow, that's longer!
Now, the usual mov %rdi,-0x8(%rbp) is there. This saves our parameter onto the stack. The next line, mov -0x8(%rbp),%rax, loads our pointer into rax. Then, add $0xfa0,%rax adds our 1000 * sizeof(int) offset (4000 bytes, which is 0xfa0) to rax. So far, so good. And now, mov (%rax),%edx tries to read the contents of whatever rax points to and store it in edx. In other words, this is the actual pointer dereference. If you were stepping through the instructions in GDB, you would get the SIGSEGV on this instruction:
Breakpoint 1, 0x0000000000400531 in foo ()
(gdb) stepi
0x0000000000400535 in foo ()
(gdb) stepi
0x0000000000400539 in foo ()
(gdb) stepi
0x000000000040053d in foo ()
(gdb) stepi
0x0000000000400543 in foo ()
(gdb) stepi
Program received signal SIGSEGV, Segmentation fault.
0x0000000000400543 in foo ()
Note that it crashes when it tries to execute the instruction at 400543. And what's at 400543? 0x0000000000400543 <+22>:mov (%rax),%edx. Precisely where it tries to access out-of-bounds memory. BOOM! There's your undefined behavior.
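The address arithmetic in the second listing can be written out by hand in C. This is only a sketch: the function name compare_elements is hypothetical, and the 0xfa0 and 0xfa4 values in the comments assume sizeof(int) == 4, as in the disassembly above.

```c
#include <stddef.h>

/* Mirrors the second listing step by step, using explicit byte offsets
   instead of array indexing. Returns 1 when the two elements are equal. */
int compare_elements(const int *a)
{
    /* mov -0x8(%rbp),%rax : reload the pointer argument */
    const unsigned char *base = (const unsigned char *)a;

    /* 1000 * sizeof(int) and 1001 * sizeof(int):
       0xfa0 and 0xfa4 when int is 4 bytes wide */
    size_t off_lhs = 1000 * sizeof(int);
    size_t off_rhs = 1001 * sizeof(int);

    /* add $0xfa0,%rax ; mov (%rax),%edx : first real memory read */
    int lhs = *(const int *)(base + off_lhs);

    /* add $0xfa4,%rax ; mov (%rax),%eax : second real memory read */
    int rhs = *(const int *)(base + off_rhs);

    /* cmp %eax,%edx ; jne ... */
    return lhs == rhs;
}
```

With an array of at least 1002 ints this is well defined. With a smaller array, the first read is the point at which the program in this answer received SIGSEGV.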