You're on x86-64 Linux, where the ABI includes a red-zone (128 bytes below RSP). https://stackoverflow.com/tags/red-zone/info.
So the array starts near the bottom of the red-zone and extends up to near the top of the space gcc reserved. Compile with -mno-red-zone to see different code-gen.
Also, your compiler is using RSP, not ESP. ESP is the low 32 bits of RSP, and on x86-64 the stack is normally mapped outside the low 4GiB of virtual address space, so the code would crash if you truncated RSP to 32 bits.
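To make the truncation point concrete, here's a small sketch. survives_truncation is a made-up helper name, and the 0x7ffc... value in the usage note is only an illustrative address of the kind Linux typically uses for the stack, not something taken from your program.

```c
#include <stdint.h>

// Returns 1 if the address is unchanged after keeping only its low 32 bits
// (what using ESP instead of RSP would effectively do), 0 otherwise.
int survives_truncation(const void *p) {
    uintptr_t full = (uintptr_t)p;
    uintptr_t low32 = (uintptr_t)(uint32_t)full;  // drop the high bits, then zero-extend
    return full == low32;
}
```

On a 64-bit build, survives_truncation((void*)0x7ffc12345678) returns 0: the truncated value points somewhere else entirely, usually at unmapped memory.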
On the Godbolt compiler explorer, I get this from gcc -O3 (with gcc 6.3, 7.3, and 8.1):
main:
    sub rsp, 368
    mov eax, DWORD PTR [rsp-120] # -120, not -480 which would be outside the red-zone
    add rsp, 368
    ret
Did you fake your asm output, or does some other version of gcc or some other compiler really load from outside the red-zone on this undefined behaviour (reading an uninitialized array element)? clang just compiles it to ret, and ICC just returns 0 without loading anything. (Isn't undefined behaviour fun?)
int ext(int*);
int foo() {
    int arr[120]; // can't use the red-zone because of later non-inline function call
    ext(arr);
    return arr[0];
}
# gcc. clang and ICC are similar.
sub rsp, 488
mov rdi, rsp
call ext
mov eax, DWORD PTR [rsp]
add rsp, 488
ret
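Where the 488 comes from: the array needs 120 * 4 = 480 bytes, and RSP has to be 16-byte aligned before the call. On function entry RSP is 8 bytes off a 16-byte boundary (the return address was just pushed), so the adjustment has to be 8 mod 16. A minimal sketch of that arithmetic, with frame_size as a hypothetical helper (not anything gcc exposes), assuming no other locals or saved registers:

```c
// Smallest stack adjustment >= locals_bytes that leaves RSP 16-byte aligned,
// given that RSP is 8 mod 16 on entry (x86-64 System V, right after a call).
unsigned frame_size(unsigned locals_bytes) {
    unsigned n = locals_bytes;
    while ((n + 8) % 16 != 0)  // +8 accounts for the pushed return address
        n++;
    return n;
}
```

frame_size(480) gives 488, matching the sub rsp, 488 above. The leaf-function version reserves 368 = 488 - 120 because gcc puts the lowest 120 bytes of the array in the red-zone.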
But we can avoid UB in a leaf function without letting the compiler optimize away the store/reload. (We could maybe just use volatile instead of inline asm.)
int bar() {
    int arr[120];
    asm("nop # operand was %0" :"=m" (arr[0]) ); // tell the compiler we write arr[0]
    return arr[0];
}
# gcc output
bar:
    sub rsp, 368
    nop # operand was DWORD PTR [rsp-120]
    mov eax, DWORD PTR [rsp-120]
    add rsp, 368
    ret
Note that the compiler only assumes we wrote arr[0], not any of arr[1..119].
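The volatile alternative mentioned above might look something like this. It's only a sketch: baz is a made-up name, and I haven't checked what asm each compiler emits for it. The explicit store is there so the load reads an initialized value and there's no UB.

```c
int baz(void) {
    volatile int arr[120];  // volatile: accesses to arr can't be optimized away
    arr[0] = 42;            // real store, so arr[0] is initialized
    return arr[0];          // volatile load has to actually happen
}
```

The idea is the same as with the inline asm: the compiler has to keep arr in memory and access it there, so you can see which RSP-relative address it picked for arr[0].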
But anyway, gcc/clang/ICC all put the bottom of the array in the red-zone. See the Godbolt link.
This is a good thing in general: more of the array is within range of a disp8 from RSP, so references to arr[0] up to arr[63] or so could use [rsp+disp8] instead of [rsp+disp32] addressing modes. Not super useful for one big array, but as a general algorithm for allocating locals on the stack it makes total sense. (gcc doesn't go all the way to the bottom of the red-zone for arr, but clang does, using sub rsp, 360 instead of 368 so the array is still 16-byte aligned. IIRC, the x86-64 System V ABI at least recommends this for arrays with automatic storage with size >= 16 bytes.)
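To put numbers on the disp8 point: a disp8 is a signed 8-bit displacement, so it covers RSP-128 through RSP+127. Here's a quick sketch that counts how far into the array that reaches for a given starting offset. last_disp8_index is a hypothetical helper, and the offsets in the usage note are the ones from the gcc and clang output discussed above.

```c
// Largest index i in [0, count) whose byte offset from RSP,
// base_off + i * elem_size, fits in a signed 8-bit displacement.
// Returns -1 if no element is reachable with a disp8.
int last_disp8_index(int base_off, int elem_size, int count) {
    int last = -1;
    for (int i = 0; i < count; i++) {
        int disp = base_off + i * elem_size;
        if (disp >= -128 && disp <= 127)
            last = i;
    }
    return last;
}
```

With gcc's layout (array at rsp-120), last_disp8_index(-120, 4, 120) is 61. With clang's (array at rsp-128) it's 63. If the array started at RSP itself, with no use of the red-zone, it would only be 31.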