Why does the compiler reserve a little stack space but not the whole array size?

Question

The following code

int main() {
  int arr[120];
  return arr[0];
}

compiles into this:

  sub     rsp, 360
  mov     eax, DWORD PTR [rsp-480]
  add     rsp, 360
  ret

Knowing the ints are 4 bytes and the array is size 120, the array should take 480 bytes, but only 360 bytes are subtracted from ESP... Why is this?

Are you *sure* the compiler reads from `[rsp-480]`? I can't reproduce that, and it's outside the red-zone so could only happen because of the undefined behaviour of reading uninitialized array elements. — Peter Cordes, Jul 25 '18 at 20:35
related: [Why is there no "sub rsp" instruction in this function prologue and why are function parameters stored at negative rbp offsets?](https://stackoverflow.com/q/28693863) is a better canonical duplicate for non-array questions. — Peter Cordes, Aug 30 '20 at 09:13

Thomas Jager · Accepted Answer · 2020-04-08T18:04:18.583

Below the stack pointer there is a 128-byte red zone that the x86-64 System V ABI reserves for the current function's use; signal handlers will not clobber it. Since main calls no other function, it has no need to move the stack pointer by more than it needs, though it doesn't matter in this case. It only subtracts enough from rsp that the rest of the array fits inside the red zone: 360 bytes above rsp plus 128 bytes below it gives 488 bytes, which covers the 480-byte array.

You can see the difference by adding a function call to main:

int test() {
  int arr[120];
  return arr[0]+arr[119];
}

int main() {
  int arr[120];
  test();
  return arr[0]+arr[119];
}

This gives:

test:
  push rbp
  mov rbp, rsp
  sub rsp, 360
  mov edx, DWORD PTR [rbp-480]
  mov eax, DWORD PTR [rbp-4]
  add eax, edx
  leave
  ret
main:
  push rbp
  mov rbp, rsp
  sub rsp, 480
  mov eax, 0
  call test
  mov edx, DWORD PTR [rbp-480]
  mov eax, DWORD PTR [rbp-4]
  add eax, edx
  leave
  ret

You can see that main subtracts the full 480 because it needs the whole array to be in its own stack space while test runs, but test doesn't need to because it doesn't call any functions.

The additional use of arr[119] does not significantly change the output; it was added to make it clear that the compiler is not pretending that those elements don't exist.

You can use inline asm (or maybe `volatile`) to get actual array accesses in a leaf function without having to disable optimization like you're doing here. But nice idea to access `arr[119]` to show where the top is. Using `-fno-omit-frame-pointer` as part of `-O0` makes everything relative to RBP, unlike the OP's code though. — Peter Cordes, Jul 25 '18 at 20:42
can any function not calling other functions use this red zone, or only one? — Riolku, Jul 25 '18 at 21:04

score 2 · Answer 2 · answered Jul 25 '18 at 16:15

You're on x86-64 Linux, where the ABI includes a red-zone (128 bytes below RSP). See https://stackoverflow.com/tags/red-zone/info for details.

So the array goes from the bottom of the red-zone up to near the top of what gcc reserved. Compile with -mno-red-zone to see different code-gen.

Also, your compiler is using RSP, not ESP. ESP is the low 32 bits of RSP, and on x86-64 the stack normally lives at an address that does not fit in 32 bits, so truncating RSP to 32 bits would leave it pointing somewhere else and the next stack access would crash.

On the Godbolt compiler explorer, I get this from gcc -O3 (with gcc 6.3, 7.3, and 8.1):

main:
    sub     rsp, 368
    mov     eax, DWORD PTR [rsp-120]   # -120, not -480 which would be outside the red-zone
    add     rsp, 368
    ret

Did you fake your asm output, or does some other version of gcc or some other compiler really load from outside the red-zone on this undefined behaviour (reading an uninitialized array element)? clang just compiles it to ret, and ICC just returns 0 without loading anything. (Isn't undefined behaviour fun?)

int ext(int*);
int foo() {
  int arr[120];     // can't use the red-zone because of later non-inline function call
  ext(arr);
  return arr[0];
}
   # gcc.  clang and ICC are similar.
    sub     rsp, 488
    mov     rdi, rsp
    call    ext
    mov     eax, DWORD PTR [rsp]
    add     rsp, 488
    ret

But we can avoid UB in a leaf function without letting the compiler optimize away the store/reload. (We could maybe just use volatile instead of inline asm.)

int bar() {
  int arr[120];
  asm("nop # operand was %0" :"=m" (arr[0]) );   // tell the compiler we write arr[0]
  return arr[0];
}

# gcc output
bar:
    sub     rsp, 368
    nop # operand was DWORD PTR [rsp-120]
    mov     eax, DWORD PTR [rsp-120]
    add     rsp, 368
    ret

Note that the compiler only assumes we wrote arr[0], not any of arr[1..119].

But anyway, gcc/clang/ICC all put the bottom of the array in the red-zone. See the Godbolt link.

This is a good thing in general: more of the array is within range of a disp8 from RSP, so references to arr[0] up to arr[63] or so could use [rsp+disp8] instead of [rsp+disp32] addressing modes. Not super useful for one big array, but as a general algorithm for allocating locals on the stack it makes total sense. (gcc doesn't go all the way to the bottom of the red-zone for arr, but clang does, using sub rsp, 360 instead of 368 so the array is still 16-byte aligned. IIRC, the x86-64 System V ABI at least recommends this for arrays with automatic storage with size >= 16 bytes.)
