Why do we need stack allocation when we have a red zone?

Question

I have the following doubts:

As we know, the System V x86-64 ABI gives us a fixed-size area (128 bytes) in the stack frame, the so-called red zone. As a result, we don't need to use, for example, `sub rsp, 12`. We can just do `mov [rsp-12], X` and that's all.

But I cannot grasp the idea behind it. Why does it matter? Without a red zone, is it necessary to `sub rsp, 12`? After all, the stack size is fixed from the start, so why is `sub rsp, 12` important? I know that it lets us keep track of the top of the stack, but let's ignore that for the moment.

I know that some instructions use the value of `rsp` (like `ret`), but let's not worry about that for now.

The crux of the problem is this: suppose we have no red zone and we've done:

function:
    mov [rsp-16], rcx
    mov [rsp-32], rcx
    mov [rsp-128], rcx
    mov [rsp-1024], rcx
    ret

Is that any different from this?

function:
    sub rsp, 1024
    mov [rsp-16], rcx
    mov [rsp-32], rcx
    mov [rsp-128], rcx
    mov [rsp-1024], rcx
    add rsp, 1024
    ret

The second snippet of code you've shown here is wrong. If you decrement the stack pointer, you *must* restore it before returning from the function. So, you would need to add `add rsp, 1024` before `ret`. — Cody Gray - on strike, Jun 21 '16 at 11:11
Which ABI is that? I assume the Linux one, but there are others, e.g. the one for Windows 64, Mac OS X 64 bit, etc. — Rudy Velthuis, Jun 21 '16 at 12:19
@rudy As far as I understand, there are only two x86-64 ABIs: the System V AMD64 ABI (used by Linux, Solaris, OS X, and other POSIX-compliant operating systems), and Microsoft's implementation used on Windows. The question appears to be about the former. — Cody Gray - on strike, Jun 21 '16 at 13:14
These are the major ones, but I'm sure there are more. That is why I like it if people state which one they mean. Not everyone uses the POSIX-compliant OSes. — Rudy Velthuis, Jun 21 '16 at 15:11
@RudyVelthuis: I agree, the question wrongly implied that there was only one ABI, so I fixed it. BTW, if there are any x86-64 ABIs other than System V or Win64 (old-style or `__vectorcall`), they're probably only subtle modifications to one of those. I haven't heard of any, but OTOH I haven't gone looking. — Peter Cordes, Jun 21 '16 at 16:20
Related: http://stackoverflow.com/questions/38042188/where-exactly-is-the-red-zone-on-x86-64, and http://stackoverflow.com/questions/25787408/amd64-abi-128-byte-red-zone — Peter Cordes, Jun 26 '16 at 22:21
@Gilgamesz, your first snippet jeopardises values that are above the stack pointer (I mean, at lower addresses). If there is no "red zone", as in your premise, one cannot rely on the area above the stack pointer being safe: it could easily be clobbered by signal/exception/interrupt handlers. Also, the Linux kernel is compiled with `-mno-red-zone` passed to gcc because, AFAIK, interrupts do not respect the "red zone" on amd64, so the kernel cannot rely on it being unchanged before and after interrupt handling. – Bulat M., Sep 19 '16 at 05:16

score 14 · Accepted Answer · edited Jun 20 '20 at 09:12

The "red zone" is not strictly necessary. In your terms, it could be considered "pointless." Everything that you could do using the red zone, you could also do the traditional way that you did it targeting the IA-32 ABI.

Here's what the AMD64 ABI says about the "red zone":

The 128-byte area beyond the location pointed to by %rsp is considered to be reserved and shall not be modified by signal or interrupt handlers. Therefore, functions may use this area for temporary data that is not needed across function calls. In particular, leaf functions may use this area for their entire stack frame, rather than adjusting the stack pointer in the prologue and epilogue. This area is known as the red zone.

The real purpose of the red zone is as an optimization. Its existence allows code to assume that the 128 bytes below rsp will not be asynchronously clobbered by signals or interrupt handlers, which makes it possible to use it as scratch space. This makes it unnecessary to explicitly create scratch space on the stack by moving the stack pointer in rsp. This is an optimization because the instructions to decrement and restore rsp can now be elided, saving time and space.

So yes, while you could do this with AMD64 (and would need to do it with IA-32):

function:
    push rbp                      ; standard "prologue" to save the
    mov  rbp, rsp                 ;   original value of rsp

    sub  rsp, 32                  ; reserve scratch area on stack
    mov  QWORD PTR [rsp],   rcx   ; copy rcx into our scratch area
    mov  QWORD PTR [rsp+8], rdx   ; copy rdx into our scratch area

    ; ...do something that clobbers rcx and rdx...

    mov  rcx, [rsp]               ; retrieve original value of rcx from our scratch area
    mov  rdx, [rsp+8]             ; retrieve original value of rdx from our scratch area
    add  rsp, 32                  ; give back the stack space we used as scratch area

    pop  rbp                      ; standard "epilogue" to restore the caller's rbp
    ret

we don't need to do it in cases where we only need a 128-byte scratch area (or smaller), because then we can use the red zone as our scratch area.
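
For comparison, here is a minimal sketch of the same routine using the red zone instead. It assumes the System V AMD64 ABI and that the body is a leaf: it makes no calls and executes no push, so rsp never moves. The two xor instructions are only a stand-in for the "something that clobbers rcx and rdx".

```asm
function:
    mov  QWORD PTR [rsp-8],  rcx   ; park rcx in the red zone, just below rsp
    mov  QWORD PTR [rsp-16], rdx   ; park rdx in the next slot down

    xor  ecx, ecx                  ; stand-in for work that clobbers rcx and rdx
    xor  edx, edx                  ;   (must not call, push, or otherwise move rsp)

    mov  rcx, QWORD PTR [rsp-8]    ; retrieve original value of rcx
    mov  rdx, QWORD PTR [rsp-16]   ; retrieve original value of rdx
    ret                            ; rsp was never changed, so nothing to undo
```

Both the prologue/epilogue pair and the sub/add pair are gone; only the loads and stores remain.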

Plus, since we no longer have to decrement the stack pointer, we can use rsp as the base pointer (instead of rbp), making it unnecessary to save and restore rbp (in the prologue and epilogue), and also freeing up rbp for use as another general-purpose register!

(Technically, turning on frame-pointer omission (-fomit-frame-pointer, enabled by default with -O1 since the ABI allows it) would also make it possible for the compiler to elide the prologue and epilogue sections, with the same benefits. However, absent a red zone, the need to adjust the stack pointer to reserve space would not change.)

Note, however, that the ABI only guarantees that asynchronous things like signal and interrupt handlers will not modify the red zone. Calls to other functions may clobber values in the red zone, so it is not particularly useful except in leaf functions (those functions that do not call any other functions, as if they were at the "leaf" of a function-call tree).
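
To see why a call gets in the way, recall that the call instruction itself pushes the return address at [rsp-8], which is the top slot of the caller's red zone. The following is only a sketch under the System V AMD64 ABI; `broken`, `fixed` and `helper` are hypothetical names chosen for illustration.

```asm
broken:
    mov  QWORD PTR [rsp-8], rcx   ; park rcx in the red zone
    call helper                   ; pushes the return address into that same slot
    mov  rcx, QWORD PTR [rsp-8]   ; BUG: this is no longer the saved rcx
    ret

fixed:
    sub  rsp, 8                   ; allocate a real slot (this also 16-byte aligns rsp for the call)
    mov  QWORD PTR [rsp], rcx     ; the slot is now at rsp, so the call cannot overwrite it
    call helper
    mov  rcx, QWORD PTR [rsp]     ; still the saved rcx
    add  rsp, 8                   ; give the slot back
    ret

helper:                           ; hypothetical callee, free to use its own red zone
    ret
```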

A final point: the Windows x64 ABI deviates slightly from the AMD64 ABI used on other operating systems. In particular, it has no concept of a "red zone". The area beyond rsp is considered volatile and subject to being overwritten at any time. Instead, it requires that the caller allocate a home space (also called shadow space) on the stack, which is then available for the callee's use in the event that it needs to spill any of the register-passed parameters.
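
As a rough sketch of that convention, assuming the usual Microsoft x64 rules (32 bytes of home space, first integer argument in rcx), it looks something like this. The names `caller` and `callee` are hypothetical.

```asm
caller:
    sub  rsp, 40                  ; 32 bytes of home space + 8 so rsp is 16-byte aligned at the call
    mov  ecx, 1                   ; first integer argument goes in rcx
    call callee
    add  rsp, 40                  ; release the home space
    ret

callee:
    mov  QWORD PTR [rsp+8], rcx   ; spill rcx into its home slot, just above the return address
    mov  rax, QWORD PTR [rsp+8]   ; reload it later from the same slot
    ret
```

Note that the scratch space here sits above rsp and is paid for by the caller, whereas the red zone sits below rsp and costs no instructions at all.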

OK, now it is clear. So, as I understand it, a signal/interrupt handler of our process just takes `rsp` (in fact the OS gives it) and uses it. And indeed, that can cause a problem when we have something below `rsp`. Obviously, that is the situation without a red zone. Yeah? – Gilgamesz, Jun 21 '16 at 12:08
A signal or interrupt handler *cannot* use the red zone. That is guaranteed by the ABI. — Cody Gray - on strike, Jun 21 '16 at 13:05
I know it's an old question, but maybe I can get lucky. What happens when we want to use more than 128 bytes of memory? How far will rsp be moved, and will it be 8-byte or 16-byte aligned? – DeathNet123, Jun 24 '22 at 18:12
@DeathNet123: [Why does the compiler reserve a little stack space but not the whole array size?](https://stackoverflow.com/a/51523492) - Yes, GCC will reserve the space needed minus 128, still using the red zone, even if the allocation is for one large array. And GCC will align RSP by 16 when it has to move it at all. — Peter Cordes, Aug 28 '23 at 16:52

score 4 · Answer 2 · answered Jun 21 '16 at 11:21

You have the offsets the wrong way around in your example, which is why it does not make sense. Without a red zone, code should not access the region below the stack pointer: its contents are undefined. The red zone is there to protect the first 128 bytes below the stack pointer. Your second example should read:

function:
    sub rsp, 1024
    mov [rsp+16], rcx
    mov [rsp+32], rcx
    mov [rsp+128], rcx
    mov [rsp+1016], rcx
    add rsp, 1024
    ret

If the amount of scratch space that a function needs is up to 128 bytes then it can use addresses below the stack pointer without needing to adjust the stack: this is the optimisation. Compare:

function:        ; Not using red-zone.
    sub rsp, 128
    mov [rsp+120], rcx
    add rsp, 128
    ret

With the same code using the red-zone:

function:        ; Using the red-zone, no adjustment of stack
    mov [rsp-8], rcx
    ret

The confusion about the offsets from the stack pointer usually arises because compilers that keep a frame pointer generate negative offsets from the frame pointer (RBP), not positive offsets from the stack pointer (RSP).
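
A small sketch of the two views of one and the same slot, assuming a conventional frame-pointer prologue (the 16-byte frame size is just an example):

```asm
function:
    push rbp
    mov  rbp, rsp
    sub  rsp, 16                  ; now rsp = rbp - 16
    mov  QWORD PTR [rbp-8], rcx   ; frame-pointer view: negative offset from rbp
    mov  rax, QWORD PTR [rsp+8]   ; stack-pointer view: same slot, positive offset from rsp
    mov  rsp, rbp                 ; release the frame
    pop  rbp
    ret                           ; rax holds the value that was in rcx
```

Both addresses name allocated memory at or above rsp. A negative offset from rsp itself, as in the question, points below the allocation.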

Andrew

`-fomit-frame-pointer` is the default for gcc targeting Linux, at `-O1` and higher, and has been for years. You usually only see offsets from `rbp` for locals in `-O0` output, which isn't very fun to look at. Fun fact: the size of the red-zone was chosen because `-128` is the most negative one-byte displacement from `rsp`. – Peter Cordes Jun 21 '16 at 16:23
I didn't know that had become a default. I'm older than I look :) – Andrew Jun 21 '16 at 17:51
@Peter, why not 255 bytes, as unsigned byte can take maximum value of 255? – Bulat M. Sep 19 '16 at 05:38
@BulatM.: Obviously because x86 disp8 displacements are signed 8 bits, just like imm8 immediate operands for instructions with operand-size larger than 8 bits. – Peter Cordes Sep 19 '16 at 05:42
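
To make the encoding point concrete, here is roughly how stores inside and just outside the red zone assemble. The byte values are hand-encoded, so treat them as a sketch and check them against your own assembler's listing.

```asm
    mov  QWORD PTR [rsp-8],   rcx  ; 48 89 4C 24 F8            disp8,  5 bytes
    mov  QWORD PTR [rsp-128], rcx  ; 48 89 4C 24 80            disp8,  5 bytes (lowest red-zone byte)
    mov  QWORD PTR [rsp-136], rcx  ; 48 89 8C 24 78 FF FF FF   disp32, 8 bytes (outside the red zone)
```

Every offset from -128 to -1 fits in a signed 8-bit displacement, so every red-zone access gets the short form.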
