
I was looking at the assembly output of 'objdump -S' and noticed something strange. This was on cygwin/x86_64 v. 3.1.5 with gcc 9.3.0 on Windows 10.

Here is the assembly output of a particular function (the function is not useful and is merely illustrative of the problem):

u_int64_t returnit(u_int64_t x) {
   1004010b9:   55                      push   rbp
   1004010ba:   48 89 e5                mov    rbp,rsp
   1004010bd:   48 83 ec 10             sub    rsp,0x10
   1004010c1:   48 89 4d 10             mov    QWORD PTR [rbp+0x10],rcx
        u_int64_t a = 1;
   1004010c5:   48 c7 45 f8 01 00 00    mov    QWORD PTR [rbp-0x8],0x1
   1004010cc:   00

        return a + x;
   1004010cd:   48 8b 55 f8             mov    rdx,QWORD PTR [rbp-0x8]
   1004010d1:   48 8b 45 10             mov    rax,QWORD PTR [rbp+0x10]
   1004010d5:   48 01 d0                add    rax,rdx
}
   1004010d8:   48 83 c4 10             add    rsp,0x10
   1004010dc:   5d                      pop    rbp
   1004010dd:   c3                      ret

Almost everything looks normal: set up the stack frame, with extra space for the local variable, and copy the passed argument ("x", in register rcx) to a position on the stack.

Here's the part that seems odd:

mov    QWORD PTR [rbp+0x10],rcx

It's copying the contents of rcx OUTSIDE the current stack frame. Local variable(s) are stored in the current stack frame, as they should be.

I tried this on an older installation of cygwin (32-bit, v. 2.9.0 with gcc 6.4.0) and it behaved the same way.

I also tried this on other platforms - an older ubuntu linux liveboot with kernel 4.4.0 and gcc 5.3.1, and a FreeBSD 12.1 box with clang 8.0.1, both 64-bit - and they do what one would expect, copying the value of the argument passed in a register inside the local stack frame. For example, here's the relevant line on FreeBSD (it uses rdi instead of rcx):

2012e8:       89 7d fc                mov    DWORD PTR [rbp-0x4],edi

Is there some particular reason it's done this way on cygwin?

Peter Cordes
sj95126
  • BTW, I'd expect gcc for the x86-64 System V ABI to use space below RSP, in the [red zone](https://en.wikipedia.org/wiki/Red_zone_(computing)). That is technically part of the current stack frame, even though it's not between RBP and RSP (if RBP was set up as a traditional frame pointer), because it won't do `sub rsp, 16` – Peter Cordes Jun 30 '20 at 20:49
  • @PeterCordes : the 64-bit GCC Windows compilers in unoptimized code will always generate copies of the incoming parameter registers into the shadow space. If you were to modify this function to have 4 integer class parameters it would copy the 4 registers RCX, RDX, R8, and R9 into the shadow space in the same way this code did RCX. – Michael Petch Jun 30 '20 at 20:57
  • @MichaelPetch: The OP mentioned FreeBSD and Ubuntu at the end of the question, saying those compilers stored within their stack frame, which I thought was surprising (because either they know about red zones but not home / shadow space, or they didn't notice the missing `sub rsp, ...` for the non-Windows compilers). – Peter Cordes Jun 30 '20 at 21:09
  • FreeBSD is definitely using the stack frame; the register argument is accessed as rbp-0x8, and the local variable as rbp-0x10. – sj95126 Jun 30 '20 at 21:20
  • Right, but is that below RSP? If you compile for Linux, like https://godbolt.org/z/3663zJ, there's no `sub rsp, 8` to reserve space for that spill. In x86-64 System V, the stack frame includes the 128 bytes *below* RSP. In other calling conventions, that space is *not* safe from being asynchronously clobbered, and would be considered outside the stack frame in the other direction from the shadow space. The red-zone is in some ways similar to shadow-space: space you can use without reserving first. – Peter Cordes Jun 30 '20 at 21:32
  • @PeterCordes : you are correct, it is red zone. There is no subtraction from rsp at the beginning of the function. (I checked both FreeBSD and Linux) – sj95126 Jun 30 '20 at 21:46

2 Answers


This behaviour conforms to the Windows x64 ABI.

Looking at the x64 stack usage page from Microsoft, we can see that the ABI specifies that space is reserved on the stack for the four register arguments, even if fewer arguments are used. These are the home addresses, which act as a shadow of the actual argument registers.

This area can be used to save arguments that would otherwise be overwritten, to aid in debugging, etc. Given the amount of work being done for an extremely simple operation, I'm assuming that this is unoptimised/debugging code. An optimised compilation of the code would likely skip these redundant stores and loads, and might not touch memory aside from the ret.

[Figure: Microsoft x64 function stack frames]

The Microsoft x64 calling convention used by Windows is different from the one seen in the System V AMD64 ABI used by Linux, OS X, etc. on x86-64.


This example shows the effects of optimisation in MSVC (different compiler, but still Windows-targeting). Without having to actually store values on the stack, the calculation can be done in a single instruction.
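To see the same effect with GCC rather than MSVC, one option is to compile the question's function twice and compare the listings. The block below is a minimal sketch of that function, using the standard `uint64_t` in place of `u_int64_t`. The instruction named in the comment is an expectation for an optimised Windows x64 build, not captured compiler output.

```c
#include <stdint.h>

/* Same shape as the function in the question.
 * Assumption: built for a Windows x64 target, x arrives in RCX.
 * At -O0 the compiler spills RCX to its home slot at [rbp+0x10].
 * With optimisation the whole body would likely collapse to a single
 * instruction such as `lea rax, [rcx+1]`, with no stack traffic. */
uint64_t returnit(uint64_t x) {
    uint64_t a = 1;
    return a + x;
}
```

Compiling with `gcc -O0 -S -masm=intel` and again with `gcc -O2 -S -masm=intel` should show whether the store to the home slot is still present.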

Thomas Jager
  • Hmm, ok, thanks. I guess there's some logic to that, if that's how they wanted to implement the ABI. It just feels like accessing stack space outside your own frame is asking for trouble. It seems like putting the register parameter stack area inside the caller's space rather than the callee's doesn't somehow make more or less vulnerable to overwrites, but it is what it is. – sj95126 Jun 30 '20 at 21:04
  • @sj95126 The thing is that it's not "outside your own frame". The calling convention explicitly states that this region is effectively owned by the called function. The CPU has no concept of things being inside or outside the space of the caller or callee. There are just offsets used from the base pointer. There's no effect on vulnerability to overwrites, because writing outside of valid objects already means that the code is broken. Code executed by a process is completely trusted within that process, so there's no need or mechanism for protection of the frame. – Thomas Jager Jun 30 '20 at 21:09
  • Also, this behaviour is identical to arguments being passed on the stack, as is done on x86 (non-64-bit) or when there are too many arguments to fit in registers. The Microsoft x64 convention just also reserves this space for register arguments. – Thomas Jager Jun 30 '20 at 21:11

This is an addendum to @ThomasJager's answer. The output you are seeing is what 64-bit Windows GCC compilers (MinGW, Cygwin, etc.) generate for unoptimized code. It copies the incoming parameters passed via RCX, RDX, R8, and R9 into the shadow store (also known as shadow space or home space). This does not apply to 32-bit Windows builds. The code you are reviewing would have been generated at -O0 (usually the default). This behaviour makes 64-bit debugging easier. There is a related Stack Overflow answer that describes this behaviour:

This is where the Home space comes into play: It can be used by compilers to leave a copy of the register values on the stack for later inspection in the debugger. This usually happens for unoptimized builds. When optimizations are enabled, however, compilers generally treat the Home space as available for scratch use. No copies are left on the stack, and debugging a crash dump turns into a nightmare

With no optimization, GCC's behaviour is to make a copy of the parameters passed via RCX, RDX, R8, and R9. If you had amended the code to look like:

#include<stdint.h>
uint64_t returnit(uint64_t w, uint64_t x, uint64_t y, uint64_t z) {
        return 0;
}

The code generated would have looked something like:

0000000000000000 <returnit>:
   0:   push   rbp
   1:   mov    rbp,rsp
   4:   mov    QWORD PTR [rbp+0x10],rcx
   8:   mov    QWORD PTR [rbp+0x18],rdx
   c:   mov    QWORD PTR [rbp+0x20],r8
  10:   mov    QWORD PTR [rbp+0x28],r9
  14:   mov    eax,0x0
  19:   pop    rbp
  1a:   ret
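The offsets 0x10 through 0x28 in that listing follow from the prologue: after `push rbp` and `mov rbp,rsp`, the saved rbp sits at [rbp+0] and the return address at [rbp+8], so the four 8-byte home slots begin at [rbp+0x10]. The helper below is a rough sketch of that arithmetic; `home_slot_offset` is a hypothetical name used only for illustration, and the layout assumes the frame-pointer prologue shown in the listing.

```c
#include <stddef.h>

/* Offset from rbp of the home slot for register argument `index`
 * (0 = RCX, 1 = RDX, 2 = R8, 3 = R9).
 * Assumption: the function starts with `push rbp; mov rbp,rsp`. */
ptrdiff_t home_slot_offset(int index) {
    const ptrdiff_t saved_rbp_size = 8;   /* [rbp+0x00]: caller's rbp   */
    const ptrdiff_t return_addr_size = 8; /* [rbp+0x08]: return address */
    const ptrdiff_t slot_size = 8;        /* each home slot is 8 bytes  */
    return saved_rbp_size + return_addr_size + slot_size * (ptrdiff_t)index;
}
```

For example, `home_slot_offset(0)` gives 0x10, the slot written by `mov QWORD PTR [rbp+0x10],rcx`.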

If you build at any optimization level above -O0 (-O1, -O2, -O3, -Os, -Og, etc.), these parameters are not copied into the shadow store.


In a comment the OP mentioned:

It just feels like accessing stack space outside your own frame is asking for trouble. It seems like putting the register parameter stack area inside the caller's space rather than the callee's doesn't somehow make more or less vulnerable to overwrites, but it is what it is

A compiler is free to use the shadow store (on Windows), or even the space on the stack where parameters are passed. In C, the stack space used for parameters is callee-owned, not caller-owned. This is because the C language is exclusively pass by value: a function always gets a copy of the parameters from the caller. The side effect is that a C compiler is free to use any stack space occupied by function parameters as it sees fit.
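The pass-by-value point can be illustrated with a small sketch. The function names `bump` and `caller_value_after_call` are hypothetical, chosen only for this example. Whatever the callee does to its parameter, and wherever the compiler chooses to keep it, the caller's variable is unaffected.

```c
#include <stdint.h>

/* The callee receives its own copy of x and may overwrite it freely. */
uint64_t bump(uint64_t x) {
    x += 1;   /* modifies only the callee's copy */
    return x;
}

/* The caller's variable keeps its value across the call. */
uint64_t caller_value_after_call(void) {
    uint64_t v = 41;
    (void)bump(v);   /* result deliberately ignored */
    return v;        /* unchanged by the call */
}
```

Because the caller can never observe the parameter's storage after the call, the compiler may reuse that storage (home space or stack slot) for anything it likes inside the callee.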

Michael Petch