
In x64 assembly, the stack frame, according to Microsoft, should be 16-byte aligned:

The stack will always be maintained 16-byte aligned, except within the prolog (for example, after the return address is pushed), and except where indicated in Function Types for a certain class of frame functions.

Assume we have the following function:

void foo() {
    long long int foo;
    long long int bar;
}

The stack would look something like this:

|-----------|
|   rbp     |    
|-----------|
|   foo     |    
|-----------|
|   bar     |    
|-----------|
| undefined |    
|-----------|

So the function would need to allocate 20h bytes of stack, and the assembly instructions would look like:

push rbp
mov rbp, rsp
sub rsp, 20h   ; 32 bytes needed to fulfill alignment requirements

Is my understanding of this correct, or am I way off? I'm also assuming no optimizations.
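The alignment arithmetic behind the `sub` size can be checked with a small C sketch. It assumes, as the comments explain, that RSP % 16 == 8 on function entry (the caller aligned RSP before its `call`), and it only models the fixed part of the frame: the return address, any pushed registers, and the locals. `frame_alloc` is a hypothetical helper, not part of any toolchain.

```c
#include <stddef.h>

/* Hypothetical helper: smallest N for `sub rsp, N` that leaves RSP 16-byte
   aligned after the prologue, assuming RSP % 16 == 8 on entry because the
   caller's `call` pushed an 8-byte return address. */
static size_t frame_alloc(size_t locals_bytes, size_t pushes) {
    size_t used  = 8 + 8 * pushes;                            /* return address + pushed registers */
    size_t total = (used + locals_bytes + 15) & ~(size_t)15;  /* round up to a multiple of 16 */
    return total - used;                                      /* bytes `sub rsp, N` must reserve */
}
```

With a frame pointer (`push rbp`, one push) and two 8-byte locals, `frame_alloc(16, 1)` is 0x10 rather than 20h; without the frame pointer, `frame_alloc(16, 0)` is 0x18. A compiler is free to pick a different layout, so this is only the minimum for this simple case, not a prediction of MSVC's exact output.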

Happy Jerry
  • That's one way of laying out those two 8-byte local vars, but wastes 16 bytes. Look at compiler output on https://godbolt.org/ (use initializers so the compiler will actually do something with that memory). It's a leaf function, so no need to reserve shadow space for a callee. I don't see a question here; what are you asking? – Peter Cordes Feb 14 '22 at 02:50
  • @PeterCordes I was making sure I understood 16-byte alignment correctly, and that the Microsoft compiler would indeed subtract `20h`. I could not reproduce the stack-frame setup in the function prolog on godbolt. – Happy Jerry Feb 14 '22 at 03:18
  • Right, I wouldn't expect MSVC to waste 16 extra bytes the way your layout does. Keep in mind `long long int foo` is only 8 bytes, not 16 like your table shows. RSP % 16 == 8 on function entry (after a `call` pushes a return address), so it's re-aligned after `push rbp`, if you choose to spend instructions setting up a frame pointer at all. – Peter Cordes Feb 14 '22 at 03:38
  • @PeterCordes I didn't mean for my table to depict foo as 16 bytes wide; I was showing where the 16-byte alignment took place. That's just bad communication from the table. I'll edit it. Also, I thought the specification said the stack is not aligned within the function prolog? – Happy Jerry Feb 14 '22 at 03:51
  • The MS spec is phrased that way because the stack moves by 8 bytes at a time during the prologue, which can contain push instructions (and because the stack is only aligned by 8 on function entry). So no: the saved RBP is at a 16-byte-aligned address; your `foo` isn't. Try it in asm, single-step into it with a debugger, and look at RSP. – Peter Cordes Feb 14 '22 at 03:55
  • @PeterCordes Ah, I see. Thank you. So the alignment is counted from the return address pushed by `call`, even though the stack frame begins at rbp. – Happy Jerry Feb 14 '22 at 04:06
  • Right, RSP % 16 = 0 *before a call*, so the stack args (if any) are aligned, therefore RSP % 16 = 8 after a call. In terms of the phrasing you quoted, it's because calls happen after the function prologue. – Peter Cordes Feb 14 '22 at 04:17
  • @PeterCordes Suppose we push arguments onto the stack for the next function call (I believe this only happens when there are more than 6 arguments). These arguments would need to be properly aligned as well. This would be reflected when the function allocates space for local variables, and it might over-allocate to satisfy alignment, correct? For example, if I pushed 2 arguments onto the stack, I would have the two arguments plus the return address, giving 24 % 16 = 8, so I would need to allocate an additional 8 bytes. Am I misunderstanding anything? – Happy Jerry Feb 15 '22 at 02:27
  • Windows x64 only passes up to 4 register args; maybe you're thinking of x86-64 System V for 6 integer register args, *and* (instead of or) 8 FP register args. But anyway, just look at compiler output for a `printf` or something that makes it easy to pass more args. https://godbolt.org/z/TzhjfP1qx - MSVC follows the guidelines you quoted and avoids `push` for storing stack args, instead using `mov` into the space allocated in the prologue. Using push would mean temporarily misaligning RSP, and would be after the prologue. GCC does the same if you tell it the caller is also `ms_abi`. – Peter Cordes Feb 15 '22 at 03:29
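The stack-argument arithmetic in these comments can be sketched in C for the Windows x64 convention: the first four arguments travel in registers, a caller always reserves 32 bytes of shadow (home) space, and argument n ≥ 5 is stored with `mov` at [rsp + 8*(n-1)] in space the prologue reserved once, so RSP does not move between the prologue and the call. The helper names are hypothetical, and the sketch ignores saved registers other than an optional frame pointer.

```c
#include <stddef.h>

/* Hypothetical helpers modelling a Windows x64 caller's outgoing-argument area. */

/* Shadow space (4 * 8 bytes) plus 8 bytes for every argument after the fourth. */
static size_t outgoing_area(size_t max_call_args) {
    size_t stack_args = max_call_args > 4 ? max_call_args - 4 : 0;
    return 32 + 8 * stack_args;
}

/* RSP-relative offset, at the call site, of 1-based argument n (meaningful for n >= 5). */
static size_t stack_arg_offset(size_t n) {
    return 8 * (n - 1);
}

/* Prologue allocation for a non-leaf function: locals plus the outgoing area,
   padded so RSP % 16 == 0 at every call, assuming RSP % 16 == 8 on entry. */
static size_t prologue_alloc(size_t locals_bytes, size_t max_call_args, size_t pushes) {
    size_t used  = 8 + 8 * pushes;                        /* return address + pushed registers */
    size_t need  = locals_bytes + outgoing_area(max_call_args);
    size_t total = (used + need + 15) & ~(size_t)15;      /* keep the frame a multiple of 16 */
    return total - used;
}
```

For a function with no locals whose calls take at most four arguments, `prologue_alloc(0, 4, 0)` is 0x28, the familiar `sub rsp, 28h`; with a six-argument call the fifth and sixth arguments land at [rsp+20h] and [rsp+28h] and the allocation grows to 0x38. Because the space is reserved up front and filled with `mov`, the temporary misalignment that pushing arguments would cause never arises.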

1 Answer

I have been trying to figure this out myself and I believe I understand what is needed here.

Assuming the CALLER of foo() ensured 16-byte alignment before the call, then:

  • The CALL instruction (executed by the caller of your function foo()) puts 8 bytes on the stack, i.e. the return address. [Stack not 16-byte aligned]
  • Then your function pushes RBP on the stack (8 bytes). [Stack is 16-byte aligned]
  • You make space for the FOO and BAR local variables by subtracting 2 × 8 bytes. [Stack is still 16-byte aligned]
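The three steps above can be traced with a tiny C model of RSP. The starting address is a made-up multiple of 16 and only the arithmetic is simulated; no real stack is touched. `step` and `trace_prologue` are hypothetical names for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical RSP just before the caller executes `call foo`;
   any multiple of 16 gives the same remainders. */
#define RSP_BEFORE_CALL 0x7ff0u

/* Move the modelled RSP down by `bytes` (the stack grows toward lower
   addresses) and print its remainder modulo 16. */
static uint64_t step(uint64_t rsp, uint64_t bytes, const char *what) {
    rsp -= bytes;
    printf("%-26s RSP = %#llx, RSP %% 16 = %llu\n", what,
           (unsigned long long)rsp, (unsigned long long)(rsp % 16));
    return rsp;
}

/* Replays the call / push rbp / sub rsp sequence and returns the final RSP. */
static uint64_t trace_prologue(void) {
    uint64_t rsp = RSP_BEFORE_CALL;
    rsp = step(rsp, 8, "call foo (return address)");  /* RSP % 16 == 8 */
    rsp = step(rsp, 8, "push rbp");                   /* RSP % 16 == 0 */
    rsp = step(rsp, 0x10, "sub rsp, 10h (foo, bar)"); /* still 0 */
    return rsp;
}
```

The remainder only changes on the 8-byte steps, which is why a 10h allocation keeps the alignment that `push rbp` restored.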

In this scenario I see no need for extra padding, however, you cannot guarantee that the CALLER of your function ensured 16-byte alignment before the call (unless you are working in an environment where it is guaranteed) so you need to make sure by adding the following line of code:

and   rsp, -16

so your final code should be:

push rbp
mov  rbp, rsp
sub  rsp, 10h     ; make space for locals foo and bar
and  rsp, -16     ; force 16-byte alignment by zeroing the low four bits

Now you are sure that the state of the stack is 16-byte aligned before you make any call to a function that may require it.

Note that if you know that the functions you call from your foo() procedure do not need 16-byte alignment, then there should be no need to go through these hoops.
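The `and rsp, -16` step in this answer can be modelled in C as clearing the low four bits of an address. A minimal sketch (the helper name is hypothetical): since the stack grows downward, rounding RSP down can only enlarge the reserved area, by 0 to 15 bytes.

```c
#include <stdint.h>

/* Model of `and rsp, -16`: -16 in two's complement is ...fff0, so the AND
   clears the low four bits and rounds the address down to a multiple of 16. */
static uint64_t align_down16(uint64_t rsp) {
    return rsp & ~(uint64_t)15;   /* same bit pattern as rsp & (uint64_t)-16 */
}
```

Because the number of bytes dropped depends on the incoming RSP, code that does this needs another way back to the entry value, which is why it is paired with a frame pointer: locals stay addressable from RBP, and the epilogue restores RSP with `mov rsp, rbp` (or `leave`) before `pop rbp`.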

David A
  • The Windows x64 ABI *does* require callers to have aligned RSP by 16 before a `call`. That's why compiler-generated code doesn't `and rsp, -16` in every non-leaf function; dead reckoning from the known incoming alignment of `RSP % 16 == 8` is sufficient unless you have an `alignas(32)` local variable. (The x86-64 System V ABI is the same: [Why does the x86-64 / AMD64 System V ABI mandate a 16 byte stack alignment?](https://stackoverflow.com/q/49391001)) – Peter Cordes Jun 13 '22 at 20:11
  • You'd only need `and rsp, -16` if you had some code that didn't preserve stack alignment, but then wanted to call a compiler-generated function (including a library function) and had to recover stack alignment to follow the ABI. – Peter Cordes Jun 13 '22 at 20:13
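The `alignas(32)` case mentioned above can be reproduced with a small C11 sketch. The ABI only promises 16-byte alignment at a call, so for this local the compiler has to realign the frame at run time itself (GCC and Clang typically emit an `and rsp, -32` after setting up RBP; other compilers may do it differently). The function name is hypothetical.

```c
#include <stdalign.h>
#include <stdint.h>

/* Returns 1 if a 32-byte-aligned local really lands on a 32-byte boundary.
   The incoming RSP is only known modulo 16, so dead reckoning alone cannot
   provide this alignment; the compiler must realign the frame. */
static int over_aligned_local_ok(void) {
    alignas(32) unsigned char buf[32];
    return ((uintptr_t)buf % 32) == 0;
}
```

Compiling this on godbolt.org and comparing it with the same function without `alignas(32)` is a quick way to see the extra realignment instructions appear.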