Difference in x86-32 and x64 Assembly stack allocation for a fixed-size buffer with unoptimized C (GCC)

Question

I'm doing some basic disassembly and have noticed that my buffer is being given extra stack space, although the tutorial I'm following uses the same code and allocates exactly the expected 500 bytes. Why is this?

My code:

#include <stdio.h>
#include <string.h>

int main (int argc, char** argv){
    char buffer[500];
    strcpy(buffer, argv[1]);
    return 0;
}

Compiled with GCC, the disassembled code is:

   0x0000000000001139 <+0>:     push   %rbp
   0x000000000000113a <+1>:     mov    %rsp,%rbp
   0x000000000000113d <+4>:     sub    $0x210,%rsp
   0x0000000000001144 <+11>:    mov    %edi,-0x204(%rbp)
   0x000000000000114a <+17>:    mov    %rsi,-0x210(%rbp)
   0x0000000000001151 <+24>:    mov    -0x210(%rbp),%rax
   0x0000000000001158 <+31>:    add    $0x8,%rax
   0x000000000000115c <+35>:    mov    (%rax),%rdx
   0x000000000000115f <+38>:    lea    -0x200(%rbp),%rax
   0x0000000000001166 <+45>:    mov    %rdx,%rsi
   0x0000000000001169 <+48>:    mov    %rax,%rdi
   0x000000000000116c <+51>:    call   0x1030 <strcpy@plt>
   0x0000000000001171 <+56>:    mov    $0x0,%eax
   0x0000000000001176 <+61>:    leave  
   0x0000000000001177 <+62>:    ret

However, this video https://www.youtube.com/watch?v=1S0aBV-Waeo clearly has only 500 bytes allocated.

Why is this the case? The only difference I can see is that the tutorial is 32-bit, while mine is x86-64.

What makes you think it is *incorrect* for the function to use more stack than is necessary to accommodate its local variables? There is no one true assembly representation of a given C function, not even for a specific ABI. — John Bollinger, Feb 07 '23 at 18:42
Does this answer your question? [Why does the x86-64 / AMD64 System V ABI mandate a 16 byte stack alignment?](https://stackoverflow.com/questions/49391001/why-does-the-x86-64-amd64-system-v-abi-mandate-a-16-byte-stack-alignment) — teapot418, Feb 07 '23 at 18:43
`push %ebp` isn't encodeable in 64-bit mode. Seems you're talking about 32-bit mode, not [the ILP32 x32 ABI.](https://en.wikipedia.org/wiki/X32_ABI). See [The most correct way to refer to 32-bit and 64-bit versions of programs for x86-related CPUs?](https://stackoverflow.com/q/53364320) — Peter Cordes, Feb 07 '23 at 18:51
It's pretty pointless to look at unoptimized compiler output and try to make sense out of it. — prl, Feb 08 '23 at 04:42

score 5 · Accepted Answer · answered Feb 07 '23 at 18:42

500 is not a multiple of 16.

The x86-64 ABI (application binary interface) requires the stack pointer to be a multiple of 16 whenever a call instruction is about to happen. (Since call pushes an 8-byte return address, this means the stack pointer is always congruent to 8, mod 16, when control reaches the first instruction of a called function.) For the code shown, it is convenient for the compiler to achieve this requirement by increasing the value it uses in the sub instruction, making it be a multiple of 16.

The x86-32 ABI did not make this requirement, so there was no reason for the compiler used in the video to increase the size of the stack frame.

Note that you appear to have compiled your code without optimization. I get this at -O2:

   0x0000000000000000 <+0>:     sub    $0x208,%rsp
   0x0000000000000007 <+7>:     mov    0x8(%rsi),%rsi
   0x000000000000000b <+11>:    mov    %rsp,%rdi
   0x000000000000000e <+14>:    call   <strcpy@PLT>
   0x0000000000000013 <+19>:    xor    %eax,%eax
   0x0000000000000015 <+21>:    add    $0x208,%rsp
   0x000000000000001c <+28>:    ret

The stack adjustment is still somewhat larger than the size of the array, but not as big as what you had, and no longer a multiple of 16. The difference is that with optimization on, the frame pointer is eliminated, so %rbp does not need to be saved and restored. The stack pointer is therefore still congruent to 8, mod 16, at the point of the sub instruction, and subtracting 0x208 (520, which is also congruent to 8, mod 16) brings it back to a multiple of 16 before the call.

(Incidentally, there is no requirement anywhere for a stack frame to be as small as possible. "Quality of implementation" dictates that it should be as small as possible, but for various reasons it's quite common for the compiler to miss that target. In my optimized code dump, I don't see any reason why the immediate operand to sub and add couldn't have been 0x1f8 (504).)

The other difference for x86-64 is that a debug build needs space to spill register args. But still, 512 bytes would have been enough stack space to spill an 8-byte pointer and 4-byte `int` as well as the 500 byte buffer; [Why does GCC allocate more space than necessary on the stack, beyond what's needed for alignment?](https://stackoverflow.com/q/63009070) is a longstanding GCC missed optimization bug. — Peter Cordes, Feb 07 '23 at 18:57
