50

I've been trying to gain a deeper understanding of how compilers generate machine code, and more specifically how GCC deals with the stack. In doing so I've been writing simple C programs, compiling them into assembly and trying my best to understand the outcome. Here's a simple program and the output it generates:

asmtest.c:

void main() {
    char buffer[5];
}

asmtest.s:

pushl   %ebp
movl    %esp, %ebp
subl    $24, %esp
leave
ret

What's puzzling to me is why 24 bytes are being allocated for the stack. I know that because of how the processor addresses memory, the stack has to be allocated in increments of 4, but if this were the case, we should only move the stack pointer by 8 bytes, not 24. For reference, a buffer of 17 bytes produces a stack pointer moved 40 bytes and no buffer at all moves the stack pointer 8. A buffer between 1 and 16 bytes inclusive moves ESP 24 bytes.

Now assuming the 8 bytes is a necessary constant (what is it needed for?), this means that we're allocating in chunks of 16 bytes. Why would the compiler align in such a way? I'm using an x86_64 processor, but even a 64-bit word should only require 8-byte alignment. Why the discrepancy?

For reference I'm compiling this on a Mac running 10.5 with gcc 4.0.1 and no optimizations enabled.

JFMR
David
  • 1
    Related: [Why does System V / AMD64 ABI mandate a 16 byte stack alignment?](//stackoverflow.com/q/49391001), the reasoning applies to the i386 SysV ABI as well, and to gcc's `-mpreferred-stack-boundary` default setting, which was 16 bytes for 32-bit code even before the i386 SysV ABI officially changed to require / guarantee it. – Peter Cordes Apr 11 '18 at 02:51
  • Strange, I have tried the same code, with `-mpreferred-stack-boundary=4` but there is only a subtraction of 16 from `esp`. – Ta Thanh Dinh Jun 13 '18 at 09:44
  • Related: [Why does GCC allocate more space than necessary on the stack, beyond what's needed for alignment?](https://stackoverflow.com/q/63009070) - `sub $8, %esp` should re-align the stack, and make those 8 bytes usable for the array. The extra 16 is a gcc missed-optimization. – Peter Cordes Jul 25 '20 at 05:29

6 Answers

53

It's a gcc feature controlled by `-mpreferred-stack-boundary=n`, where the compiler tries to keep items on the stack aligned to 2^n bytes. If you changed n to 2, it would only allocate 8 bytes on the stack. The default value for n is 4, i.e. it will try to align to 16-byte boundaries.

The reason for the "default" 8 bytes, and then 24 = 8 + 16 bytes, is that the stack already holds 8 bytes that leave and ret will pop (the saved %ebp and the return address), so the compiled code must first adjust the stack by another 8 bytes to get it aligned to 2^4 = 16.
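The arithmetic above can be sketched in C. This is only a model fitted to the numbers reported in the question (32-bit code, with a 4-byte return address and a 4-byte saved %ebp already on the stack), not GCC's actual frame-layout logic. The names `observed_adjust`, `minimal_adjust` and `check_against_question` are made up for illustration.

```c
#include <assert.h>

/* Assumption: 32-bit code, so the return address (4 bytes) and the
   saved %ebp (4 bytes) are already on the stack before `subl`. */
enum { ALREADY_PUSHED = 8 };

/* Round x up to the next multiple of align (align must be a power of two). */
static unsigned round_up(unsigned x, unsigned align) {
    return (x + align - 1) & ~(align - 1);
}

/* Model of what the question observed with the default n = 4:
   8 bytes of realignment plus the locals rounded up to 16. */
static unsigned observed_adjust(unsigned locals) {
    return ALREADY_PUSHED + round_up(locals, 16);
}

/* Smallest adjustment that holds the locals and still leaves
   %esp on a 2^n-byte boundary. */
static unsigned minimal_adjust(unsigned locals, unsigned n) {
    unsigned align = 1u << n;
    return round_up(ALREADY_PUSHED + locals, align) - ALREADY_PUSHED;
}

/* Checks the model against the values quoted in the question. */
void check_against_question(void) {
    assert(observed_adjust(0) == 8);    /* no buffer      */
    assert(observed_adjust(5) == 24);   /* char buffer[5] */
    assert(observed_adjust(16) == 24);  /* 1..16 bytes    */
    assert(observed_adjust(17) == 40);  /* 17 bytes       */
}
```

`minimal_adjust(5, 2)` gives 8, matching the n = 2 case mentioned above. `minimal_adjust(5, 4)` is also 8, which is the amount a comment on the question says would have been enough to realign the stack and hold the array.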

laalto
  • 1
    Didn't "push %ebp" decrease esp by 8 bytes? Plus ret's 8 bytes, the stack should already be aligned to 16 bytes. Why does the compiler need this additional 8 bytes? – Joe.Z Jul 12 '13 at 07:52
  • 1
    Oh, I got it. This is a 32-bit machine. Sorry. It should be ret 4 bytes + ebp 4 bytes + alignment 8 bytes + buffer 16 bytes. – Joe.Z Jul 12 '13 at 13:05
  • 1
    The current versions of the i386 and x86-64 System V ABIs require 16B stack alignment (before a `call` instruction), so functions are allowed to assume that. Historically, the i386 ABI only required 4B alignment. (see https://stackoverflow.com/tags/x86/info for links to ABI docs). GCC also keeps `%esp` aligned even in leaf functions (that don't call other functions), when it has to reserve any space, and that's what's going on here. – Peter Cordes Sep 07 '17 at 19:31
12

Most instructions in the SSEx family REQUIRE packed 128-bit memory operands to be aligned to 16 bytes (the explicitly unaligned moves such as `movups` are the exception); otherwise you get a fault, typically seen as a segfault, when trying to load or store them. I.e. if you want to safely pass 16-byte vectors on the stack for use with SSE, the stack needs to be kept consistently aligned to 16 bytes. GCC accounts for that by default.
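To make "aligned to 16" concrete, here is a small C11 sketch that tests whether an address is a multiple of 16. `is_aligned` and `alignment_demo` are hypothetical helpers, and casting a pointer to `uintptr_t` to inspect its low bits is implementation-defined, though it behaves as expected on mainstream x86 compilers.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p is a multiple of align (a power of two), else 0. */
static int is_aligned(const void *p, size_t align) {
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Returns 1 if all three checks behave as the comments describe. */
int alignment_demo(void) {
    _Alignas(16) unsigned char vec[32];  /* forced onto a 16-byte boundary */
    return is_aligned(vec, 16)           /* start of the array: aligned    */
        && !is_aligned(vec + 4, 16)      /* 4 bytes past it: not aligned   */
        && is_aligned(vec + 16, 16);     /* next 16-byte boundary          */
}
```

An aligned SSE load such as `movaps` would be valid on `vec` and `vec + 16`, but would fault on `vec + 4`.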

stormsoul
  • I might have too little experience with the matter to claim that your answer is wrong. But don't you use `movupd` and similar **u**naligned instructions exactly for that purpose (loading/storing _unaligned_ packed data)? From what I understand, you _can_ get faulty behavior when trying to use `movapd` and similar instructions on unaligned data, but data being unaligned shouldn't be a problem in general. – andreee Dec 15 '15 at 20:53
  • @andreee: `movups` is slower on Core2 and earlier, even when the data is aligned. The ABI was designed back when all CPUs were like this. And besides, aligned allows you to `paddd xmm0, [rsp]` instead of needing a separate `movdqu` instruction. See [Why does System V / AMD64 ABI mandate a 16 byte stack alignment?](https://stackoverflow.com/questions/49391001/why-does-system-v-amd64-abi-mandate-a-16-byte-stack-alignment) – Peter Cordes Apr 11 '18 at 02:41
4

I found this site, which has a decent explanation at the bottom of the page of why the stack might be larger. Scale the concept up to a 64-bit machine and it might explain what you are seeing.

Chris Arguin
3

LWN has an article on memory alignment that you may find interesting.

J-16 SDiZ
1

The Mac OS X / Darwin x86 ABI requires a stack alignment of 16 bytes. This is not the case on other x86 platforms such as Linux, Win32, FreeBSD ...

Ringding
  • 1
    The actual ABI requirement is that the stack be 16 byte aligned *at function call boundaries*. – Stephen Canon Nov 24 '09 at 02:18
  • 2
    This is true, but since function prologues/epilogues are about the only places where the stack pointer is changed, this is almost the same as saying that it needs to be aligned at all times. – Ringding Nov 30 '09 at 18:08
-1

The 8 bytes is there because the first instruction pushes the starting value of %ebp on the stack (assuming 64-bit).

brian sharon