
Why does this code:

#include <stdio.h>
int main(void) {
    puts("Hello, World!");
}

decide to initialize a stack frame? Here is the assembly code:

.LC0:
        .string "Hello, World!"
main:
        push    rbp
        mov     rbp, rsp
        mov     edi, OFFSET FLAT:.LC0
        call    puts
        mov     eax, 0
        pop     rbp
        ret

Why does the compiler set up a stack frame only to tear it down again without ever using it? Skipping it surely wouldn't cause any errors outside of the main function, because I never use the stack. Why is it compiled this way?

Riolku

3 Answers


Having these steps in every compiled function is the "baseline" for the compiler, unoptimized. It looks clean in disassembly, and makes sense. However, the compiler can optimize the output to reduce overhead from code that has no real effect. You can see this by compiling with different optimization levels.

What you got is like this:

.LC0:
  .string "Hello, World!"
main:
  push rbp
  mov rbp, rsp
  mov edi, OFFSET FLAT:.LC0
  call puts
  mov eax, 0
  pop rbp
  ret

That's compiled in GCC with no optimization.

Adding the flag -O4 (which GCC treats the same as -O3) gives this output:

.LC0:
  .string "Hello, World!"
main:
  sub rsp, 8
  mov edi, OFFSET FLAT:.LC0
  call puts
  xor eax, eax
  add rsp, 8
  ret

You'll notice that this still moves the stack pointer, but it skips saving and restoring the base pointer, and so avoids the memory accesses that push rbp and pop rbp perform.

The stack is assumed to be aligned on a 16-byte boundary at the point of each call. The call into main has already pushed an 8-byte return address, so on entry the stack pointer is 8 bytes off that boundary. Subtracting another 8 bytes gets it back to the boundary before the call to puts.
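The arithmetic behind that `sub rsp, 8` can be sketched in C. This is only an illustration: the function names are made up for this example, and the stack address used in the usage note is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: value of rsp on entry to a function, given the
 * caller's rsp at the call instruction. The call pushes an 8-byte
 * return address, so rsp drops by 8. */
uint64_t rsp_on_entry(uint64_t rsp_at_call) {
    return rsp_at_call - 8;
}

/* Hypothetical helper: number of bytes to subtract from rsp to reach the
 * next lower multiple of `align` (which must be a power of two). */
uint64_t padding_to_align(uint64_t rsp, uint64_t align) {
    return rsp & (align - 1);
}
```

For example, if the caller's stack pointer were 0x7fffffffe020 (a multiple of 16) at the call, `rsp_on_entry` gives 0x7fffffffe018 and `padding_to_align(0x7fffffffe018, 16)` gives 8, which matches the 8 bytes subtracted in the optimized output.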

Thomas Jager
  • @Riolku That's to align the stack to 16 bytes for the call to `puts` as required by the ABI. – fuz Jul 24 '18 at 12:32
  • @ThomasJager why -O4 and not just -O3? AFAIK any number above 3 has no further effect. – davmac Jul 24 '18 at 12:34
  • @davmac I couldn't remember where it ended, so I went with the comment by Jabberwocky on the question. – Thomas Jager Jul 24 '18 at 12:35
  • @Riolku I've edited the answer to answer this, based on the comments – Thomas Jager Jul 24 '18 at 13:06
  • The x86-64 System V ABI has *always* required 16-byte stack-alignment, hasn't it? Is that "since gcc4.5" claim talking about 32-bit code, or am I mistaken? [Why does System V / AMD64 ABI mandate a 16 byte stack alignment?](https://stackoverflow.com/q/49391001). I think it was around gcc4.1 / 4.2 that `gcc -m32` started assuming 16-byte stack alignment. – Peter Cordes Jul 24 '18 at 14:36
  • @PeterCordes According to what I've seen, the patches to align at 16 bytes instead of 4 were officially added in 4.5, but I've edited to avoid talking about things I don't really know. – Thomas Jager Jul 24 '18 at 14:38
  • They were definitely in by 4.4.7, if https://godbolt.org/g/vEKY78 is any indication. `-mpreferred-stack-boundary=2` changes the code-gen to not reserve as much stack in a non-leaf function that just calls a one-arg function, even in 4.1.2. And 4.4.7 properly handles `__attribute__((aligned(32)))` (unlike 4.1). Anyway, this is all for `-m32`; I don't think `-m64` ever did less than 16-byte alignment. – Peter Cordes Jul 24 '18 at 14:47
  • Compiling with -O3, my code does the following: `mov edi, OFFSET FLAT hello` `jmp puts` this is even more optimized :P – Riolku Jul 24 '18 at 15:54
  • @PeterCordes : I believe GCC 4.5 was the release where the default for GCC was `-mpreferred-stack-boundary=4` – Michael Petch Jul 24 '18 at 19:30
  • @MichaelPetch: You're still talking about `-m32`, right? Anyway, my testing on Godbolt shows that *those* builds of gcc behave like `-mpreferred-stack-boundary=4`. But maybe 4.4.7 backported the patch from 4.5? Or maybe Matt configured the old gcc versions with more modern configs? – Peter Cordes Jul 24 '18 at 19:33
  • Yes, I said that in the context of `-m32`, as 64-bit has always had a minimum 16 byte alignment. – Michael Petch Jul 24 '18 at 19:34
  • @PeterCordes Looking at GCC release history 4.5.0 was released ~2 years before 4.4.7 was released. I'll bet that 4.4.7 included the change to the default stack boundary – Michael Petch Jul 24 '18 at 19:46
  • @Riolku: a tail-call is only possible in a `void` function like `void foo(){puts("msg");}`, or if you do `return puts("msg");` in `main`. Otherwise the implicit `return 0` at the end of `main` means that an optimized tail-call isn't possible. – Peter Cordes Jul 24 '18 at 19:52

It's very common for compilers to generate unoptimized code in the least complicated way possible (or at least the least complicated way that doesn't produce code so bad that the optimizer can't fix it). This keeps the code generator simple and follows the single-responsibility principle: making code more efficient is the optimizer's job.

Generating code to initialize the stack for all functions is less complicated than only doing so where necessary. Since the optimizer will be able to remove the unnecessary code anyway (and it will do so in more cases than a simple "does this function have any local variables?" check would), generating the unnecessary code won't have any effect as long as optimizations are enabled (and if they're not, it's expected that the generated code will contain inefficiencies).

If we added a "does this function have any local variables?" check to the function that generates the stack-initialization code, we'd be re-inventing a less powerful version of an optimization that the optimizer already performs. That would violate the single-responsibility principle and add complexity to a part of the compiler that could otherwise stay relatively simple (unlike the optimizer, which is full of complicated algorithms anyway).
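To make that division of labour concrete, here is a toy sketch in C. It is not how GCC is implemented: the representation (a function as an array of instruction strings) and both function names are invented for this illustration. The code generator always emits the frame, and a separate pass replaces it with plain alignment padding when nothing refers to rbp.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical naive code generator: always wraps the body in a frame
 * setup and teardown. `out` must have room for n + 4 entries. */
size_t emit_naive(const char *const *body, size_t n, const char **out) {
    size_t k = 0;
    out[k++] = "push rbp";
    out[k++] = "mov rbp, rsp";
    for (size_t i = 0; i < n; i++)
        out[k++] = body[i];
    out[k++] = "pop rbp";
    out[k++] = "ret";
    return k;
}

/* Hypothetical optimizer pass: if no instruction between the prologue and
 * the epilogue mentions rbp, replace the frame with stack-pointer padding
 * that keeps the 16-byte alignment. Rewrites `code` in place and returns
 * the new instruction count. */
size_t strip_unused_frame(const char **code, size_t n) {
    if (n < 4 || strcmp(code[0], "push rbp") != 0 ||
        strcmp(code[1], "mov rbp, rsp") != 0 ||
        strcmp(code[n - 2], "pop rbp") != 0)
        return n; /* not the shape emit_naive produces: leave untouched */
    for (size_t i = 2; i + 2 < n; i++)
        if (strstr(code[i], "rbp") != NULL)
            return n; /* the frame is used: keep it */
    size_t k = 0;
    code[k++] = "sub rsp, 8";
    for (size_t i = 2; i + 2 < n; i++)
        code[k++] = code[i]; /* shift the body down by one slot */
    code[k++] = "add rsp, 8";
    code[k++] = "ret";
    return k;
}
```

Fed the three body instructions from the question, `emit_naive` produces the seven-instruction unoptimized listing, and `strip_unused_frame` shrinks it to six instructions with `sub rsp, 8` and `add rsp, 8` in place of the frame. The generator stays trivial, and the decision about whether the frame is needed lives in one place.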

sepp2k

The stack frame makes it possible to inspect the call stack during runtime. This is useful:

As already pointed out by others, a compiler may omit the stack frame at higher optimization levels.
See also: How do you get gcc's __builtin_frame_address to work with -O2?
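As a small illustration of inspecting frames at runtime, the sketch below uses the `__builtin_frame_address` builtin mentioned in the linked question, which is available in GCC and Clang. The two function names are made up for this example, and the comparison assumes a stack that grows toward lower addresses, as it does on x86-64.

```c
#include <stdint.h>

/* Returns the address of this function's own frame.
 * noinline keeps it a real call with a frame of its own. */
__attribute__((noinline)) uintptr_t callee_frame(void) {
    return (uintptr_t)__builtin_frame_address(0);
}

/* Returns 1 if the callee's frame sits at a lower address than ours,
 * which is what we expect when the stack grows downward. */
__attribute__((noinline)) int callee_is_below_caller(void) {
    uintptr_t mine = (uintptr_t)__builtin_frame_address(0);
    uintptr_t theirs = callee_frame();
    return theirs < mine;
}
```

Only level 0 (the current function's frame) is used here. Walking further up the chain with larger arguments depends on the callers having kept their frame pointers, which is exactly what optimized builds may omit.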

Ruud Helderman