I tested this C code on Compiler Explorer:

#include <stdio.h>

int main(void)
{
    printf("Hello world!\n");
}

I selected the compiler x86-64 gcc 12.2 with the parameters -std=c17 -O3 -m32.

In 64-bit mode (no -m32 option) the output looks OK and optimized (online example):

.LC0:
        .string "Hello world!"
main:
        sub     rsp, 8
        mov     edi, OFFSET FLAT:.LC0
        call    puts
        xor     eax, eax
        add     rsp, 8
        ret

In 32-bit mode, the output is bloated by a lot of nonsense (online example):

.LC0:
        .string "Hello world!"
main:
        lea     ecx, [esp+4]   # OK, so ecx = esp + 4, but main has no parameters...why?
        and     esp, -16
        push    DWORD PTR [ecx-4]   # push [ecx - 4], which is the return address...why?
        push    ebp
        mov     ebp, esp
        push    ecx   # saves the stack address above the return address...what is going on?
        sub     esp, 16   # what is this for?
        push    OFFSET FLAT:.LC0
        call    puts
        mov     ecx, DWORD PTR [ebp-4]   # restores that address
        add     esp, 16   # undoes that sub esp, 16??
        xor     eax, eax
        leave
        lea     esp, [ecx-4]   # restores stack as it was on function entry
        ret

I expected something along the lines of:

.LC0:
        .string "Hello world!"
main:
        sub esp, 8   # GCC wants a 16-byte aligned stack at the call: 4 (return address) + 8 + 4 (pushed arg) = 16
        push OFFSET FLAT:.LC0
        call puts
        xor eax, eax
        add esp, 12
        ret

Considering I used the -O3 option, why does the output look like that?

DarkAtom
  • It's something to do with `gcc`. The output from `clang` is much more compact (e.g.) and closer to what you would expect. – Craig Estey Dec 11 '22 at 22:38
  • gcc -m32 doesn't assume that main's incoming stack alignment is the ABI-guaranteed 16. [Responsibility of stack alignment in 32-bit x86 assembly](https://stackoverflow.com/q/40307193) / [What is the purpose of these instructions before the main preamble?](https://reverseengineering.stackexchange.com/a/18969) – Peter Cordes Dec 11 '22 at 22:49
  • @PeterCordes That doesn't explain all the bloat. A simple textbook prologue-epilogue function with the added `and esp, -16` would be more efficient. – DarkAtom Dec 11 '22 at 22:55
  • I know, I'm still looking for the Q&A that explains that, I thought I'd find it sooner. But I'm 100% sure there is one. Ah, found one, [Trying to understand gcc's complicated stack-alignment at the top of main that copies the return address](https://stackoverflow.com/q/1147623) – Peter Cordes Dec 11 '22 at 22:55
  • Ok there we go, I think those four duplicates fully cover it. I knew there was an answer mentioning that GCC8 and newer use a simpler prologue when they need to align (except in main in 32-bit mode before calling other functions, unfortunately: https://gcc.godbolt.org/z/YhGcqa5or. But if it just needs to align by 32 for a local, it's simple: https://gcc.godbolt.org/z/hh1v59hKv). Turns out it was an answer I'd written :). – Peter Cordes Dec 11 '22 at 23:00
  • @PeterCordes it's an optimization snafu actually, if you compile with `-maccumulate-outgoing-args` it generates much simpler code: https://godbolt.org/z/6oYMzKd66 (and ebp still serves as a frame pointer so simple unwinding via ebp chain still works). – amonakov Dec 12 '22 at 07:54
  • @amonakov: Interesting. But note that EBP isn't usable the way a frame pointer normally is *within that function*; it's pointing to the saved-EBP, so it's an unknown distance from ESP for storing outgoing stack args, or from a 16-byte alignment boundary for storing local vars if any might need more than 4B alignment. That's still fine; as long as there's no `alloca` or VLA, it can just access locals and outgoing args relative to ESP. (That's what this whole dance is about, being able to use alloca as well as being backtrace-friendly. GCC8, other than in main, will simplify functions that don't.) – Peter Cordes Dec 12 '22 at 08:10
  • @PeterCordes as my godbolt example hinted, GCC has the same snafu for any function carrying the `force_align_arg_pointer` attribute, which is applied to `main` implicitly. – amonakov Dec 12 '22 at 08:22
  • @PeterCordes in principle it can also put locals that don't need increased alignment at negative offsets relative to EBP, so it would still be usable as a frame pointer in that sense – amonakov Dec 12 '22 at 13:12
  • @amonakov: True, since it does `sub esp, 16` after aligning so there's at least that much space; it doesn't matter where in the stack frame the 0 to 12 byte variable amount of space lives. It could be below some locals. – Peter Cordes Dec 12 '22 at 15:57
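
The `force_align_arg_pointer` point made in the comments can be checked on an ordinary function instead of `main`. The following is only a minimal sketch, assuming GCC (or a compatible compiler) targeting x86; the names `REALIGN_STACK`, `greet_realigned`, and `greet_plain` are hypothetical and do not come from the thread. Compiling it with the question's options (`-std=c17 -O3 -m32`) and comparing the assembly of the two functions should show whether the realigning prologue follows the attribute rather than the name `main`.

```c
#include <stdio.h>

/* Assumption: force_align_arg_pointer is a GCC function attribute for x86
   targets. On other compilers or targets the macro expands to nothing, so
   the file still compiles, but the comparison is then meaningless. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

/* Hypothetical function: asks the compiler to realign the stack on entry,
   which the comments say is what GCC does implicitly for main. */
REALIGN_STACK int greet_realigned(void)
{
    /* puts returns a nonnegative value on success and EOF on failure. */
    return puts("Hello world!") >= 0 ? 0 : -1;
}

/* Hypothetical baseline: same body, but it trusts the caller to have
   provided the ABI's 16-byte stack alignment. */
int greet_plain(void)
{
    return puts("Hello world!") >= 0 ? 0 : -1;
}
```

If the comments above are right, only `greet_realigned` should be expected to get the `lea ecx, [esp+4]` / `and esp, -16` style prologue, while `greet_plain` should compile to something close to the short version the question expected.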
