Why does calling printf result in a different function prologue for main?

Question

When I compile

#include <stdio.h>
int
main () {
    return 0;
}

to x86 assembly the result is plain and expected:

$> cc -m32 -S main.c -o -|sed -r "/\s*\./d"
main:
    pushl   %ebp
    movl    %esp, %ebp
    movl    $0, %eax
    popl    %ebp
    ret

But when studying different disassembled binaries, the function prologue is never that simple. Indeed, changing the C source above into

#include <stdio.h>
int
main () {
    printf("Hi");
    return 0;
}

the result is

$> cc -m32 -S main.c -o -|sed -r "/\s*\./d"
main:
    leal    4(%esp), %ecx
    andl    $-16, %esp
    pushl   -4(%ecx)
    pushl   %ebp
    movl    %esp, %ebp
    pushl   %ecx
    subl    $4, %esp
    subl    $12, %esp
    call    printf
    addl    $16, %esp
    movl    $0, %eax
    movl    -4(%ebp), %ecx
    leave
    leal    -4(%ecx), %esp
    ret

In particular, I don't get why these instructions

leal    4(%esp), %ecx
andl    $-16, %esp
pushl   -4(%ecx)

are generated -- specifically, why not store %esp directly in %ecx, rather than %esp+4?

Without optimization, compilers can produce a lot of trash. More than your examples, the double `subl` is a good hint for this bad behaviour. — Youka, Oct 01 '15 at 21:06
Here is a good explanation: http://stackoverflow.com/questions/4228261/understanding-the-purpose-of-some-assembly-statements — Michał Mielec, Oct 01 '15 at 21:23

score 4 · Accepted Answer · answered Oct 01 '15 at 21:20

If main isn't a leaf function, it needs to align the stack for the benefit of any functions it calls. Functions that aren't called main just maintain the stack's alignment.

lea 4(%esp), %ecx   # ecx = esp+4
andl    $-16, %esp
pushl   -4(%ecx)    # load from ecx-4 and push that

It's pushing a copy of the return address, so it will be in the right place after aligning the stack. You're right, a different sequence would be more sensible:

mov    (%esp), %ecx   # or maybe even  pop %ecx
andl   $-16, %esp
push   %ecx           # push (mem) is slower than push reg

As Youka says in the comments, don't expect code from -O0 to be optimized at all. Use -Og for optimizations that don't interfere with debuggability; the gcc manual recommends it for the edit/compile/debug cycle. -O0 output is easier to map back to the source, but it's terrible code, and it's harder to read, understand, and learn from than optimized output.
