Why does GCC produce the following asm output?

Question

I don't understand why gcc -S -m32 produces these particular lines of code:

movl    %eax, 28(%esp)
movl    $desc, 4(%esp)
movl    28(%esp), %eax
movl    %eax, (%esp)
call    sort_gen_asm

My question is why %eax is pushed and then popped? And why movl used instead of pushl and popl respectively? Is it faster? Is there some coding convention I don't yet know? I've just started looking at asm-output closely, so I don't know much.

The C code:

void print_array(int *data, size_t sz);
void sort_gen_asm(array_t*, comparer_t);

int main(int argc, char *argv[]) {
    FILE *file;
    array_t *array;

    file = fopen("test", "rb");
    if (file == NULL) { 
        err(EXIT_FAILURE, NULL);
    }   

    array = array_get(file);
    sort_gen_asm(array, desc); 
    print_array(array->data, array->sz);

    array_destroy(array);
    fclose(file); 

    return 0;
}

It gives this output:

    .file   "main.c"
    .section    .rodata
.LC0:
    .string "rb"
.LC1:
    .string "test"
    .text
    .globl  main
    .type   main, @function
main:
.LFB2:
    .cfi_startproc
    pushl   %ebp
    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8
    movl    %esp, %ebp
    .cfi_def_cfa_register 5
    andl    $-16, %esp
    subl    $32, %esp
    movl    $.LC0, 4(%esp)
    movl    $.LC1, (%esp)
    call    fopen
    movl    %eax, 24(%esp)
    cmpl    $0, 24(%esp)
    jne .L2
    movl    $0, 4(%esp)
    movl    $1, (%esp)
    call    err
.L2:
    movl    24(%esp), %eax
    movl    %eax, (%esp)
    call    array_get
    movl    %eax, 28(%esp)
    movl    $desc, 4(%esp)
    movl    28(%esp), %eax
    movl    %eax, (%esp)
    call    sort_gen_asm
    movl    28(%esp), %eax
    movl    4(%eax), %edx
    movl    28(%esp), %eax
    movl    (%eax), %eax
    movl    %edx, 4(%esp)
    movl    %eax, (%esp)
    call    print_array
    movl    28(%esp), %eax
    movl    %eax, (%esp)
    call    array_destroy
    movl    24(%esp), %eax
    movl    %eax, (%esp)
    call    fclose
    movl    $0, %eax
    leave
    .cfi_restore 5
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc
.LFE2:
    .size   main, .-main
    .ident  "GCC: (Ubuntu/Linaro 4.8.1-10ubuntu8) 4.8.1"
    .section    .note.GNU-stack,"",@progbits

Unoptimized assembly code can contain quirks like this. It effectively puts some value into a local variable (that lives on the stack) and later reads it out to a register for quick access, ignoring the fact that the register is the same and its value has not changed yet. (Edit: note that this is not *pushing* and *popping*, it's just writing to/reading from a previously reserved stack slot.) — DCoder, Nov 07 '13 at 11:50
@DCoder Thank you! And how can be the absense of the push and pop instructions explained? — staroselskii, Nov 07 '13 at 11:54
_"why %eax is pushed and then popped?"_ It isn't. Pushing and popping involves moving the stack pointer, which those `mov`s don't do. — Michael, Nov 07 '13 at 11:55
Related: [Why does gcc generate verbose assembly code?](https://stackoverflow.com/q/12329626) and its linked duplicates. — Peter Cordes, Nov 20 '21 at 23:02

rodrigo · Accepted Answer · 2013-11-07T12:10:48.250

The save / load of eax is because you did not compile with optimizations. So any read/write of a variable will emit a read/write of a memory address.

Actually, for (almost) any line of code you will be able to identify the exact piece of assembler code resulting from it (let me advise you to compile with gcc -g -c -O0 and then objdump -S file.o):

#array = array_get(file); 
call array_get
movl    %eax, 28(%esp) #write array

#sort_gen_asm(array, desc); 
movl    28(%esp), %eax  #read array
movl    %eax, (%esp)
...

About not pushing/poping, it is a standard zero-cost optimization. Instead of push/pop every time you want to call a function you just substract the maximum needed space to esp at the beginning of the function and then save your function arguments at the bottom of the empty space. There are a lot of advantages: faster code (no changing esp), it doesn't need to compute the argument in any particular order, and the esp will need to be substracted anyway for the local variables space.

score 5 · Answer 2 · answered Nov 07 '13 at 11:54

Some things have to do with calling conventions. Others with optimisations.

sort_gen_asm seems to use cdecl calling convention which requires it's arguments to be pushed onto the stack in reverse order. thus:

movl    $desc, 4(%esp)
movl    %eax, (%esp)

The other moves are partially unoptimised compiler routines:

movl    %eax, 28(%esp) # save contents of %eax on the stack before calling
movl    28(%esp), %eax # retrieve saved 28(%esp) in order to prepare it as an argument
                       # Unoptimised compiler seems to have forgotten that it's
                       # still in the register

Why does GCC produce the following asm output?

2 Answers2