
I'm trying to learn how to understand assembly code, so I've been studying the assembly output of GCC for some stupid programs. One of them was nothing but `int i = 0;`, the code of which I more or less fully understand now (the biggest struggle was understanding the GAS directives strewn about). Anyway, I went a step further and added `printf("%d\n", i);` to see if I could understand that, and suddenly the code is much more chaotic.

    .file   "helloworld.c"
    .text
    .section    .rodata.str1.1,"aMS",@progbits,1
.LC0:
    .string "%d\n"
    .section    .text.startup,"ax",@progbits
    .p2align 4
    .globl  main
    .type   main, @function
main:
    subq    $8, %rsp
    xorl    %edx, %edx
    leaq    .LC0(%rip), %rsi
    xorl    %eax, %eax
    movl    $1, %edi
    call    __printf_chk@PLT
    xorl    %eax, %eax
    addq    $8, %rsp
    ret
    .size   main, .-main
    .ident  "GCC: (Gentoo 10.2.0-r3 p4) 10.2.0"
    .section    .note.GNU-stack,"",@progbits

I'm compiling this with `gcc -S -O3 -fno-asynchronous-unwind-tables` to remove the `.cfi` directives; `-O2` produces the same code, so `-O3` is overkill. My understanding of assembly is quite limited, but it seems to me like the compiler is doing a lot of unnecessary stuff here. Why subtract and then add 8 to `rsp`? Why is it performing so many xors? There's only one variable. What is `movl $1, %edi` doing? I thought maybe the compiler was doing something stupid in an attempt to optimize, but as I said, it's not optimizing beyond `-O2`, and it performs all of these operations even at `-O1`. To be honest, I don't understand the unoptimized code at all, so I assume it's inefficient.

The only thing that comes to mind is that the call to printf uses these registers, otherwise they are unused and serve no purpose. Is that actually the case? If so, how is it possible to tell?

Thanks in advance. I'm reading a book on compiler design at the moment, I've read most of the GCC manual (including the whole chapter on optimization), and I've read some introductory x86_64 asm material. If somebody could point me toward other resources (besides the Intel x86 manual) for learning more, I would also appreciate that.

phuclv
Bridge
  • This assembly doesn't seem to match the code you supposedly compiled. This is what I'd expect to see if you called `printf(1, "%d\n")`, and judging from godbolt, this is in fact very close to the assembly you get when calling `printf(1, "%d\n")`. – Aplet123 Nov 14 '20 at 16:06
  • @Aplet123 This is the code: `#include <stdio.h> main() { int i = 0; printf("%d\n", i); }` – Bridge Nov 14 '20 at 16:35
  • See https://stackoverflow.com/tags/x86/info for lots of useful links. The [ABI](https://stackoverflow.com/questions/18133812/where-is-the-x86-64-system-v-abi-documented) is likely to be of particular interest. – Nate Eldredge Nov 14 '20 at 17:34

1 Answer


For the compiler that you are using, it looks like `printf(...)` is mapped to `__printf_chk(1, ...)`.

To understand the code, you need to understand the parameter-passing conventions for the platform (part of the ABI). Once you know that the first six integer or pointer arguments are passed in %rdi, %rsi, %rdx, %rcx, %r8, and %r9, in that order, you can understand most of what is going on:

    subq    $8, %rsp             # allocate 8 bytes of stack
    xorl    %edx, %edx           # i = 0; put it in the 3rd parameter for __printf_chk
    leaq    .LC0(%rip), %rsi     # 2nd parameter for __printf_chk: the "%d\n" string
    xorl    %eax, %eax           # 0 variadic fp params
    movl    $1, %edi             # 1st parameter for __printf_chk
    call    __printf_chk@PLT     # call the runtime loader wrapper for __printf_chk
    xorl    %eax, %eax           # return 0 from main
    addq    $8, %rsp             # deallocate 8 bytes of stack
    ret

Nate points out in the comments that section 3.5.7 of the ABI explains the `%eax = 0` (no floating-point variadic parameters).

Peeter Joot
  • I see! Reading up on Linux x86 calling conventions was on my reading list; I guess it was more important than I realized. Re: `__printf_chk`, the compiler calls `printf()` directly at `-O0` but uses `__printf_chk` when optimizations are turned on. It didn't occur to me that the other 0 could be the return value: my `main()` function has an implicit return value, and in the case of `main() { int i = 0; }` there was no return value, only a return statement. Now it makes sense. Thanks for your help. – Bridge Nov 14 '20 at 16:45
  • For a variadic function like `printf`, `%al` is required to be set to the number of vector (`%xmm`) registers being used to pass floating-point parameters. See 3.5.7 of the [ABI](https://raw.githubusercontent.com/wiki/hjl-tools/x86-psABI/x86-64-psABI-1.0.pdf). Here there are no floating-point parameters being passed, so `%al` must be zero. The compiler zeroes out all of `%eax` (which in fact zeroes all of `%rax`) because why not (maybe to avoid some partial register dependencies). – Nate Eldredge Nov 14 '20 at 17:30
  • Also, the `sub $8, %rsp` isn't really to allocate 8 bytes of stack per se (the function doesn't use that space for anything), but rather to ensure [stack alignment](https://stackoverflow.com/questions/49391001/why-does-the-x86-64-amd64-system-v-abi-mandate-a-16-byte-stack-alignment) as required by the ABI (see 3.2.2). – Nate Eldredge Nov 14 '20 at 17:32
  • @NateEldredge, isn't 8 byte stack alignment required? If so, subtracting another 8 bytes doesn't do anything for alignment. Isn't there implicit usage of this stack storage by __printf_chk (if it calls anything, isn't the return address saved in the stack storage allocated by the caller?) – Peeter Joot Nov 14 '20 at 21:13
  • No, the required alignment is 16 bytes. The stack was aligned to 16 bytes before `main` was called, the `call` instruction to transfer control to `main` pushed 8 bytes, so we must subtract a further 8 to get back to 16-byte alignment for the call to `__printf_chk`. The return address is pushed into the stack slot **below** rsp by the call instruction. The 8 bytes "allocated" by the `sub $8, %rsp` are really never read or written by anybody at all. Step through the code if you like and you'll see. – Nate Eldredge Nov 14 '20 at 21:17
  • AFAIK the only time when caller's stack is used "implicitly" by a callee is if function arguments are passed on the stack; then the callee can modify them in place. That is not the case here since we have fewer than 6 arguments, so they are all passed in registers. – Nate Eldredge Nov 14 '20 at 21:21