
I have difficulty understanding the assembly output that gcc generates for a simple C program.

Here's the C code of the program:

#include <stdio.h>
#include <stdlib.h>

int sum1=1;
int sum2=1;

int add(int s1, int s2){
    return s1+s2;
}

int main(int argc,char** argv){
    int res=sum1+sum2;
    return 0;
}

Here's the assembly code created by gcc:

    .file   "main.c"
    .globl  sum1
    .data
    .align 4
sum1:
    .long   1
    .globl  sum2
    .align 4
sum2:
    .long   1
    .text
    .globl  add
    .def    add;    .scl    2;  .type   32; .endef
    .seh_proc   add
add:
    pushq   %rbp
    .seh_pushreg    %rbp
    movq    %rsp, %rbp
    .seh_setframe   %rbp, 0
    .seh_endprologue
    movl    %ecx, 16(%rbp)
    movl    %edx, 24(%rbp)
    movl    16(%rbp), %edx
    movl    24(%rbp), %eax
    addl    %edx, %eax
    popq    %rbp
    ret
    .seh_endproc
    .def    __main; .scl    2;  .type   32; .endef
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
    pushq   %rbp
    .seh_pushreg    %rbp
    movq    %rsp, %rbp
    .seh_setframe   %rbp, 0
    subq    $48, %rsp
    .seh_stackalloc 48
    .seh_endprologue
    movl    %ecx, 16(%rbp)
    movq    %rdx, 24(%rbp)
    call    __main
    movl    sum1(%rip), %edx
    movl    sum2(%rip), %eax
    addl    %edx, %eax
    movl    %eax, -4(%rbp)
    movl    $0, %eax
    addq    $48, %rsp
    popq    %rbp
    ret
    .seh_endproc
    .ident  "GCC: (x86_64-posix-seh-rev2, Built by MinGW-W64 project) 7.1.0"

I have difficulty understanding the order of the operands of some of the instructions in the assembly code (see also the memory layout picture for reference). First, there is the instruction

    pushq   %rbp

which pushes the base pointer of the caller onto the stack. After this instruction comes the following instruction:

    movq    %rsp, %rbp

This instruction should set the base pointer of the callee to the value of the current stack pointer. However, shouldn't the order of the two operands be the opposite (e.g. movq %rbp, %rsp)?

A similar "problem" occurs at the instruction:

    addl    %edx, %eax

Here, it looks as if the result of the operation is stored in the register %edx instead of %eax (which is used to return the function's result).

Pretty much all sources I have consulted so far on the Internet claim that the result of an instruction is stored in its first operand.

Peter Cordes
Mantabit
  • By default GCC outputs AT&T syntax. `-masm=intel` for Intel syntax (I think). The operands are inverted between the two conventions. – Mat Sep 08 '18 at 19:15
  • If you enable optimizations, the code will actually be reduced to `xor eax, eax` / `ret`, or in your convention: `xorl %eax, %eax` / `ret` – 0___________ Sep 08 '18 at 19:24
  • @P__J__: gcc will also have to emit a standalone definition of `add`, because it's not `static` or `inline`. Like `lea eax, [rdi+rsi]` / `ret`, because those inputs are function args, not globals. – Peter Cordes Sep 08 '18 at 21:00
  • @PeterCordes yes, it will be emitted to the object file. But in the executable this function will not be linked, so at the end of the day it will end up as those two instructions (+ all startup, prologue, etc.) – 0___________ Sep 08 '18 at 21:03
  • @P__J__: It won't be *executed*, but it will be present in the final executable. I just tried it, and there's a `0000000000000610 <add>:` in the `objdump` output. (On Arch Linux, gcc7.3 plus standard `ld` from Binutils 2.29). Anyway, the OP is looking at the compiler's asm output, so that's what they'd see. (They could use `__attribute__((noinline))` to see the asm for a function call without the noise of `-O0`. [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116)) – Peter Cordes Sep 08 '18 at 21:11
  • @PeterCordes with compiler option -ffunction-sections and linker option --gc-sections ? – 0___________ Sep 08 '18 at 21:18
  • @P__J__: Neat, yeah, `gcc -ffunction-sections -Wl,--gc-sections -O3 foo.c` does omit `add`. I didn't know about that option. It's not on by default, though, and like I said the OP is looking at `gcc -S` output, not disassembly of linked binaries. – Peter Cordes Sep 08 '18 at 21:23
  • @PeterCordes for me is default :). I know that PC programmers do not care too much about this as they have almost unlimited resources (comparing to us embedded ones :) ) – 0___________ Sep 08 '18 at 21:59
  • @P__J__: Sounds useful, too bad it isn't the default when linking an executable. OTOH, careful use of `static` is important for performance to encourage inlining functions into their only call site, and when making a shared library to avoid calling through the PLT and allow inlining. (a "hidden" visibility attribute works too). (Lots of modern non-embedded code goes in shared libs, not executables, and supporting symbol interposition isn't free.) – Peter Cordes Sep 08 '18 at 23:06

1 Answer

The GNU compiler generates assembly in "AT&T syntax" rather than Intel syntax, as explained here:

The GNU Assembler, gas, uses a different syntax from what you will likely find in any x86 reference manual, and the two-operand instructions have the source and destinations in the opposite order. Here are the types of the gas instructions:

opcode                    (e.g., pushal)
opcode operand            (e.g., pushl %edx)
opcode source,dest        (e.g., movl %edx,%eax) (e.g., addl %edx,%eax)

Where there are two operands, the rightmost one is the destination. The leftmost one is the source.
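
One way to check the operand order for yourself is GCC-style inline assembly, which uses AT&T syntax by default. The sketch below assumes GCC or Clang targeting x86 or x86-64; `att_add` and `att_mov` are hypothetical helper names that are not part of the question's program, and a plain C fallback is used on other targets.

```c
/* Hypothetical helper (not from the original program): adds src into dst
   with one AT&T-syntax instruction when built by GCC/Clang for x86. */
static int att_add(int dst, int src) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* AT&T order is "addl source, destination":
       %1 (src) is added into %0 (dst), the rightmost operand. */
    __asm__("addl %1, %0" : "+r"(dst) : "r"(src) : "cc");
#else
    dst += src; /* portable fallback with the same effect */
#endif
    return dst;
}

/* Hypothetical helper: copies src into a fresh variable with movl. */
static int att_mov(int src) {
    int dst = 0;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* "movl source, destination": %1 (src) is copied into %0 (dst). */
    __asm__("movl %1, %0" : "=r"(dst) : "r"(src));
#else
    dst = src; /* portable fallback with the same effect */
#endif
    return dst;
}
```

Calling `att_add(1, 1)` gives 2 because the result lands in the right-hand operand. By the same rule, `movq %rsp, %rbp` in the prologue copies the stack pointer into the base pointer, and `addl %edx, %eax` leaves the sum in `%eax`.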

Clifford
  • @Denis There are some instructions that differ, such as `pushal` in AT&T syntax instead of `pushad` in Intel syntax, and `if (eax >= ecx) goto foo;` is `cmpl %ecx, %eax` / `jge foo` in AT&T because [the operands for the source and destination are reversed in almost all cases in AT&T syntax](https://sourceware.org/binutils/docs-2.31/as/i386_002dVariations.html) (second bullet point). –  Sep 08 '18 at 21:39
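
The `cmpl %ecx, %eax` / `jge foo` pairing mentioned in the last comment can be checked with a small inline-assembly sketch. It assumes GCC or Clang targeting x86 or x86-64 with the default AT&T dialect; `att_jge` is a hypothetical name chosen for illustration, and `setge` stands in for the `jge` branch so the result can be returned as a value.

```c
/* Hypothetical helper: returns 1 when "cmpl %ecx, %eax" followed by "jge"
   would take the branch, i.e. when eax >= ecx as signed integers. */
static int att_jge(int eax, int ecx) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned char taken;
    /* AT&T "cmpl b, a" sets the flags from a - b, so the "ge" condition
       means a >= b. Here %1 is eax (a) and %2 is ecx (b). */
    __asm__("cmpl %2, %1\n\t"
            "setge %0"
            : "=q"(taken)
            : "r"(eax), "r"(ecx)
            : "cc");
    return taken;
#else
    return eax >= ecx; /* portable fallback with the same meaning */
#endif
}
```

Reading the condition as "right operand compared against left operand" matches the source-then-destination order used by the other two-operand instructions.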