
I'm new to C and GCC and am trying to learn how C is compiled into machine code by disassembling the binaries GCC produces, but the result of compiling and then disassembling a very simple function seems overcomplicated.

I have a basic.c file:

```c
int my_function(){
    int a = 0xbaba;
    int b = 0xffaa;
    return a + b;
}
```

Then I compile it using `gcc -ffreestanding -c basic.c -o basic.o`.

When I disassemble the basic.o object file, I get the expected output:

```
0000000000000000 <my_function>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   c7 45 fc ba ba 00 00    movl   $0xbaba,-0x4(%rbp)
   b:   c7 45 f8 aa ff 00 00    movl   $0xffaa,-0x8(%rbp)
  12:   8b 55 fc                mov    -0x4(%rbp),%edx
  15:   8b 45 f8                mov    -0x8(%rbp),%eax
  18:   01 d0                   add    %edx,%eax
  1a:   5d                      pop    %rbp
  1b:   c3                      retq
```

Looks great. But then I use the linker to produce a raw binary: `ld -o basic.bin -Ttext 0x0 --oformat binary basic.o`

After disassembling this basic.bin file with `ndisasm -b 32 basic.bin > basic.dis`, I get something interesting:

```
00000000  55                push ebp
00000001  48                dec eax
00000002  89E5              mov ebp,esp
00000004  C745FCBABA0000    mov dword [ebp-0x4],0xbaba
0000000B  C745F8AAFF0000    mov dword [ebp-0x8],0xffaa
00000012  8B55FC            mov edx,[ebp-0x4]
00000015  8B45F8            mov eax,[ebp-0x8]
00000018  01D0              add eax,edx
0000001A  5D                pop ebp
0000001B  C3                ret
0000001C  0000              add [eax],al
0000001E  0000              add [eax],al
00000020  1400              adc al,0x0
00000022  0000              add [eax],al
00000024  0000              add [eax],al
00000026  0000              add [eax],al
00000028  017A52            add [edx+0x52],edi
0000002B  0001              add [ecx],al
0000002D  7810              js 0x3f
0000002F  011B              add [ebx],ebx
00000031  0C07              or al,0x7
00000033  08900100001C      or [eax+0x1c000001],dl
00000039  0000              add [eax],al
0000003B  001C00            add [eax+eax],bl
0000003E  0000              add [eax],al
00000040  C0FFFF            sar bh,byte 0xff
00000043  FF1C00            call far [eax+eax]
00000046  0000              add [eax],al
00000048  00410E            add [ecx+0xe],al
0000004B  108602430D06      adc [esi+0x60d4302],al
00000051  57                push edi
00000052  0C07              or al,0x7
00000054  0800              or [eax],al
00000056  0000              add [eax],al
```

I don't know where instructions like SAR, JS and DEC come from or why they are needed. My guess is that I'm passing invalid arguments to the compiler or the linker.

JohnIdlewood
  • They are not commands (instructions); they are data that you disassembled as instructions. They are not required; you presumably have sections other than `.text` in your object file. – Jester Jul 07 '20 at 12:23
  • There are differences between the two disassemblies, for instance in the first four bytes. Is the first one produced by `gcc -S`? – bruno Jul 07 '20 at 12:26
  • Use `objdump -D` to print out the sections. But what you are seeing is most likely the data of the `.eh_frame` section. That section is just data, but ndisasm decodes everything as instructions because the flat binary format doesn't distinguish between code and data. – Michael Petch Jul 07 '20 at 12:26
  • You disassembled the original function, and then you disassembled a lot more bytes after it. – gnasher729 Jul 07 '20 at 12:30
  • @MichaelPetch, yes, I see, there are different sections, thank you – JohnIdlewood Jul 07 '20 at 12:31
  • @Jester, yes, there are `.comment` and `.eh_frame` sections – JohnIdlewood Jul 07 '20 at 12:31
  • If you remove the `.eh_frame` section or don't generate it at all, you should see what you want. Try adding the `-fno-asynchronous-unwind-tables` option to the GCC command line. The `.comment` section won't go into a binary file, but `.eh_frame` will. You generated 64-bit code, so you need to disassemble with `-b64` to get the decoding you want. – Michael Petch Jul 07 '20 at 12:32
  • Also, you compiled to 64-bit machine code but then disassembled it as if it were 32-bit. This is why `mov rbp, rsp` became `dec eax; mov ebp, esp`, for instance. – zwol Jul 07 '20 at 12:32
  • @MichaelPetch, so I guess they are included in the .bin file too. But how does the processor know that these comments shouldn't be executed? I see only one jump, at address 0000002D, and it comes after some instructions have actually been processed. – JohnIdlewood Jul 07 '20 at 12:35
  • Well, your function ends at the `ret`, so it never executes the other data. Everything below the `ret` that ends the function doesn't get executed; it is just data. The `.comment` section is in the ELF (object) file but isn't marked allocatable, so it is excluded when the binary file is generated. The `.eh_frame` section is allocatable, so it appears in the binary file. – Michael Petch Jul 07 '20 at 12:37
  • @MichaelPetch, aaaahhh, sorry, I missed the ret instruction. Many thanks, you've made my day; the `-fno-asynchronous-unwind-tables` option worked, and I'll study what it all means. – JohnIdlewood Jul 07 '20 at 12:40

1 Answer


As I concluded from @Michael Petch's comments:

The function itself occupies offsets 00000000-0000001B of the disassembled file and ends with a `ret` instruction, so the rest of the file (0000001C-00000056) is never executed; it's metadata.

Per @Michael Petch's and @Jester's comments:

I figured out that an object file consists of several sections (https://en.wikipedia.org/wiki/Object_file). The generated basic.o file originally had three sections:

  • .text (function itself)
  • .comment (not represented in the binary file)
  • .eh_frame

What the .eh_frame section is, and why GCC creates it, is described here: Why GCC compiled C program needs .eh_frame section?

Running gcc with the `-fno-asynchronous-unwind-tables` option removed the .eh_frame section from the object file.

JohnIdlewood
  • If you just want NASM-syntax disassembly and/or a disassembler other than GNU binutils to double-check in case of binutils bugs, Agner Fog's `objconv` can do that. (And unlike `ndisasm`, it understands `.o` object file formats so you don't need to dump a section to a flat binary first.) [How to disassemble a binary executable in Linux to get the assembly code?](https://stackoverflow.com/a/33978857). – Peter Cordes Jul 07 '20 at 19:39