Let's say I run objdump -d on an object file generated by a C compiler and I get this disassembly:

0000000000400b5e <main>:
 400b5e:  55              push   %rbp
 400b5f:  48 89 e5        mov    %rsp,%rbp
 400b62:  bf 50 0a 49 00  mov    $0x490a50,%edi
 400b67:  e8 04 0b 00 00  callq  401670 <_IO_puts>
 400b6c:  5d              pop    %rbp
 400b6d:  c3              retq
 400b6e:  66 90           xchg   %ax,%ax

I'm not sure how to interpret everything here. Take the line:

400b62: bf 50 0a 49 00 mov $0x490a50,%edi

I get what the mov statement is doing, but what does the 400b62 mean? What does the bf 50 0a 49 00 mean? I couldn't find anything on the Internet explaining how to read this stuff.

Michael Petch
SuperNova
  • `bf 50 0a 49 00` is the machine code for `mov $0x490a50,%edi`. Usually, the mnemonics are all aligned to a common column so this is easy to see. – fuz Oct 13 '17 at 22:58
  • You're better off using the [compiler option that creates an assembly file](https://stackoverflow.com/questions/137038/how-do-you-get-assembler-output-from-c-c-source-in-gcc). It will annotate the assembly code with the corresponding C statements, variables, and literals. – Barmar Oct 13 '17 at 23:29

2 Answers

4

The 400b62 is the address of the instruction, and bf 50 0a 49 00 are the bytes that encode it. In this case, the instruction at 400b62 sets the register %edi to 0x490a50. The opcode byte bf means "move a 32-bit immediate value into edi", and 50 0a 49 00 are the bytes of 0x00490a50 in little-endian order (least significant byte first), which is how x86 processors store multi-byte values.
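As a rough illustration of that decoding, here is a minimal C sketch that handles only this one encoding (opcode bytes b8 through bf followed by a 32-bit immediate, with no prefix bytes). The helper names `read_le32` and `decode_mov_imm32` are made up for this example; a real disassembler covers far more cases.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value from four consecutive bytes. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/*
 * Decode the 5-byte form "mov $imm32, %r32" (opcode 0xB8 + register number).
 * Returns 1 and fills in *reg and *imm on success, or 0 if the bytes are
 * not this particular encoding. Hypothetical helper, for illustration only.
 */
static int decode_mov_imm32(const uint8_t *code, size_t len,
                            unsigned *reg, uint32_t *imm)
{
    if (len < 5 || code[0] < 0xB8 || code[0] > 0xBF)
        return 0;
    *reg = code[0] - 0xB8u;      /* 0xBF - 0xB8 = 7, the number of edi */
    *imm = read_le32(code + 1);  /* 50 0a 49 00 -> 0x00490a50 */
    return 1;
}
```

Feeding it the bytes bf 50 0a 49 00 should yield register number 7 (edi) and the immediate 0x490a50, matching what objdump prints.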

If you want to be able to read each of the instructions, it takes a bit of decoding but can be done. The best reference in my experience is the Intel® 64 and IA-32 Architectures Software Developer's Manual, but it is not for the faint of heart.
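The callq line in the question can be decoded by hand in the same spirit. In the sketch below, e8 is taken to be the opcode for a call with a 32-bit relative displacement, and the displacement is added to the address of the next instruction. The function names are hypothetical and only this single encoding is handled.

```c
#include <stdint.h>

/* Read a signed 32-bit little-endian displacement from four bytes. */
static int32_t read_le32_signed(const uint8_t *p)
{
    uint32_t u = (uint32_t)p[0]
               | ((uint32_t)p[1] << 8)
               | ((uint32_t)p[2] << 16)
               | ((uint32_t)p[3] << 24);
    /* Assumes the usual two's-complement conversion done by mainstream compilers. */
    return (int32_t)u;
}

/*
 * Compute the destination of a 5-byte "call rel32" (opcode 0xE8).
 * insn_addr is the address of the 0xE8 byte; the displacement is relative
 * to the address of the NEXT instruction (insn_addr + 5).
 * Returns 0 if the first byte is not 0xE8. Hypothetical helper.
 */
static uint64_t call_rel32_target(uint64_t insn_addr, const uint8_t *code)
{
    if (code[0] != 0xE8)
        return 0;
    return insn_addr + 5 + (uint64_t)(int64_t)read_le32_signed(code + 1);
}
```

For the bytes e8 04 0b 00 00 at address 400b67, the next instruction is at 400b6c and the displacement is 0xb04, so the target works out to 401670, the address objdump shows for _IO_puts.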

DocMax
  • This is probably a stupid question, but what does it mean "the bytes that make up the instruction"? – SuperNova Oct 13 '17 at 22:22
  • @KingHenryV, see my expanded answer. – DocMax Oct 13 '17 at 22:26
  • @KingHenryV The CPU doesn't execute the text source code; it understands only instruction opcodes. An x86 CPU understands `bf` as the `mov edi,imm32` instruction, so it also reads the next four bytes to fetch the `imm32` data. Each CPU architecture has its own instructions and opcodes, which is why x86 assembly differs from ARM assembly. To execute your source code, you must first translate it with an assembler into those instruction opcodes (machine code); then the CPU can execute your program. – Ped7g Oct 13 '17 at 22:52
2

Start with an assembly language primer, such as https://speakerdeck.com/vsergeev/x86-assembly-primer-for-c-programmers which is good if you already know C.

In your code, what matters is

mov $0x490a50,%edi
callq 401670 <_IO_puts>

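On x86-64 under the System V calling convention (the one Linux uses), the first six integer or pointer arguments are passed in the registers rdi, rsi, rdx, rcx, r8, and r9. edi is the lower 32 bits of rdi, and a write to edi zeroes the upper 32 bits of rdi, so the mov places one argument in rdi and the callq then calls the function.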
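To make the edi/rdi relationship concrete, here is a small C model of the register, not real register access. It assumes the x86-64 rule that a 32-bit register write clears the upper half of the 64-bit register; the function names are invented for this sketch.

```c
#include <stdint.h>

/*
 * Model of "mov $value, %edi": the new contents of rdi after a 32-bit write.
 * The old contents are ignored because the upper 32 bits are cleared.
 */
static uint64_t write_low32(uint64_t old_full, uint32_t value)
{
    (void)old_full;           /* upper half is discarded, not preserved */
    return (uint64_t)value;   /* zero-extended into the full register */
}

/* Model of reading edi: bits 0..31 of rdi. */
static uint32_t low32(uint64_t full)
{
    return (uint32_t)full;
}
```

This is why the compiler can load the 64-bit pointer argument with the shorter 32-bit mov: the address 0x490a50 fits in 32 bits, and the upper half of rdi ends up zero.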

The decoded name (_IO_puts) suggests you're dealing with an implementation of puts, which implies 0x490a50 is the hexadecimal representation of the memory address of a string that was passed to it.

The original main will likely be something like:

#include <stdio.h>
int main() { puts("hello world"); }
Petr Skocik
  • `0x490a50` is a memory address (`const char*`). By "hexadecimal representation of a string" I would rather imagine eight `char` values stored directly in `rdi`, not in memory, which is not the case for `puts`. – Ped7g Oct 13 '17 at 22:48
  • @Ped7g My head automatically decays strings to pointers. C has spoiled me. Anyway, I have fixed the text of the answer now. – Petr Skocik Oct 13 '17 at 23:04