
I have a collection of ELF binaries. I am trying to retrieve their source code and compare it with the assembly so that I can better understand the compilation process. Ideally, I would end up with a Python dict mapping each filename to a (source, binary) pair.

I can already disassemble an ELF file using the great solution here, which uses the following command:

objdump -dj .text <binary>

It outputs something like this:

00000000004035ed <version_etc>:
  4035ed:   48 81 ec d8 00 00 00    sub    $0xd8,%rsp
  4035f4:   4c 89 44 24 40          mov    %r8,0x40(%rsp)
  4035f9:   4c 89 4c 24 48          mov    %r9,0x48(%rsp)
  4035fe:   84 c0                   test   %al,%al
  403600:   74 37                   je     403639 <version_etc+0x4c>
  403602:   0f 29 44 24 50          movaps %xmm0,0x50(%rsp)
  403607:   0f 29 4c 24 60          movaps %xmm1,0x60(%rsp)
  40360c:   0f 29 54 24 70          movaps %xmm2,0x70(%rsp)
  403611:   0f 29 9c 24 80 00 00    movaps %xmm3,0x80(%rsp)
...

I will then try to retrieve the version_etc.c file from the source tree and compare it with this assembly snippet.

I know the first two columns (the address and the raw machine-code bytes) correspond to the last two (the mnemonic and its operands). However, I am not interested in the first two columns.

Is there a tool that can extract the last two columns as a string and pair them with the function header (version_etc in this example)?

I know I could write a script that uses regular expressions or something similar, but that would be error-prone (corner cases, etc.). It would be great if there were a more principled way to do this extraction.
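One way to shrink the parsing problem is to have objdump drop the bytes column itself: GNU objdump's `--no-show-raw-insn` option omits the machine code, so each instruction line is just an address followed by the instruction text. The Python sketch below is a rough illustration under stated assumptions, not a tested tool: it assumes GNU binutils objdump, and `parse_objdump` and `disassemble` are hypothetical helper names. It is still regex-based, but it only matches the two line shapes that matter (function headers and instruction lines) and maps each function label to its instructions with runs of whitespace collapsed.

```python
import re
import subprocess

# Function header, e.g. "00000000004035ed <version_etc>:".
# Greedy (.+) tolerates '<' and '>' inside demangled C++ names.
HEADER_RE = re.compile(r"^([0-9a-f]+) <(.+)>:$")
# Instruction line as printed with --no-show-raw-insn (assumed),
# e.g. "  4035ed:\tsub    $0xd8,%rsp".
INSN_RE = re.compile(r"^\s*([0-9a-f]+):\s+(\S.*)$")


def parse_objdump(text: str) -> dict[str, list[str]]:
    """Map each function label to its instructions (mnemonic + operands)."""
    functions: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(2)
            # Note: identically named static functions end up merged here.
            functions.setdefault(current, [])
            continue
        insn = INSN_RE.match(line)
        # Ignore file-format / section lines and anything before the first label.
        if insn and current is not None:
            functions[current].append(" ".join(insn.group(2).split()))
    return functions


def disassemble(binary: str) -> dict[str, list[str]]:
    """Run GNU objdump on the .text section without raw bytes, then parse it."""
    out = subprocess.run(
        ["objdump", "-d", "-j", ".text", "--no-show-raw-insn", binary],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_objdump(out)
```

For example, `disassemble("/path/to/binary")["version_etc"]` would give a list such as `["sub $0xd8,%rsp", "mov %r8,0x40(%rsp)", ...]`. If the raw bytes are not suppressed, the captured text would include them, so the `--no-show-raw-insn` flag is what keeps the pattern simple.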

Mr.Robot
  • Are you hoping to get asm source you could re-assemble? Use Agner Fog's `objconv` disassembler (http://agner.org/optimize/#objconv) which directly does that in the first place, including using labels for branch targets. [How to disassemble, modify and then reassemble a Linux executable?](https://stackoverflow.com/posts/comments/58356507) – Peter Cordes Feb 01 '21 at 05:55
  • @PeterCordes Thanks for the reply! Actually I am trying to retrieve the source code (in the example, the source should be named `version_etc.c`) and compare it with the assembly so that I can better understand the compilation process. So ideally, I would have a Python dict that looks like `filename: (source, binary)` (I will add this to the problem description). – Mr.Robot Feb 01 '21 at 06:02
  • Oh, if you do already have the source, you can just get the compiler asm output directly without going through binary, so you still have labels and whatnot: [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116). Especially https://godbolt.org/ is nice for matching up source with asm. – Peter Cordes Feb 01 '21 at 06:12
  • Other options: for a binary with debug info, objdump or GDB can do that for you. [How to disassemble one single function using objdump?](https://stackoverflow.com/a/22775364) / [Using GCC to produce readable assembly?](https://stackoverflow.com/q/1289881) , or have the compiler output mixed asm and source without actually compiling to binary - [How can I see the assembly code for a C++ program?](https://stackoverflow.com/q/840321) – Peter Cordes Feb 01 '21 at 06:12
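Following the suggestion to skip the binary and ask the compiler for assembly directly, here is a rough Python sketch that builds the `filename: (source, asm)` dict described in the comments. It assumes a `gcc`-compatible driver on the PATH and that each `.c` file compiles on its own; real projects usually need their own include paths and defines, and the output depends on the optimization level, which may not match the one used to build the binary. `collect_source_asm` is a hypothetical name, and `-fno-asynchronous-unwind-tables` is used only to drop `.cfi` directive noise.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def collect_source_asm(src_dir: Path, cc: str = "gcc",
                       opt: str = "-O2") -> dict[str, tuple[str, str]]:
    """Map each .c file (relative path) to (source text, compiler asm output)."""
    pairs: dict[str, tuple[str, str]] = {}
    for c_file in sorted(src_dir.rglob("*.c")):
        source = c_file.read_text(errors="replace")
        # -S stops after generating assembly; "-o -" writes it to stdout.
        result = subprocess.run(
            [cc, opt, "-S", "-fno-asynchronous-unwind-tables",
             "-o", "-", str(c_file)],
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            # Files that need project-specific flags are skipped in this sketch.
            continue
        pairs[str(c_file.relative_to(src_dir))] = (source, result.stdout)
    return pairs
```

Because the compiler's own output keeps labels and symbol names, it is usually easier to line up with the C source than disassembly; https://godbolt.org/ does the same interactively.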
