I know some C and a little bit of assembly and wanted to start learning about reverse engineering, so I downloaded the trial of Hopper Disassembler for Mac. I created a super basic C program:

int main() {
    int a = 5;
    return 0;
}

And compiled it with the -g flag (because I saw this before and wasn't sure if it mattered):

gcc -g simple.c

Then I opened the a.out file in Hopper Disassembler and clicked on the Pseudo Code button and it gave me:

int _main() {
    rax = 0x0;
    var_4 = 0x0;
    var_8 = 0x5;
    rsp = rsp + 0x8;
    rbp = stack[2047];
    return 0x0;
}

The only line I sort of understand here is setting a variable to 0x5. I'm unable to comprehend what all these additional lines are for (such as the rsp = rsp + 0x8;), for such a simple program. Would anyone be willing to explain this to me?

Also if anyone knows of good sources/tutorials for an intro into reverse engineering that'd be very helpful as well. Thanks.

Austin
  • Oh, I forgot about optimizations. I turn that off with some other compiler flag right? Just as a general question, do programs that were compiled with optimizations become very difficult to reverse engineer? – Austin Jun 16 '16 at 18:04
  • Have you read the ABI for your system? Have you checked what rax, rsp and rbp are? Maybe this is worth reading: https://cs.nyu.edu/courses/fall11/CSCI-GA.2130-001/x64-intro.pdf – Support Ukraine Jun 16 '16 at 18:07
  • Never heard of ABI before, I just looked it up but still a bit confused. I recompiled using `-O0` for no optimizations and got the same exact pseudo code in Hopper. I guess I'll read some more about ABI and what the particular registers are. – Austin Jun 16 '16 at 18:11
  • If you turn optimizations on, you will get an empty stub, as the compiler will realize you are not doing anything in that code (the variable is unused). So using -O0 and -g is actually the saner choice in your situation. – Ped7g Jun 16 '16 at 18:13
  • If you want to understand the assembly code, you need to read the ABI (Application Binary Interface) for your system. It tells you the low-level details about register usage, stack usage, function calls, parameter passing and so on. The assembly code will (probably) not make sense unless you know the ABI. – Support Ukraine Jun 16 '16 at 18:15
  • And the particular registers are general CPU registers. Some of them are often reused in similar scenarios: `rbp` is notoriously known for providing the stack frame for high-level languages, and `rsp` insists on working together with the `push`/`pop` instructions. But generally you can bend them quite far away from those pesky stereotypes, as at heart they **are** general CPU registers (in x64). I'm afraid your "little bit of assembly" may be too little for deciphering C compiler output; maybe start with pure human-written assembly first. – Ped7g Jun 16 '16 at 18:17
  • Is ABI specific to the operating system, CPU, both? I took an intro class in computer organization where we translated some C code to MIPS assembly and then learned about the CPU pipeline, but I'm finding that understanding that is a whole lot easier/more straightforward than understanding the assembly code the compiler actually creates. Also is `rsp + 0x8` just moving the stack pointer to the next byte? – Austin Jun 16 '16 at 18:23
  • The ABI is specific to both the operating system and the CPU, for sure. Actually it is even more specific: you might consider 32-bit MS Windows and 64-bit MS Windows to be the same operating system (only in different variants), but they have completely different ABIs and native binaries (32-bit binaries run on 64-bit Windows in a sort of wrapper layer, which gives them a 32-bit environment). The problem with asm produced by a compiler is that it feels very cryptic compared to human-written asm (which was intentionally written in a clean manner). It's like another difficulty level, after you have managed "human" asm. – Ped7g Jun 16 '16 at 18:29
  • Thanks for the explanations. Yes it seems like a very big jump up in difficulty from human written assembly, but I guess I'll have to take it a step at a time. – Austin Jun 16 '16 at 18:31
  • For example, I have no idea why that `rsp + 8` is done... Yeah, it's like popping 64 bits from the stack. But I don't see what it is good for, nor the rbp change. Actually, that disassembled piece looks a bit incomplete. Maybe stop using the disassembler, and just instruct your compiler to produce .s files to see the real assembler instructions the C compiler emits. (`objdump` is nice too, but the direct .s output can include source lines in comments; check the gcc compiling options.) – Ped7g Jun 16 '16 at 18:32
  • http://stackoverflow.com/a/19083877/4271923 – Ped7g Jun 16 '16 at 18:45
  • @Ped7g: I venture to guess (I have never used Hopper) that var_4 is padding and var_8 is `a` with the value 5. I assume `var_` represents the stack-based variables. `rsp = rsp + 0x8;` would be releasing the local stack variables. – Michael Petch Jun 17 '16 at 03:21
  • @MichaelPetch: I can't decipher what var_4 is supposed to be and where it comes from, but rsp+=8 and rbp=xxx together are `pop rbp`. And I believe the epilogue of the function ended up like this because the compiler realized it didn't touch `rsp`, so it omitted the `mov %rbp,%rsp` that is usually the first step of the epilogue. And the "pseudo code" parser is not aware of this shortened epilogue variant. Probably... just guessing. Anyway, after this I wouldn't touch that "pseudo code" functionality with a ten-foot pole; it looks like a piece of crap (maybe as additional info next to the real asm it would be fine). – Ped7g Jun 17 '16 at 10:13
  • I decided to load up one of my Macs to see what code it generated with gcc. One can see where it probably got var_4 from (assuming the unoptimized code the OP got was similar to what I see) : `mov DWORD PTR [rbp-0x4],0x0` `mov DWORD PTR [rbp-0x8],0x5` – Michael Petch Jun 17 '16 at 10:38

2 Answers


Looks like it is doing a particularly poor job of producing "disassembly pseudocode" (whatever that is -- is it a disassembler or a decompiler? It can't seem to decide).

In this case it looks like it has elided the stack frame setup (the function prolog), but not the cleanup (the function epilog). So you'll get a much better idea of what is going on by using an actual disassembler to look at the actual disassembly code:

$ gcc -c simple.c
$ objdump -d simple.o

simple.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <main>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   c7 45 fc 05 00 00 00    movl   $0x5,-0x4(%rbp)
   b:   b8 00 00 00 00          mov    $0x0,%eax
  10:   5d                      pop    %rbp
  11:   c3                      retq   

So what we have here is code to set up a stack frame (address 0-1), the assignment you have (4), setting up the return value (b), tearing down the frame (10) and then returning (11). You might see something different due to using a different version of gcc or a different target.

In the case of your disassembly, the first part has been elided (left out as being an uninteresting housekeeping task) by the disassembler, but the second to last part (which undoes the first part) has not.

CODE-REaD
Chris Dodd

What you're looking at is decompiled code. Every decompiler's output will look something close to that, because it's not going to try to recover the original variable names, which are normally not kept in the compiled code.

So it gives them names of the form 'var_??', with a number attached to the end. Once you learn about reverse engineering and know the language you're programming in very well, you can understand the code. It's no different when you're trying to de-obfuscate PHP, JavaScript code, etc.

If you ever get into reverse engineering malware be prepared because nothing is going to be easy. You're going to have different packers, obfuscators, messed-up code, VM detection routines, etc. So buckle down and get ready for a long road ahead if reverse engineering is your goal.

Adrian Mole