
I'm pretty new to assembly and am trying my best to learn it. I'm taking a course on it, and it covered a very basic Hello World example, which I decompiled.

original c file:

#include <stdio.h>
int main()
{
printf("Hello Students!");
return 0;
}

This was decompiled using the following command:

C:> objdump -d -Mintel HelloStudents.exe > disasm.txt

decompilation (assembly):

push ebp
mov  ebp, esp
and  esp, 0xfffffff0
sub esp, 0x10
call 401e80 <__main>
mov DWORD PTR [esp], 0x404000
call 4025f8 <_puts>
mov eax, 0x0
leave
ret

I'm having trouble mapping this decompilation output back to the original C file. Can someone help?

Thank you very much!

Jshee
  • What do you need help with? – Acorn May 17 '20 at 14:26
  • I'm stuck on mapping the `decompilation` to the original `c` code – Jshee May 17 '20 at 14:27
  • I am afraid that is too broad for StackOverflow. What is it that you do not understand from the disassembly? Doesn't the course explain how to read assembly? – Acorn May 17 '20 at 14:29
  • Where in the actual disassembly are the characters `H E L L O S T U D E N T S`? – Jshee May 17 '20 at 14:31
  • Please don't post images - copy and paste the text into the body of your question. – John Bode May 17 '20 at 15:08
  • The string literal isn't in the `.text` section. `objdump -d` only disassembles `.text`, it doesn't dump contents of other sections. – Peter Cordes May 17 '20 at 15:10
  • Somewhat related: [What parts of this HelloWorld assembly code are essential if I were to write the program in assembly?](https://stackoverflow.com/q/39550402) compiles to asm instead of disassembling, making it easier to see the important parts. See also [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116) – Peter Cordes May 17 '20 at 15:12
  • @Jshee Use `objdump -s` to dump the other sections, including the section where the string is. `-d` only dumps machine code, not data. – fuz May 17 '20 at 15:33
  • And please edit your question to contain the `objdump` output as text instead of an image. Will upvote your question once you've done so. – fuz May 17 '20 at 15:34
  • @fuz - please see above. thanks – Jshee May 17 '20 at 18:14
  • @Jshee Note lastly for clarification: what you have there is a *disassembly,* not a *decompilation.* Decompiling code is a lot more involved than just disassembling it. – fuz May 17 '20 at 18:35

1 Answer


The technical term for decompiling assembly back into C is "turning hamburger back into cows". The generated assembly will not be a 1-to-1 translation of the source, and depending on the level of optimization may be radically different. You will get something functionally equivalent to the original source, but how closely it resembles that source in structure varies widely.

push ebp
mov ebp, esp
and esp, 0xfffffff0
sub esp, 0x10

This is all preamble, setting up the stack frame for the main function. It saves the caller's EBP, aligns the stack pointer (ESP) down to a 16-byte boundary, then reserves another 16 bytes of space for outgoing function args.
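The arithmetic behind the `and` and `sub` instructions can be checked in C. This is a minimal sketch rather than compiler output; the helper name `prologue_esp` and the sample ESP values are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models what `and esp, 0xfffffff0` followed by
   `sub esp, 0x10` does to a 32-bit stack pointer value. */
uint32_t prologue_esp(uint32_t esp)
{
    esp &= 0xfffffff0u;     /* clear the low 4 bits: round down to a multiple of 16 */
    esp -= 0x10u;           /* reserve 16 bytes for outgoing arguments */
    assert(esp % 16 == 0);  /* the result is still 16-byte aligned */
    return esp;
}
```

For example, a starting ESP of 0x0061ff1c becomes 0x0061ff10 after the `and`, and 0x0061ff00 after the `sub`.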

call 401e80 <___main>

This call to ___main is how MinGW arranges for libc initialization functions to run at the start of the program, for example making sure stdio buffers are allocated.

That's the end of the preamble; the part of the function that implements the C statements in your source starts with:

mov DWORD PTR [esp], 0x404000

This writes the address of the string literal "Hello Students!" to the top of the stack, that is, to the memory ESP points at; ESP itself is not changed. Combined with the earlier `sub esp, 16`, this is like a `push` instruction. In this 32-bit calling convention, function args are passed on the stack, not in registers, so that's where the compiler has to put them before function calls.

call 4025f8 <_puts>

This calls the puts function. The compiler realized that you weren't doing any format processing in the printf call and replaced it with the simpler puts call.
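GCC makes this substitution only when the format string contains no `%` conversions and ends in a newline, because puts appends a newline of its own. A simplified C sketch of that condition follows; `can_become_puts` is a made-up name, and the real optimizer recognizes more cases than this (it can also emit putchar, for instance).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified check: could printf(fmt) with no extra
   arguments be replaced by puts() of the same text minus the final '\n'? */
bool can_become_puts(const char *fmt)
{
    assert(fmt != NULL);
    size_t len = strlen(fmt);
    if (len == 0 || fmt[len - 1] != '\n')
        return false;                 /* puts always appends '\n', so the format must end in one */
    return strchr(fmt, '%') == NULL;  /* any '%' means printf has formatting work to do */
}
```

Under this rule, `"Hello Students!\n"` qualifies, while `"Hello Students!"` (no trailing newline) and `"%d\n"` do not.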

mov eax, 0x0

The return value of main (0) is loaded into the eax register, which is where this calling convention returns integer values.

leave
ret

leave tears down the stack frame by copying EBP back into ESP, then restores the caller's EBP value by popping it; ret then exits the function. ret pops a return address off the stack, which only works when ESP is pointing at the return address.
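To see why ESP ends up pointing at the return address again, the whole function can be traced on a toy model of the stack. This is only an illustrative sketch: the `Cpu` struct, the helper names, and every address except 0x404000 are invented, and the calls to ___main and _puts are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a 32-bit stack: a small word array plus ESP and EBP. */
enum { STACK_WORDS = 64 };

typedef struct {
    uint32_t mem[STACK_WORDS];  /* mem[i] models the dword at byte address 4*i */
    uint32_t esp;
    uint32_t ebp;
} Cpu;

static void push(Cpu *c, uint32_t v)
{
    c->esp -= 4;
    assert(c->esp / 4 < STACK_WORDS);
    c->mem[c->esp / 4] = v;
}

static uint32_t pop(Cpu *c)
{
    assert(c->esp / 4 < STACK_WORDS);
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;
    return v;
}

/* State just after the caller executed `call main`:
   the return address is on top of the stack. */
Cpu after_call(uint32_t return_address, uint32_t caller_ebp)
{
    Cpu c = {0};
    c.esp = 0xec;  /* arbitrary starting stack pointer inside the model */
    c.ebp = caller_ebp;
    push(&c, return_address);
    return c;
}

/* Steps through main's instructions; returns the address `ret` jumps to. */
uint32_t run_main(Cpu *c)
{
    push(c, c->ebp);                 /* push ebp */
    c->ebp = c->esp;                 /* mov ebp, esp */
    c->esp &= 0xfffffff0u;           /* and esp, 0xfffffff0 */
    c->esp -= 0x10;                  /* sub esp, 0x10 */
    c->mem[c->esp / 4] = 0x404000;   /* mov DWORD PTR [esp], 0x404000 (ESP unchanged) */
    c->esp = c->ebp;                 /* leave, step 1: mov esp, ebp */
    c->ebp = pop(c);                 /* leave, step 2: pop ebp */
    return pop(c);                   /* ret: pop the return address */
}
```

Starting from `after_call(0x4011e0, 0x100)`, `run_main` returns 0x4011e0 and leaves EBP at 0x100 and ESP at its pre-call value, no matter how far the alignment step moved ESP in between.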

Peter Cordes
John Bode
  • *This loads the address of the string literal "Hello Students!" into the esp register* - No, it *stores* that address to the top of the stack! It doesn't modify ESP, only the memory pointed to by ESP. GCC already moved the stack pointer to make space so it's doing this instead of `push 0x404000` / `call` to pass the address as a stack arg. ESP is the stack pointer. – Peter Cordes May 17 '20 at 15:27
  • Thanks John - very very helpful to a novice at this right now. And @Peter - good point for sure. – Jshee May 17 '20 at 15:30
  • @PeterCordes: =*sigh*= - you're right. My eyes were seeing `esp` but my brain was interpreting it as something else. I've fixed it. – John Bode May 17 '20 at 15:33
  • plus one for hamburgers into cows – old_timer May 17 '20 at 22:20