
I have been trying to compile this C program to assembly, but I am not getting the output I expect.

I am reading Dennis Yurichev's Reverse Engineering for Beginners, but I am not getting the same output as the book. It's a simple hello world program. I am trying to get the 32-bit output.

#include <stdio.h>
int main()
{
     printf("hello, world\n");
     return 0;
}

Here is what the book says the output should be:

   main proc near
    var_10 = dword ptr -10h
    push ebp
    mov ebp, esp
    and esp, 0FFFFFFF0h
    sub esp, 10h
    mov eax, offset aHelloWorld ; "hello, world\n"
    mov [esp+10h+var_10], eax
    call _printf
    mov eax, 0
    leave
    retn
    main endp

Here are the steps:

  1. Compile the program as 32-bit (I am currently running a 64-bit PC)

    • gcc -m32 hello_world.c -o hello_world
  2. Use gdb to disassemble

    • gdb file
    • set disassembly-flavor intel
    • set architecture i386:intel
    • disassemble main

And I get:


    lea    ecx,[esp+0x4]
    and    esp,0xfffffff0
    push   DWORD PTR [ecx-0x4]
    push   ebp
    mov    ebp,esp
    push   ebx
    push   ecx
    call   0x565561d5 <__x86.get_pc_thunk.ax>
    add    eax,0x2e53
    sub    esp,0xc
    lea    edx,[eax-0x1ff8]
    push   edx
    mov    ebx,eax
    call   0x56556030 <puts@plt>
    add    esp,0x10
    mov    eax,0x0
    lea    esp,[ebp-0x8]
    pop    ecx
    pop    ebx
    pop    ebp
    lea    esp,[ecx-0x4]
    ret 

I have also used

objdump -D -M i386,intel hello_world> hello_world.txt

ndisasm -b32 hello_world > hello_world.txt

But neither of those works either. I just can't figure out what's wrong. I need some help. Looking at you, Peter Cordes ^^

opobtdfs
  • Why do you expect your compiler to produce identical output to somebody else's (different) compiler? – EOF Sep 28 '20 at 21:32
  • @EOF actually gcc gives exactly that code – 0___________ Sep 28 '20 at 21:42
  • Are you using the same version of the same compiler, with the same optimization flags, on the same operating system as the author? If anything is different, even the compiler version, then your results will not be an exact match. – John Bode Sep 28 '20 at 22:02
  • Compilers create functionally equivalent code in the output language from the input language. There is no reason to expect any two compilers, or the same compiler with different options (or, as with gcc, a different build of the same version), to produce the exact same output. Just read the book and go with whatever is in the book for the purposes of understanding the book. – old_timer Sep 29 '20 at 01:17
  • Thanks for the response. I understand that there will be some differences because the compilers are different, but I didn't expect it to be that much. Way more instructions were added. I also totally forgot about optimization. I was still expecting to at least see the string, but nothing. – opobtdfs Sep 29 '20 at 12:45

3 Answers

3

The output from the book looks like MSVC, not GCC. GCC will definitely not ever emit main proc because that's MASM syntax, not valid GAS syntax. And it won't do stuff like var_10 = dword ptr -10h.
(And even if it did, you wouldn't see assemble-time constant definitions in disassembly, only in the compiler's asm output, which is what the book suggested you look at: gcc -S -masm=intel output. See "How to remove "noise" from GCC/clang assembly output?")

So there are lots of differences because you're using a different compiler. Even modern versions of MSVC (on the Godbolt compiler explorer) make somewhat different asm, for example not bothering to align ESP by 16, perhaps because more modern Windows versions, or the CRT startup code, already do that.


Also, your GCC is making PIE executables by default, so use -fno-pie -no-pie. 32-bit PIE sucks for efficiency and for ease of understanding. See "How do i get rid of call __x86.get_pc_thunk.ax". (Also see "32-bit absolute addresses no longer allowed in x86-64 Linux?" for more about PIE executables, mostly focused on 64-bit code.)
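
As a rough way to check which mode a given build used, here is a minimal sketch in C. It relies on the `__PIE__` / `__pie__` macros that GCC and clang predefine when compiling with -fpie / -fPIE; the function names `built_as_pie` and `report_pie` are made up for this illustration.

```c
#include <stdio.h>

/* Returns 1 if this file was compiled as position-independent executable
   code (GCC and clang predefine __PIE__ / __pie__ in that case), else 0. */
int built_as_pie(void)
{
#if defined(__PIE__) || defined(__pie__)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: prints what kind of asm to expect for this build. */
void report_pie(void)
{
    if (built_as_pie())
        puts("PIE: 32-bit code will likely call __x86.get_pc_thunk.*");
    else
        puts("non-PIE: 32-bit code can use absolute addresses for strings");
}
```

Calling `report_pie()` from a program built with plain `gcc -m32` and then again with `gcc -m32 -fno-pie -no-pie` should show whether your GCC enables PIE by default.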

The extra clunky stack-alignment in main's prologue is something that GCC8 optimized for functions that don't also need alloca. But it seems even current GCC10 emits the full un-optimized version when you don't enable optimization :(. See "Why is gcc generating an extra return address?" and "Trying to understand gcc's complicated stack-alignment at the top of main that copies the return address".

Optimizing printf to puts: see "How to get the gcc compiler to not optimize a standard library function call like printf?" and "-O2 optimizes printf("%s\n", str) to puts(str)". gcc -fno-builtin-printf would be one way to make that not happen, or just get used to it. GCC does a few optimizations even at -O0 that other compilers only do at higher optimization levels.
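
To see which calls are affected, a small sketch like the following can be compiled with gcc -S -masm=intel and inspected. The function names are hypothetical, and the comments describe what GCC typically does as far as I know, not something guaranteed for every version.

```c
#include <stdio.h>

/* Return value ignored, constant string ending in '\n':
   GCC typically emits `call puts` with the newline stripped. */
int hello_newline(void)
{
    printf("hello, world\n");
    return 0;
}

/* No trailing newline: puts would add one, so the call
   is expected to stay a call to printf. */
int hello_no_newline(void)
{
    printf("hello, world");
    return 0;
}

/* The return value of printf is used, and puts returns something
   different, so GCC should keep the printf call here too. */
int hello_counted(void)
{
    int n = printf("hello, world\n");
    return n;   /* number of characters written */
}
```

Building the same file with -fno-builtin-printf should leave all three as calls to printf.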


MSVC 19.10 compiles your function like this (on the Godbolt compiler explorer) with optimization disabled (the default, no compiler options).

_main   PROC
        push    ebp
        mov     ebp, esp
        push    OFFSET $SG4501
        call    _printf
        add     esp, 4
        xor     eax, eax
        pop     ebp
        ret     0
_main   ENDP

_DATA   SEGMENT
$SG4501 DB        'hello, world', 0aH, 00H

GCC10.2 still uses an over-complicated stack alignment dance in the prologue.

.LC0:
        .string "hello, world"
main:
        lea     ecx, [esp+4]
        and     esp, -16
        push    DWORD PTR [ecx-4]
        push    ebp
        mov     ebp, esp
        push    ecx
        sub     esp, 4
# end of function prologue, I think.
        sub     esp, 12                  # make sure arg will be 16-byte aligned
        push    OFFSET FLAT:.LC0         # push a pointer
        call    puts
        add     esp, 16                  # pop the arg-passing space
        mov     eax, 0                   # return 0

        mov     ecx, DWORD PTR [ebp-4]   # undo stack alignment.
        leave
        lea     esp, [ecx-4]
        ret

Yes, this is super inefficient. If you called your function anything other than main, it would already assume ESP was aligned by 16 on function entry:

# GCC10.2 -m32 -O0
.LC0:
        .string "hello, world"
foo:
        push    ebp
        mov     ebp, esp
        sub     esp, 8            # reach a 16-byte boundary, assuming ESP%16 = 12 on entry
#
        sub     esp, 12                   
        push    OFFSET FLAT:.LC0
        call    puts
        add     esp, 16
        mov     eax, 0
        leave
        ret

So it still doesn't combine the two sub instructions, but you did tell it not to optimize, so braindead code is expected. See "Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?" for example.
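
The alignment arithmetic can be checked with a tiny model in C. This is only a sketch: the helper names are invented for illustration, and it assumes the usual i386 Linux convention that ESP is 16-byte aligned just before a call, so ESP % 16 = 12 on function entry once the return address has been pushed.

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit push: ESP moves down by 4 bytes. */
static uint32_t push32(uint32_t esp) { return esp - 4; }

/* Returns ESP % 16 at the moment of `call puts` in foo above,
   given the value of ESP on entry to foo. */
uint32_t esp_mod16_at_call(uint32_t esp_on_entry)
{
    uint32_t esp = esp_on_entry;
    esp = push32(esp);   /* push ebp                 */
    esp -= 8;            /* sub  esp, 8              */
    esp -= 12;           /* sub  esp, 12             */
    esp = push32(esp);   /* push OFFSET FLAT:.LC0    */
    return esp % 16;
}

/* Model of `and esp, -16` (same as `and esp, 0FFFFFFF0h`) in main:
   clears the low 4 bits, rounding ESP down to a 16-byte boundary. */
uint32_t align_down16(uint32_t esp)
{
    return esp & 0xFFFFFFF0u;
}
```

For an entry value such as 0xffffd0dc (ESP % 16 = 12), the four adjustments total 28 bytes, so ESP ends up on a 16-byte boundary right before the call.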

Peter Cordes
  • First I need to mention that I am a noob when it comes to compilers and less of a noob when it comes to assembly. In the book there are 32-bit and 64-bit outputs for both gcc and msvc. I believe he used gcc -S -masm=intel for the 32-bit output. I didn't try to use that because I am more familiar with nasm than masm. You are saying gcc is adding additional noise to the file. – opobtdfs Sep 29 '20 at 13:47
  • you are a genius sir. You can close this question – opobtdfs Oct 19 '20 at 16:57
  • @opobtdfs: You can do that by clicking the "accept" checkbox under the up/down vote arrows on one of the answers. – Peter Cordes Oct 19 '20 at 20:08
1

My GCC will very eagerly swap a call to printf for a call to puts! I did not manage to find the command-line options that would make the compiler not do this. That is, the program has the same external behaviour, but the machine code is that of

#include <stdio.h>
int main(void)
{
     puts("hello, world");
}

Thus, you'll have a really hard time trying to get the exact same assembly as in the book, as the assembly from that book has a call to printf instead of puts!

0

First of all, you compile, not decompile.

You get a lot of noise because you compile without optimizations. If you compile with optimizations, you will get much smaller code, almost identical to the one you have (to prevent the change from printf to puts, you need to remove the '\n': https://godbolt.org/z/cs4qe9):

.LC0:
        .string "hello, world"
main:
        lea     ecx, [esp+4]
        and     esp, -16
        push    DWORD PTR [ecx-4]
        push    ebp
        mov     ebp, esp
        push    ecx
        sub     esp, 16
        push    OFFSET FLAT:.LC0
        call    puts
        mov     ecx, DWORD PTR [ebp-4]
        add     esp, 16
        xor     eax, eax
        leave
        lea     esp, [ecx-4]
        ret

https://godbolt.org/z/xMqo33

0___________