
I compiled a simple program using Visual Studio 2017

#include <stdio.h>
int main()  
{
    printf("hello, world\n"); 
    return 0; 
}

I compile it from the command line:

cl 1.cpp /Fa1.asm 

This gives me assembly code that (mostly) makes sense to me

; Listing generated by Microsoft (R) Optimizing Compiler Version 19.14.26429.4 



INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES

CONST   SEGMENT
$SG5542 DB  'hello, world', 0aH, 00H
CONST   ENDS
PUBLIC  ___local_stdio_printf_options
PUBLIC  __vfprintf_l
PUBLIC  _printf
PUBLIC  _main
PUBLIC  ?_OptionsStorage@?1??__local_stdio_printf_options@@9@4_KA ; `__local_stdio_printf_options'::`2'::_OptionsStorage
EXTRN   ___acrt_iob_func:PROC
EXTRN   ___stdio_common_vfprintf:PROC
;   COMDAT ?_OptionsStorage@?1??__local_stdio_printf_options@@9@4_KA
_BSS    SEGMENT
?_OptionsStorage@?1??__local_stdio_printf_options@@9@4_KA DQ 01H DUP (?) ; `__local_stdio_printf_options'::`2'::_OptionsStorage
_BSS    ENDS
; Function compile flags: /Odtp
_TEXT   SEGMENT
_main   PROC
; File c:\users\mr dai\documents\michael\study\cybersecurity\reverseengineering4beg\random\random\random.cpp
; Line 3
    push    ebp
    mov ebp, esp
; Line 4
    push    OFFSET $SG5542
    call    _printf
    add esp, 4
; Line 5
    xor eax, eax
; Line 6
    pop ebp
    ret 0
_main   ENDP
_TEXT   ENDS
; Function compile flags: /Odtp
;   COMDAT _printf
_TEXT   SEGMENT
__Result$ = -8                      ; size = 4
__ArgList$ = -4                     ; size = 4
__Format$ = 8                       ; size = 4
_printf PROC                        ; COMDAT
; File c:\program files (x86)\windows kits\10\include\10.0.17134.0\ucrt\stdio.h
; Line 954
    push    ebp
    mov ebp, esp
    sub esp, 8
; Line 957
    lea eax, DWORD PTR __Format$[ebp+4]
    mov DWORD PTR __ArgList$[ebp], eax
; Line 958
    mov ecx, DWORD PTR __ArgList$[ebp]
    push    ecx
    push    0
    mov edx, DWORD PTR __Format$[ebp]
    push    edx
    push    1
    call    ___acrt_iob_func
    add esp, 4
    push    eax
    call    __vfprintf_l
    add esp, 16                 ; 00000010H
    mov DWORD PTR __Result$[ebp], eax
; Line 959
    mov DWORD PTR __ArgList$[ebp], 0
; Line 960
    mov eax, DWORD PTR __Result$[ebp]
; Line 961
    mov esp, ebp
    pop ebp
    ret 0
_printf ENDP
_TEXT   ENDS
; Function compile flags: /Odtp
;   COMDAT __vfprintf_l
_TEXT   SEGMENT
__Stream$ = 8                       ; size = 4
__Format$ = 12                      ; size = 4
__Locale$ = 16                      ; size = 4
__ArgList$ = 20                     ; size = 4
__vfprintf_l PROC                   ; COMDAT
; File c:\program files (x86)\windows kits\10\include\10.0.17134.0\ucrt\stdio.h
; Line 642
    push    ebp
    mov ebp, esp
; Line 643
    mov eax, DWORD PTR __ArgList$[ebp]
    push    eax
    mov ecx, DWORD PTR __Locale$[ebp]
    push    ecx
    mov edx, DWORD PTR __Format$[ebp]
    push    edx
    mov eax, DWORD PTR __Stream$[ebp]
    push    eax
    call    ___local_stdio_printf_options
    mov ecx, DWORD PTR [eax+4]
    push    ecx
    mov edx, DWORD PTR [eax]
    push    edx
    call    ___stdio_common_vfprintf
    add esp, 24                 ; 00000018H
; Line 644
    pop ebp
    ret 0
__vfprintf_l ENDP
_TEXT   ENDS
; Function compile flags: /Odtp
;   COMDAT ___local_stdio_printf_options
_TEXT   SEGMENT
___local_stdio_printf_options PROC          ; COMDAT
; File c:\program files (x86)\windows kits\10\include\10.0.17134.0\ucrt\corecrt_stdio_config.h
; Line 85
    push    ebp
    mov ebp, esp
; Line 87
    mov eax, OFFSET ?_OptionsStorage@?1??__local_stdio_printf_options@@9@4_KA ; `__local_stdio_printf_options'::`2'::_OptionsStorage
; Line 88
    pop ebp
    ret 0
___local_stdio_printf_options ENDP
_TEXT   ENDS
END

However, when I open the exe in IDA, the disassembly is completely unintelligible.

ida_cap

Why is this the case? Does Visual Studio obfuscate the generated code by default? I googled, but this doesn't seem to be the case. If it does, how do I turn obfuscation off?

Thanks

ackbar03
  • You are looking at the runtime library that got linked into the program. And sure, C and C++ programmers never have to spend any money on obfuscators. – Hans Passant Nov 17 '18 at 07:33
  • So it's not being obfuscated? How do I view my actual main function then, so I can see something similar to the ASM file? Also, how come I don't see this mess when I open some of my other exes used for practice, and why doesn't it show up in the ASM file? Sorry for so many questions, thanks – ackbar03 Nov 17 '18 at 08:09
  • It is in there somewhere, but it is hard to find since you wrote so little code. The diagram presentation is not very useful for finding it. The ASM file never includes runtime library code. I can't see those "other exes", but whether you built the program with /MT or /MD matters a lot. /MT includes the runtime library in the program, /MD leaves it in another DLL. There is however always *some* runtime library support linked in, necessary to get the program initialized and started. – Hans Passant Nov 17 '18 at 08:15
  • I found it by doing a search on the assembly code in IDA and it was buried in there somewhere. How is it usually done practically on other programs? All the library linkages are extremely confusing – ackbar03 Nov 17 '18 at 09:41
  • It is not usually done, nobody has time for this. – Hans Passant Nov 17 '18 at 09:44

1 Answer


main is not the entry point of your program at this level.

The C language is an abstraction: before main can run, the runtime must be initialised.
C++ is even more complicated, but the idea is the same: some code runs before main (otherwise, who would initialise cout, for example?).

At the end of the day, whatever the source language, a native Windows program ends up as a PE binary, so it's important to get familiar with that format.

I've compiled your program in order to show how to find main.
Note, however, that based on your code I assumed you were compiling a C file, whereas the disassembly suggests you compiled a C++ file. C++ is notoriously more complex to reverse engineer.

The example below will differ from yours; mine is an x86 debug build.


First of all, IDA tells you where the PE entry point is in the Exports tab

PE entry-point in the Exports tab
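
To make the idea of a PE entry point a little more concrete, here is a minimal sketch of where that value lives in the file. The helper name `pe_entry_point_rva` is made up for this illustration; the offsets follow the documented PE layout (DOS header, `e_lfanew` at 0x3C, then the `PE\0\0` signature, the 20-byte COFF header and the optional header). The value read is an RVA, so the address IDA displays is this value plus the image base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper (not part of any real library): returns 0 and stores
 * the entry-point RVA in *rva_out if buf looks like a PE image, -1 otherwise.
 */
int pe_entry_point_rva(const unsigned char *buf, size_t len, uint32_t *rva_out)
{
    assert(buf != NULL && rva_out != NULL);

    /* DOS header: "MZ" magic, e_lfanew (offset of the PE header) at 0x3C. */
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;
    uint32_t pe_off = read_u32_le(buf + 0x3C);

    /* Need the signature (4), the COFF header (20) and the first
       20 bytes of the optional header. */
    if (pe_off > len || len - pe_off < 4 + 20 + 20)
        return -1;
    if (memcmp(buf + pe_off, "PE\0\0", 4) != 0)
        return -1;

    /* AddressOfEntryPoint sits 16 bytes into the optional header. */
    *rva_out = read_u32_le(buf + pe_off + 4 + 20 + 16);
    return 0;
}
```

You would call it on the bytes of the exe after reading the file into memory. It does no further validation, so treat it as a reading aid for the format and not as a robust parser.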

If you double-click it and follow the path of jmp and call instructions (there is only one path, you can't get lost), you arrive at a routine with two calls:

Init and body

VS generates a security cookie as the very first thing, and that's what the first function called does:

Security cookie

Note that this routine is easy to recognise even without the IDA hints because it makes very distinctive API calls; you can Google some of the function names in the picture to find their documentation.

Having established that this is the security cookie generation routine, we go back to the previous routine and enter the second call:

Body

This is the body of your program: not main itself, but the routine where the CRT is initialised and finalised, and from which main is called.
Take a look at the flow-chart in the lower left corner and see that most of the work is in the left branch (meaning that the right branch is an error condition).
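
As a rough mental model of that routine, the sketch below mimics its shape in portable C. Every name in it (`fake_crt_startup`, `fake_runtime_init`, `fake_runtime_term`, `sample_main`) is hypothetical and invented for this illustration; the real startup code does far more work, so read it only as an outline of "initialise, call main, finalise".

```c
#include <assert.h>
#include <stddef.h>

typedef int (*main_fn)(int argc, char **argv);

/* Hypothetical stand-in for the runtime's state. */
static int runtime_ready = 0;

/* Hypothetical set-up and tear-down; the real ones do much more. */
static int  fake_runtime_init(void) { runtime_ready = 1; return 1; }
static void fake_runtime_term(void) { runtime_ready = 0; }

/*
 * Simplified model of the routine that wraps main. Returns the value
 * a real runtime would hand to its exit routine.
 */
int fake_crt_startup(main_fn user_main, int argc, char **argv)
{
    assert(user_main != NULL);

    /* Error branch: initialisation failed, bail out early.
       (Never taken in this toy version, since init always succeeds.) */
    if (!fake_runtime_init())
        return 255;

    /* Main branch: the runtime is ready, so main can finally run. */
    int exit_code = user_main(argc, argv);

    /* Finalisation happens after main returns. */
    fake_runtime_term();
    return exit_code;
}

/* A stand-in for the user's main: it can rely on the runtime being ready. */
int sample_main(int argc, char **argv)
{
    (void)argv;
    return runtime_ready ? argc : -1;
}
```

Calling `fake_crt_startup(sample_main, 3, NULL)` runs the three phases in order and returns whatever `sample_main` returned, which is why main tends to sit a few calls before the exit routines in the disassembly.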

main is usually called a few calls before _exit or _cexit. We then move close to these calls:

Candidate calls

If you hover over a function call, IDA will show the function's code.
Functions that consist of a single jmp are present in the debug build to help the debugger and are, more often than not, runtime functions.
The first circled function, when hovered, shows a call to an "environment" routine. This is a good sign, since main needs the program arguments (Windows doesn't pass the arguments to the program's entry point; there is a specific API to get them).
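
Those single-jmp functions can also be followed by hand. The sketch below shows the arithmetic for the common 5-byte near jump (opcode E9 followed by a 32-bit displacement); the function name `resolve_jmp_thunk` is made up for this example, and other jump encodings are deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * If code[0..4] is a near jump (opcode E9 followed by a 32-bit relative
 * displacement), store the jump target in *target and return 1.
 * addr is the virtual address of code[0]. Returns 0 for anything else.
 */
int resolve_jmp_thunk(const unsigned char *code, size_t len,
                      uint32_t addr, uint32_t *target)
{
    assert(code != NULL && target != NULL);

    if (len < 5 || code[0] != 0xE9)
        return 0;

    uint32_t rel = (uint32_t)code[1] | ((uint32_t)code[2] << 8) |
                   ((uint32_t)code[3] << 16) | ((uint32_t)code[4] << 24);

    /* The displacement is relative to the end of the 5-byte instruction;
       unsigned wrap-around takes care of backward jumps. */
    *target = addr + 5 + rel;
    return 1;
}
```

For instance, the bytes `E9 0B 00 00 00` located at 0x401000 jump to 0x401010. IDA does this computation for you, which is why hovering over the thunk shows the real function.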

Pre main

This looks like a call to main, the arguments match.
In fact, if we enter the call we get to main:

main


Of course, you could have found main by simply looking for the "hello, world" string in the "Strings view" (Shift + F12), but in a real-world reverse engineering scenario this is almost never possible.
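
The string shortcut boils down to a plain byte search over the image, followed by finding the code that references the string's address. A minimal sketch of the search half is below; `find_bytes` is a hypothetical helper written for this illustration, not something IDA exposes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the offset of the first occurrence of needle in hay,
 * or (size_t)-1 if it does not occur.
 */
size_t find_bytes(const unsigned char *hay, size_t hay_len,
                  const unsigned char *needle, size_t needle_len)
{
    assert(hay != NULL && needle != NULL);

    if (needle_len == 0 || needle_len > hay_len)
        return (size_t)-1;

    /* Naive scan: compare the needle at every possible starting offset. */
    for (size_t i = 0; i + needle_len <= hay_len; i++) {
        if (memcmp(hay + i, needle, needle_len) == 0)
            return i;
    }
    return (size_t)-1;
}
```

Once you have the offset, the second half is to look for an instruction such as `push OFFSET ...` that refers to the matching address; in IDA that is what the cross-references on the string give you.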

Writing programs and then reverse engineering them is a very good approach. If you have VS installed, you may have the MS DIA SDK, which allows IDA to read the pdb files.
This greatly helps with the reverse engineering: you can load two IDA instances, one with the PDB and one without, and compare.
Unfortunately, getting the MS DIA SDK may not be so easy.

Also, IDA FLIRT is a must.
It's a library of function signatures that allows IDA to recognise the runtime functions, making it much easier to focus on the application's real code. The signatures are hard to find (and harder to generate), but they are totally worth it.
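
The idea behind signature matching can be sketched in a few lines: compare the first bytes of a function against a known pattern, skipping the bytes that change from binary to binary (addresses, for example). This is only an illustration of the principle, with a made-up function name and a made-up mask format; it is not FLIRT's actual file format or algorithm.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Compare code against a pattern. mask[i] != 0 means byte i must match
 * exactly; mask[i] == 0 means byte i is a wildcard (for example part of
 * an address that differs from one binary to the next).
 */
int matches_signature(const unsigned char *code, size_t code_len,
                      const unsigned char *pattern,
                      const unsigned char *mask, size_t pattern_len)
{
    assert(code != NULL && pattern != NULL && mask != NULL);

    if (code_len < pattern_len)
        return 0;

    for (size_t i = 0; i < pattern_len; i++) {
        if (mask[i] && code[i] != pattern[i])
            return 0;
    }
    return 1;
}
```

Assuming the usual x86 encodings, a pattern like `55 8B EC B8 ?? ?? ?? ?? 5D C3` (push ebp / mov ebp, esp / mov eax, imm32 / pop ebp / ret) would match a function shaped like `___local_stdio_printf_options` from the question's listing, wherever `_OptionsStorage` happens to be placed.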


Finally, note that a debug build may be harder to reverse engineer due to the type of code generated.
If you make a release build and reverse engineer it, you'll see it's easier to get to main.

Margaret Bloom
  • Thanks so much for taking the time to explain this! I was hoping to get some advice and pointers like this. I'll definitely take a look at FLIRT and the MS DIA SDK. Do you have any good resources to recommend for learning this? A lot of what you've mentioned above seems to come from experience; I tried googling my question earlier and had trouble finding the info I needed. I'm currently going through an assembly book and an intro to reverse engineering. I also have the IDA Pro book but haven't had the time to go through it yet. – ackbar03 Nov 17 '18 at 15:15
  • @ackbar03 My advice is to learn Windows API programming in C and, if needed, C++ programming. Each time look at the generated machine code. Basically, reverse engineer your program. The assembly used is generally a small subset of the whole ISA, so you can learn it directly from the RE (referencing the Intel manual when needed). COM technology is another thing to see. Check the output of other compilers too (e.g. Delphi, VB6, MinGW/DevC++). Basically the more programs you see the easier you'll recognise them. – Margaret Bloom Nov 18 '18 at 14:11
  • @ackbar03 Also look at tools like DynamoRIO, Frida, rr, and Time Travel Debugging. [Here's a very good article on how to use them to build an exploit](http://blog.ret2.io/2018/06/05/pwn2own-2018-exploit-development/); it's a different topic, but related. – Margaret Bloom Nov 18 '18 at 14:13
  • @ackbar03 Oh, and take your time. It's ok to spend a week to completely RE a binary. Websites like AnyRun have malware samples, to train on real world scenarios. – Margaret Bloom Nov 18 '18 at 14:14
  • You are a god Margaret. I will build a shrine in your image – ackbar03 Nov 18 '18 at 16:48