I'm trying to disassemble a file, and one of the sections contained this. What is this doing? What would it look like in C?

I believe it copies 40 to [ebp-8] and 20 to [ebp-4], then calls func. That function adds edx to eax and then subtracts 4 from the result. After func returns, main adds 8 to esp. Am I on the right track?

func:
push ebp
mov ebp, esp
mov edx, DWORD PTR [ebp+8]
mov eax, DWORD PTR [ebp+12]
add eax, edx
sub eax, 4
pop ebp
ret
main:
push ebp
mov ebp, esp
sub esp, 16
mov DWORD PTR [ebp-8], 40
mov DWORD PTR [ebp-4], 20
push DWORD PTR [ebp-4]
push DWORD PTR [ebp-8]
call func
add esp, 8
leave
ret

EDIT: So would you agree that the equivalent C would be the following?

#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

int func(int d, int e)
{
    int sum = d + e;
    int result = sum - 4;
    return result;
}

int main(void)
{
    int a = 40;
    int b = 20;
    int c = func(a,b);
    printf("Result is: %d\n", c);
}
rockower
  • 2
    `func` takes two parameters passed on the stack, adds them together, subtracts 4 from the result, and returns that value in EAX. EAX is then returned by `main`, so the value 56 (40+20-4) will be passed as a return code to whatever launched `main` (command shell etc.) – Michael Petch Sep 23 '18 at 23:16
  • what does this code do? `mov ebp, esp sub esp, 16 mov DWORD PTR [ebp-8], 40 mov DWORD PTR [ebp-4], 20 push DWORD PTR [ebp-4] push DWORD PTR [ebp-8] call func add esp, 8` – rockower Sep 24 '18 at 00:04
  • 1
    There is no `printf` in the original. It would be `return func(a,b);` – Michael Petch Sep 24 '18 at 02:12
  • 1
    Your `func` is equivalent, but we can tell from the fact that it's un-optimized compiler output that it was actually written with a single statement and no tmp variable as `return d + e - 4;`. https://godbolt.org/z/ari39w. gcc/clang `-O0` don't optimize away tmp vars, and spill everything to memory between statements, unlike normal code-gen with optimization enabled where both versions do compile to the same asm. – Peter Cordes Sep 24 '18 at 03:06

1 Answer

Broken down, the code looks like this:

func:
; enter 0, 0
push ebp
mov ebp, esp
; entered func with no local variables

; get first param in edx
mov edx, DWORD PTR [ebp+8]
; get second param in eax
mov eax, DWORD PTR [ebp+12]

add eax, edx    ; eax += edx
sub eax, 4      ; eax -= 4

; in general you would first restore ESP with `mov esp, ebp` (as `leave` does),
; but that is not required here: ESP was not changed after the prologue, so it still equals EBP
pop ebp
ret

main:
; enter 16, 0
push ebp
mov ebp, esp
sub esp, 16    ; reserves 16 bytes (4 dword slots) on the stack
; entered main with 4 dword slots on the stack (only 2 are used as locals)

; writing on 2 local variables
mov DWORD PTR [ebp-8], 40
mov DWORD PTR [ebp-4], 20

; push 2 params on the stack and call `func`
push DWORD PTR [ebp-4]    ; second param
push DWORD PTR [ebp-8]    ; first param
call func                 ; calls `func(first, second)`, returns EAX = 56

; remove the 2 pushed params (8 bytes) from the stack
add esp, 8

; tear down the stack frame (restore ESP and EBP to their values from before the prologue)
leave

; return to caller
ret

With the explanations in the comments (marked by ;), it should be easy for you to translate this into C code yourself.


EDIT: As Peter Cordes pointed out, assembly does not know any data types such as int or long int. In x86 assembly, you use the general-purpose registers, and under the 32-bit C calling conventions any 32-bit value is returned in EAX, while 64-bit values are returned in EDX:EAX, meaning EDX holds the upper 32 bits.

But if the main label is the classic int main() function in C and the entry point of the program, I believe we can assume that func looks like int func(int p1, int p2) in C as well, since the returned EDX is never used and int main() seems to end with return 56; with 56 in EAX.

  • In functions that use a legacy stack-frame at all, GCC uses `leave` if ESP!=EBP, otherwise just `pop ebp`. `gcc -O0` gives you consistent debugging (by spilling all variables to memory after every C statement), not totally braindead boilerplate code everywhere. And BTW, `sub esp, 16` isn't "4 local variables", it's 2 dwords + 8 bytes of padding so ESP will be aligned by 16 again after 2 pushes, for `call func`, as required by the ABI. – Peter Cordes Sep 24 '18 at 01:15
  • C variables are not fixed size; in the i386 System V ABI, a `double` takes 8 bytes, a `char` takes 1, an array or struct can take an arbitrary amount. You forgot to specify the size and type of the locals. It's presumably `int` or `unsigned int`, but could also be `unsigned long` or `long`. Any of those would be correct. – Peter Cordes Sep 24 '18 at 01:16
  • The `add esp,8` should be grouped with the `call`. It's popping the args. It's totally redundant to do that right before `leave`, so it's only there because this is anti-optimized code (`-O0`) that has to support GDB `jump` commands to jump to another line / C statement in the same function. (Not asm instruction, just C source line. [Is it possible to "jump"/"skip" in GDB debugger?](https://stackoverflow.com/a/46043760)) – Peter Cordes Sep 24 '18 at 01:18
  • `enter 4,0` isn't right, it would be `enter 16,0` because `enter` uses byte offsets. The `imm16` AllocSize is not scaled, see [the Operation section of the manual](http://felixcloutier.com/x86/ENTER.html). I'm not sure it's helpful to even mention ENTER, because to understand what it does and how stack frames work, it's easier to break it back down into the separate operations. And you never want to use it (unless maybe optimizing for code-size), because slow on modern x86. (like a dozen uops). – Peter Cordes Sep 24 '18 at 01:24
  • You have a point. I just said 4 instead of 4*4 (=16!) as 32bits = 4 bytes and 4 stack elements were required. Writing down `enter` and `leave` has a purpose for understanding in my answer; optimization shouldn't be the priority when trying to just understand things. – christopher westburry Sep 24 '18 at 01:47
  • If the return type is 32bits, then the returned value in `EAX` is taken, for 64bits `EDX:EAX`. It will give different results indeed, so reverse-engineering might not result in exactly the same code. But one can assume the used types (except for un/signed) by looking at e.g. whether `EDX` is used by the caller. In this case, it is unknown, as `main` returns the same `EAX` and `EDX` values. Assuming this is the sole C-like `main` function, the returned value from `func` was never really used, though, except for `return (int) 56;` (with EAX = 56). So I assumed 32bits `int`. – christopher westburry Sep 24 '18 at 01:55
  • 1
    Yes, your analysis is correct. My point was that part of reverse-engineering it back to C is getting the types right. `int` fits, but there are other options. Yes, this is C, so we know `main`'s return type is `int`, not `long long`. Interesting point that `unsigned long long func(unsigned int a, unsigned int b) { return (a+b-4) | ((unsigned long long)a << 32); }` could also compile to the same code for `func`. (But probably wouldn't with optimization disabled). Oh, that's cool, `clang -O3 -m32` *does* actually compile that function that way: https://godbolt.org/z/kzrCj8 – Peter Cordes Sep 24 '18 at 02:57
  • 1
    But from the fact that this is pretty clearly un-optimized compiler-generated code, we can infer that `func`'s return type is 32-bit. And yes, it looks like `main` ended with `return func(a,b);`. In C99 and later, there's an implicit `return 0` at the end of `main`, so there would be an instruction to zero EAX if `main` hadn't returned explicitly. (In C89, returning with `func`'s return value still in EAX could have just been a side-effect of undefined behaviour: falling off the end of a non-`void` function. But C99, like C++, has a default-return in `main`.) – Peter Cordes Sep 24 '18 at 03:02