
I'm new to this topic and I got stuck while trying to decompile a C function.

This is my problem: I am trying to decompile a function that is part of a pre-compiled project, so I cannot see its source code, only the prototype.

This is my function (I cannot see the code):

int  removeFromStack(tStack *p, void *d, unsigned cantBytes);

But the pseudocode I got for that function has more parameters than the prototype (four instead of three):

__int64 __fastcall removeFromStack(void *a1, const void *a2, __int64 a3, size_t **a4)
{
  __int64 result; // rax
  size_t *v5; // rbx

  result = 0LL;
  v5 = *a4;
  if ( *a4 )
  {
    *a4 = (size_t *)v5[2];
    memcpy(a1, a2, *v5);
    free(a1);
    free(a1);
    result = 1LL;
  }
  return result;
}

Why is this happening? I do not understand where the extra parameter comes from.


EDIT:

public removeFromStack
removeFromStack proc near
push    rbx
sub     rsp, 20h
xor     eax, eax
mov     rbx, [rcx]
test    rbx, rbx
mov     r9, rdx
jz      short loc_525

mov     rax, [rbx+10h]
cmp     [rbx+8], r8d
cmovbe  r8d, [rbx+8]
mov     [rcx], rax
mov     rdx, [rbx]      ; Size
mov     rcx, r9
mov     r8d, r8d
call    memcpy
mov     rcx, [rbx]
call    free
mov     rcx, rbx
call    free
mov     eax, 1

loc_525:
add     rsp, 20h
pop     rbx
retn
removeFromStack endp


EDIT #2


I used a 32-bit build of the same project instead of the x64 one, and now I get this:

int __cdecl removeFromStack(int a1, void *a2, int a3)
{
  int result; // eax
  int Size; // edx
  int Block; // ebx

  result = 0;
  Size = a3;
  Block = *(_DWORD *)a1;
  if ( *(_DWORD *)a1 )
  {
    if ( *(_DWORD *)(Block + 4) <= (unsigned int)a3 )
      Size = *(_DWORD *)(Block + 4);
    *(_DWORD *)a1 = *(_DWORD *)(Block + 8);
    memcpy(a2, *(const void **)Block, Size);
    free(*(void **)Block);
    free((void *)Block);
    result = 1;
  }
  return result;
}

; int __cdecl removeFromStack(int, void *, int)
public _removeFromStack
_removeFromStack proc near

Block= dword ptr -1Ch
Src= dword ptr -18h
Size= dword ptr -14h
arg_0= dword ptr  4
arg_4= dword ptr  8
arg_8= dword ptr  0Ch

push    ebx
xor     eax, eax
sub     esp, 18h
mov     ecx, [esp+1Ch+arg_0]
mov     edx, [esp+1Ch+arg_8]
mov     ebx, [ecx]
test    ebx, ebx
jz      short loc_53D

mov     eax, [ebx+8]
cmp     [ebx+4], edx
cmovbe  edx, [ebx+4]
mov     [ecx], eax
mov     eax, [ebx]
mov     [esp+1Ch+Size], edx ; Size
mov     [esp+1Ch+Src], eax ; Src
mov     eax, [esp+1Ch+arg_4]
mov     [esp+1Ch+Block], eax ; void *
call    _memcpy
mov     eax, [ebx]
mov     [esp+1Ch+Block], eax ; Block
call    _free
mov     [esp+1Ch+Block], ebx ; Block
call    _free
mov     eax, 1

loc_53D:
add     esp, 18h
pop     ebx
retn
_removeFromStack endp
Peter Cordes
JustToKnow
  • Where did you get that pseudo-code from? A decompiler? Which one? (The double free is pretty concerning) – Shawn May 29 '21 at 19:25
  • I might guess the decompiler is confused about the calling convention being used. Ultimately you may have to read the assembly instead, to substitute your own knowledge for the decompiler's erroneous assumptions. Decompilers can be useful on occasion but aren't a one-step replacement for actual reverse engineering. – Nate Eldredge May 29 '21 at 19:31
  • @Shawn Hey pal, thanks for replying. I was using IDA Pro and Hex-Rays. – JustToKnow May 29 '21 at 19:48
  • Hey @NateEldredge, thanks for replying. Yeah, the problem is that I am not good at reading assembly (yet), so I was trying to recover the hidden function using the decompiler, but the result is strange to me. Why would it add new parameters? – JustToKnow May 29 '21 at 19:51
  • Like I said, I suspect it is incorrectly detecting the calling convention being used. There may be a particular register or stack slot that would be used for a third argument in one convention, and for a fourth argument in another convention. I am not specifically familiar with ida-pro so I don't know whether there is perhaps a way to override its detection. – Nate Eldredge May 29 '21 at 19:54
  • @NateEldredge Which one do you use instead? – JustToKnow May 29 '21 at 19:55
  • I usually don't. I tend to find small functions easier to understand from the disassembly, and I don't generally have occasion to reverse-engineer large functions. – Nate Eldredge May 29 '21 at 20:01
  • can you show the original disassembly? – phuclv May 30 '21 at 03:19
  • @phuclv hey pal, take a look, I have edited my thread. Is that what you mean? – JustToKnow May 30 '21 at 04:33
  • What you are calling pseudocode is legal C code produced after the compiler is done with its work. The compiler has no use for variable names and therefore does not store any info about them. Hence, the decompiler gives names like `a1`, `a2`, `a3`, `v5`. See https://stackoverflow.com/questions/273145/is-it-possible-to-decompile-a-windows-exe-or-at-least-view-the-assembly . – Kitswas May 30 '21 at 07:06
  • Do understand that compilation is a lossy process. A lot of information that the machine does not need is omitted along the way, information that the decompiler could have used to make its output easier for humans to understand. That's why we write programs in high-level languages. – Kitswas May 30 '21 at 07:14
  • @PalLaden: It's legal but nonsensical C that only matches the asm if it's misinterpreted as using the x86-64 System V calling convention. This question is about the C being nonsensical like calling `free(a1)` twice, not about the non-meaningful variable names. (I hope.) – Peter Cordes May 30 '21 at 07:15
  • @NateEldredge take a look, I have edited my question. Does it look more appropriate to you now? – JustToKnow May 30 '21 at 15:34

1 Answer


Looks like you're decompiling as if this was the x86-64 System V calling convention (4th arg in RCX, with a1=RDI, a2=RSI, a3=RDX),
but actually it's Windows x64 (1st arg in RCX, then RDX, R8, R9).
@NateEldredge's guess about this in comments was right.

That explains why it's wrong about the args being passed to free (it's actually *first_arg and then first_arg), and why it's inventing unused dummy args a1..3. Well, actually it thinks a1 and a2 (RDI and RSI) are passed on unchanged to memcpy. And then to free twice, because I guess it assumes RDI is unchanged even though nothing sets it again after memcpy returns. Compilers would of course not make code that depended on the value of a clobbered register, so this should have been a hint to IDA that this code wasn't using the calling convention it was assuming.

So tell IDA where your code is from (Windows) so it knows what calling convention to assume. (I don't know IDA, but I can see from the asm and the C that this is the problem.)
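
For reference, here is a rough sketch of what the function probably looks like in C once the asm is read with the Windows x64 convention (RCX = `p`, RDX = `d`, R8D = `cantBytes`). The node struct, its field names, and the idea that `tStack` is simply a pointer to a node are guesses from the offsets 0, 8 and 10h in the asm; the real source may well differ.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout inferred from the x64 offsets.
   The field names are invented for illustration. */
typedef struct sNode {
    void         *data;  /* [rbx]     : block copied out, then freed   */
    unsigned      size;  /* [rbx+8]   : compared against cantBytes     */
    struct sNode *next;  /* [rbx+10h] : value stored back into *p      */
} tNode;

/* Assumption: the stack handle is just a pointer to the top node. */
typedef tNode *tStack;

int removeFromStack(tStack *p, void *d, unsigned cantBytes)
{
    tNode *top = *p;                 /* mov rbx, [rcx] */
    if (top == NULL)
        return 0;                    /* eax was zeroed by xor eax, eax */

    /* cmp [rbx+8], r8d / cmovbe r8d, [rbx+8]: take the smaller size */
    unsigned n = (top->size <= cantBytes) ? top->size : cantBytes;

    *p = top->next;                  /* unlink the top node */
    memcpy(d, top->data, n);         /* copy the payload to the caller */
    free(top->data);                 /* free(*first_arg's old node data) */
    free(top);                       /* then free the node itself */
    return 1;
}
```

The two `free` calls here take different pointers (the node's data, then the node), which is what the asm does; the apparent double free in the decompiler output only comes from the wrong calling-convention assumption.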

Peter Cordes
  • Hey Peter, thank you very much for replying. I'm checking this out, pal. – JustToKnow May 30 '21 at 12:55
  • It's been 2 hours and I have not been able to change the calling convention yet. Still trying. – JustToKnow May 30 '21 at 14:56
  • Take a look, I have edited my question. Now I'm getting that. What do you think? – JustToKnow May 30 '21 at 15:34
  • @programming_amazing: The decompilation from the 32-bit build looks like it matches the 64-bit asm. `int Block` is of course nonsense; notice that every time it's used, it gets cast to some pointer type. I'm not surprised a decompiler doesn't invent a `struct` that contains a pointer to the same struct, though. But that's just normal decompiler behaviour. – Peter Cordes May 30 '21 at 18:13
  • I am still trying to understand why the 64-bit build does not decompile properly. Peter, can I ask you a silly question? How can I train myself to recover the code just by looking at the assembly? I am kind of new and I do not want to just look at the pseudo-code; I want to understand the magic going on behind it. – JustToKnow May 30 '21 at 18:33
  • @programming_amazing: The i386 System V calling convention is basically identical to Windows cdecl, except for some rules for struct pass/return maybe. So it's no surprise 32-bit with inefficient stack-args worked, even if IDA still thought it was Linux code. – Peter Cordes May 30 '21 at 19:17
  • @programming_amazing: re: learning asm: Looking at how simple things compile with `gcc -O1` or something is a good starting point for the harder task of seeing in reverse. [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116). It's not fundamentally different from learning how to read some other programming language, e.g. learning to read a C program and see how a loop over an array implements something you might do in Python with a single operation applied to a list. (That's a rough comparison since C truly does map to asm easily, Python doesn't.) – Peter Cordes May 30 '21 at 19:19