
I am reading the book Practical Malware Analysis and in it appears this example code:

00401022 call ds:CoCreateInstance
00401028 mov eax, [esp+24h+ppv]

The author then states:

The COM object returned will be stored on the stack in a variable that IDA Pro has labeled ppv, as shown.

My question is: why is this? Since we do a mov eax, [esp+24h+ppv], wouldn't this move the data at [esp+24h+ppv] into eax, overwriting the return value rather than storing the return value in the variable? I thought that in Intel syntax, mov operand1, operand2 always copies the second operand into the first.

Note: It's page 558 if anyone happens to have the book, by the way.

Ross Ridge
the_endian

2 Answers


I have very little experience with COM, but a quick glance at MSDN's documentation for the CoCreateInstance function reveals this signature:

HRESULT CoCreateInstance(
  _In_  REFCLSID  rclsid,
  _In_  LPUNKNOWN pUnkOuter,
  _In_  DWORD     dwClsContext,
  _In_  REFIID    riid,
  _Out_ LPVOID    *ppv
);

So CoCreateInstance has an out parameter called ppv, a name that IDA Pro conveniently picks up as well.

The ppv out value is defined as

Address of pointer variable that receives the interface pointer requested in riid. Upon successful return, *ppv contains the requested interface pointer. Upon failure, *ppv contains NULL.

The return value supposedly returned in EAX is merely one of these five values:

  • S_OK: An instance of the specified object class was successfully created.
  • REGDB_E_CLASSNOTREG: A specified class is not registered in the registration database. Also can indicate that the type of server you requested in the CLSCTX enumeration is not registered or the values for the server types in the registry are corrupt.
  • CLASS_E_NOAGGREGATION: This class cannot be created as part of an aggregate.
  • E_NOINTERFACE: The specified class does not implement the requested interface, or the controlling IUnknown does not expose the requested interface.
  • E_POINTER: The ppv parameter is NULL.

The returned ppv value is the real pointer to the COM object which is then accessed with the

mov eax, [esp+24h+ppv]

instruction. So the return value, which contains the possible error code (anything other than S_OK), is immediately overwritten (so it is assumed that the COM call succeeds).
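The split between the status in the return value and the object pointer in the out parameter can be sketched in portable C. This is only an illustration: `fake_create_instance`, `status_t`, `ST_OK`, `ST_E_POINTER` and `dummy_object` are hypothetical stand-ins invented here, not the real COM API, and the assembly mentioned in the comments is the analogy from the question rather than actual compiler output.

```c
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real COM types or codes. */
typedef long status_t;
#define ST_OK        0L
#define ST_E_POINTER 1L          /* arbitrary illustrative value */

static int dummy_object = 42;    /* stands in for the created COM object */

/* Same shape as CoCreateInstance: the status comes back as the return
   value (EAX on 32-bit x86), while the object pointer is written
   through the out parameter into a variable owned by the caller. */
status_t fake_create_instance(void **ppv)
{
    if (ppv == NULL)
        return ST_E_POINTER;
    *ppv = &dummy_object;        /* lands in the caller's stack slot */
    return ST_OK;
}

/* The caller's ppv is a local variable, so it lives on the stack. */
int caller_reads_object(void)
{
    void *ppv = NULL;
    status_t hr = fake_create_instance(&ppv);
    if (hr != ST_OK)             /* the disassembly in the question skips this check */
        return -1;
    /* Reading ppv here corresponds to mov eax, [esp+24h+ppv]. */
    return *(int *)ppv;
}
```

Calling `caller_reads_object()` yields the value stored in the stand-in object, which shows that the pointer is obtained from the local variable and not from the function's return value.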

The stack slot DWORD PTR [esp+24h+ppv] holds the interface pointer to the COM object, and the mov loads that pointer into EAX.

But I cannot identify the addressing mode. Maybe it is a special display syntax of IDA Pro.

From there, this pointer in EAX is used to access the COM object and, one step further, its methods, as described in the comments.
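The method call through the pointer can be modelled in C as a lookup in a table of function pointers. This is a rough sketch under stated assumptions: `Object`, `method_fn`, `demo_vtable`, `get_value`, `slot_index_32`, `call_slot` and `demo_call_2c` are hypothetical names, the layout only mimics the usual "first field points to the vtable" convention, and the offset `2Ch` is the one quoted in the comments under this answer. The index computation assumes 4-byte pointers, as on 32-bit x86.

```c
#include <stddef.h>

/* Hypothetical object whose first field points to a table of
   function pointers, which is the layout a vtable call relies on. */
typedef struct Object Object;
typedef int (*method_fn)(Object *self);

struct Object {
    const method_fn *vtable;     /* this is what mov edx, [eax] loads */
    int value;
};

static int get_value(Object *self) { return self->value; }

/* Twelve slots; only slot 11 is populated for the demonstration. */
static const method_fn demo_vtable[12] = { [11] = get_value };

/* Byte offset to slot index, assuming 4-byte pointers (32-bit x86). */
size_t slot_index_32(size_t byte_offset)
{
    return byte_offset / 4;
}

int call_slot(Object *obj, size_t slot)
{
    const method_fn *vt = obj->vtable;   /* mov edx, [eax] */
    return vt[slot](obj);                /* call dword ptr [edx + slot*4] */
}

/* Offset 2Ch is 44 bytes, which is slot 11 with 4-byte pointers. */
int demo_call_2c(void)
{
    Object obj = { demo_vtable, 7 };
    return call_slot(&obj, slot_index_32(0x2C));
}
```

Under these assumptions, `slot_index_32(0x2C)` is 11, so an offset of `2Ch` would select the twelfth entry of the table. Which method actually sits in that slot depends on the interface in question.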

This CodeProject article may give you further insight.

zx485
  • `[esp+24h+ppv]` is just the stack slot that "ppv" is allocated in. The `esp + 24h` part is there because the function doesn't use a frame pointer, so the value of ESP changes throughout the function, while the value assigned to the symbol `ppv` earlier in the IDA disassembly doesn't change. To call a method on the interface pointer now contained in EAX, the code would have to do something like `mov ebx, [eax]` to get the vtable and then something like `call [ebx + 12]` to call a method in the vtable. – Ross Ridge Mar 04 '17 at 05:17
  • @RossRidge: Yes, I supposed so. But I was unable to find a suitable addressing mode for `DWORD PTR ppv[ESP+24h]`, such as `disp32+r32+disp8`. Hence my doubts. I still haven't found an answer. – zx485 Mar 04 '17 at 06:28
  • @RossRidge you're absolutely right because the lower code looks like this: `mov edx, [eax] ; put ppv into edx` followed by `call dword ptr [edx+2Ch]` which is essentially calling the function ptr at the offset 2C from the start of the COM object. zx485, good call on that one! It shows how important it is to think outside the box when reverse engineering because I hadn't considered that possibility that they were overwriting the return as I wouldn't personally do it that way! Great explanation. COM is confusing stuff at first, especially with the VARIANT stuff as well. – the_endian Mar 04 '17 at 06:55
  • Also a useful reference: https://www.fireeye.com/blog/threat-research/2010/08/reversing-malware-command-control-sockets.html - it shows the offsets for common functions and also confirms there's no quick and easy way to get them! – the_endian Mar 04 '17 at 07:22
  • The addressing mode used only has a single (probably 8-bit) displacement: the value of the sum `24h + ppv`. Somewhere before this, the IDA disassembly will have a line like `ppv = -10h`, where `-10h` is the offset relative to EBP that the stack slot would have had if EBP were being used as a frame pointer. As things get pushed onto and popped off the stack in the function, the value of ESP changes, and so does the displacement of the `ppv` stack slot relative to ESP. So at one point `ppv` might be `[esp + 14h]` but at another point it might be `[esp + 20h]`. – Ross Ridge Mar 04 '17 at 07:47
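The arithmetic in the last comment can be checked with a few lines of C. This is a sketch only: `esp_displacement` and `fits_disp8` are hypothetical helper names, `ppv = -10h` is the illustrative value used in the comment (the real value in the book's listing is not shown here), and `30h` is an assumed second stack delta chosen so the result matches the comment's `[esp + 20h]` example.

```c
/* IDA displays the operand as [esp + delta + var]; the displacement
   actually encoded in the instruction is just the sum of the two. */
int esp_displacement(int stack_delta, int var_offset)
{
    return stack_delta + var_offset;
}

/* An x86 disp8 is a signed byte, so it covers -128..127. */
int fits_disp8(int displacement)
{
    return displacement >= -128 && displacement <= 127;
}

/* Illustrative value from the comment above, not taken from the book. */
enum { PPV_OFFSET = -0x10 };

int ppv_displacement_at(int stack_delta)
{
    return esp_displacement(stack_delta, PPV_OFFSET);
}
```

With these assumed numbers, `ppv_displacement_at(0x24)` gives `0x14` and `ppv_displacement_at(0x30)` gives `0x20`. Both fit in a signed byte, which is consistent with the "probably 8-bit" displacement mentioned in the comment.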

It's clear from the author's description of the code that those operands are in AT&T order (source first, then destination). Did the author earlier specify that the code was written with Intel ordering or is that just an assumption on your part? It is (unfortunately and confusingly) common for x86 assembly to be written using both styles, as discussed in another question:

MOV src dest (or) MOV dest src?

Community
TypeIA
  • Why are the operands in AT&T order when the syntax looks like Intel (memory references in `[]`, no `%` prefix...)? – phuclv Mar 04 '17 at 03:53
  • Hmmmm. Well, the entire book is in Intel syntax, and IDA Pro uses Intel by default as well. I'm at a loss for words here. – the_endian Mar 04 '17 at 03:58