I need help with reverse engineering a virtual method from disassembly. The code was originally compiled with Microsoft's Visual C++. The method in question is as follows:

sub_92D110    proc near
xor al, al
retn
sub_92D110    endp

This method is referenced between a lot of classes, even multiple times inside of a vtable of one class. I am not sure what it does; does it mean the method got inlined but the call remains so that the vtables retain their size?

And if so, what does xor al, al do? Am I misunderstanding a calling convention or something?

Cody Gray - on strike
Artemoire
  • 6
    It sets `al` to zero. The method body would be `{return 0;}`. – n. m. could be an AI Jul 17 '17 at 11:29
  • 2
    Also, I'm quite sure that this is a method that returns a `bool`; if it was an `int` it would be `xor eax,eax` (or possibly `xor ax,ax` for a `short`). As it is, it's `return false;`. It may also be a `char` method returning 0, but it's way rarer. – Matteo Italia Jul 17 '17 at 11:37
  • Yes, I think it makes more sense for a bool, so is it possible for the compiler to see a bunch of virtual method implementations just returning false, and make them all point to one method when compiled? If so, that's genius. In any case, thank you very much for this answer. I'm new to reverse engineering and was never well versed in assembly to begin with. I wasn't thinking bool was 1 byte when I was looking at the code. I was expecting a `__thiscall` and some offsetting of `ecx`; I didn't take into consideration that a method could just return false. Thanks! – Artemoire Jul 17 '17 at 11:46
  • @Artemoire: glad that helped; I expanded the explanation above into a full answer, with some references to further reading on the merging of identical functions. I hope you'll find it interesting. – Matteo Italia Jul 17 '17 at 12:19

1 Answer

This is most probably something like:

bool someclass::somemethod() {
    return false;
} 

Explanation

  • xor al,al sets the low byte of eax to zero.
  • All x86 calling conventions use eax as the "return value" register for register-sized integer values.
  • This cannot be a function returning an int (as in return 0;), given that it clears only the low byte (and none of the calling conventions VC++ uses on x86 passes an argument in eax, so it's not some bizarre function that takes an integer argument, zeroes its low byte and returns it).
  • This leaves us with a function that returns a byte-sized value, set to zero, so it could either return a char/unsigned char (0) or a bool (false); I'm way more inclined to think that it's the second option, given that in practice it arises way more often (especially in "empty" base class implementations of methods possibly redefined by derived classes).
  • It either takes no parameters or is a variadic function that doesn't look at any of its parameters. C++ methods on VC++/x86 use the __thiscall calling convention, which passes the this pointer in ecx and otherwise behaves like __stdcall for "regular" methods; variadic methods fall back to __cdecl, with this pushed on the stack. __stdcall is callee-cleanup, and the plain retn here cleans nothing up, which means there are no stack arguments. A __cdecl function leaves cleanup to the caller, though, so a variadic method would also end in a plain retn and cannot be ruled out a priori. That being said, I don't think that this last option is likely.
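
As a minimal sketch of the kind of source that would typically lead to such a stub, assume a hypothetical hierarchy like the one below. The names `Widget`, `Button`, `isReady`, `tag`, `count` and `queryReady` are illustrative only and do not come from the disassembled binary; the instruction sequences in the comments are what an optimizing x86 build of VC++ would usually emit, not a guarantee.

```cpp
// Hypothetical base class; none of these names come from the binary.
struct Widget {
    virtual ~Widget() = default;

    // Byte-sized return value: only al matters, so this would typically
    // compile to `xor al, al / retn`.
    virtual bool isReady() { return false; }

    // Also byte-sized, so it can end up as the very same two instructions.
    virtual char tag() { return 0; }

    // Register-sized return value: all of eax has to be cleared,
    // typically with `xor eax, eax / retn`.
    virtual int count() { return 0; }
};

// A derived class overrides only what it needs; its remaining vtable
// slots keep pointing at the trivial base implementations.
struct Button : Widget {
    bool isReady() override { return true; }
};

// The call goes through the vtable, so the result depends on the
// dynamic type of the object.
inline bool queryReady(Widget& w) { return w.isReady(); }
```

With these definitions, `queryReady` on a plain `Widget` yields `false`, while on a `Button` it yields `true`; the `tag` and `count` slots of `Button` still resolve to the do-nothing base versions.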

This method is referenced between a lot of classes, even multiple times inside of a vtable of one class.

It's perfectly normal; VC++'s linker regularly merges unrelated functions that compile to the same machine code (and confusingly calls this "identical COMDAT folding").

Given that this process is very low level (it essentially looks at the bytes generated for the various functions and sees if they can be de-duplicated), in theory all the hypotheses above may hold at once: it may be a method taking no arguments and returning a bool false in one vtable slot and a varargs method returning a char zero in another one.
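
The byte-level nature of the merge can be illustrated with a toy model. This is only a sketch of the idea, not how the linker is actually implemented; `foldIdentical`, `example` and the method names are hypothetical, and the byte sequences are one valid encoding of the instructions in question.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
using Function = std::pair<std::string, Bytes>;  // name, machine code

// Toy model of folding: every function whose machine code is byte-for-byte
// identical resolves to the first function seen with those bytes.
std::map<std::string, std::string>
foldIdentical(const std::vector<Function>& functions) {
    std::map<Bytes, std::string> canonical;     // code bytes -> surviving function
    std::map<std::string, std::string> target;  // function -> function it resolves to
    for (const auto& fn : functions) {
        // emplace leaves the map untouched if the bytes were already seen
        // and returns an iterator to the existing entry.
        auto result = canonical.emplace(fn.second, fn.first);
        target[fn.first] = result.first->second;
    }
    return target;
}

// Hypothetical input: two unrelated methods that compile to
// `xor al, al / retn` (32 C0 C3) and one that compiles to
// `xor eax, eax / retn` (33 C0 C3).
std::map<std::string, std::string> example() {
    return foldIdentical({
        {"Widget::isReady", {0x32, 0xC0, 0xC3}},
        {"Stream::tag",     {0x32, 0xC0, 0xC3}},
        {"Widget::count",   {0x33, 0xC0, 0xC3}},
    });
}
```

In this model `Stream::tag` resolves to `Widget::isReady` even though the two differ in class and return type, which mirrors why a single stub can show up in many unrelated vtable slots.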

Matteo Italia
  • Thank you @CodyGray for the general cleanup, I wrote the whole answer on the phone in a cab, so unfortunately I was a bit constrained both in editing and in cross-checking potentially incorrect blanket statements about calling conventions, hence the sloppy format and the "cautious" statements. :-) – Matteo Italia Jul 17 '17 at 12:47
  • 1
    Well, it's a great answer! You do impressive work while on the go. – Cody Gray - on strike Jul 17 '17 at 12:52
  • 2
    It's worth noting that VC++'s unrestricted "COMDAT folding" behavior is generally considered to be [illegal](https://stackoverflow.com/questions/26533740/do-distinct-functions-have-distinct-addresses) in C++, because it means that pointers to unrelated functions may compare equal. Other linkers (`gold`) have implemented limited forms of COMDAT folding but don't use the unsafe behavior [by default](https://stackoverflow.com/questions/15168924/gcc-clang-merging-functions-with-identical-instructions-comdat-folding) and also offer a "safe" version that only folds when it cannot be observed. – BeeOnRope Jul 18 '17 at 04:45
  • 2
    [This](https://stackoverflow.com/a/29057190/366904) is probably a better reference for what @BeeOnRope is talking about, and it also points to [this blog post](https://blogs.msdn.microsoft.com/vcblog/2013/09/11/introducing-gw-compiler-switch/) that suggests COMDAT folding is only applied when you aren't taking the address of the functions. If that is in fact true, it would suggest that this optimization *is* standards-compliant by the "as-if" rule. – Cody Gray - on strike Jul 18 '17 at 11:43
  • @CodyGray - well it's certainly a _different_ link but I don't know if it's better. What happened is that starting long ago MSVC merged identical functions, but not data, with default or at least common optimization options. This was probably not legal (although there was for a while a lot of debate per my link), but it would only very rarely cause problems since relying on address comparisons of _functions_ is apparently very uncommon. Per your blog link in VS2013 they apparently extended this behavior to certain types of global or static _data_ as well. – BeeOnRope Jul 19 '17 at 20:23
  • Now your first link is about this new _data_ folding, and the blog post also concentrates on it. Not surprisingly, address comparisons of _data_ are more common and this change blew up some people, including "internal teams" at Microsoft, so presumably the default options became more conservative, and/or other conservative options were added. The claim about not taking the address is false, as in the link the address is clearly taken and the optimization applies, and even their own blog contradicts it (see the part starting with "But with the help of /Gw..."). – BeeOnRope Jul 19 '17 at 20:29
  • I think the claim "Please note, the ICF optimization will only be applied for identical COMDATs where their address is not taken, and they are read only." is either just false or only applicable to some particular combination of flags. My original comment implies that MSVC is _still_ doing this, but perhaps they are better behaved now and most of the compilers behave similarly: in the past MSVC aggressively combined identical functions and almost no one else did, now `gold` among others has this _option_ and perhaps the default MSVC options no longer fold functions. – BeeOnRope Jul 19 '17 at 20:32
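
The concern discussed in the comments above, that folding can become observable through address comparison, could be probed with a sketch like the following. The function names are hypothetical, and the outcome for two distinct functions depends on the toolchain and its options, so no result is claimed for that case.

```cpp
// Two hypothetical, unrelated functions with identical bodies; a linker
// that folds identical code may give them a single address.
bool isReady() { return false; }
bool isEmpty() { return false; }

using Predicate = bool (*)();

// Compares two function pointers at run time. The standard is generally
// read as requiring pointers to distinct functions to compare unequal,
// which is exactly what unrestricted folding can violate.
bool sameAddress(Predicate a, Predicate b) { return a == b; }
```

Calling `sameAddress(&isReady, &isEmpty)` in a build with and without folding enabled is one way to check what a given linker actually does; an optimizer may also resolve the comparison at compile time, so the check is best done across translation units.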