I described a generic way to generate any thunk code you want in this answer. Let's redo it for your case, as an exercise.
Suppose that your class is defined as:
#include <windows.h>  // LRESULT, WPARAM, LPARAM, CALLBACK

struct YourClass {
    LRESULT YourMemberFunc(int nCode, WPARAM wParam, LPARAM lParam);
};
Write your thunk in C++ with placeholders for the actual addresses:
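LRESULT CALLBACK CallWndProc(int nCode, WPARAM wParam, LPARAM lParam) {
    // Placeholder for the object address; patched at run time.
    YourClass *x = reinterpret_cast<YourClass*>(0x1122112211221122);
    // Placeholder for the member function address; patched at run time.
    __int64 im = 0x3344334433443344;
    // Reinterpret the 8-byte value as a pointer to member function.
    // This relies on the member pointer being a plain 8-byte code address here.
    LRESULT (YourClass::*m)(int,WPARAM,LPARAM) = *reinterpret_cast<LRESULT (YourClass::**)(int,WPARAM,LPARAM)>(&im);
    return (x->*m)(nCode, wParam, lParam);
}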
And call it in a way that prevents the compiler from inlining the call:
int main() {
    // The volatile function pointer forces a real, non-inlined call.
    LRESULT (CALLBACK *volatile fp)(int, WPARAM, LPARAM) = CallWndProc;
    fp(0, 0, 0);
}
Compile in Release mode and inspect the generated assembly (in Visual Studio, open the Disassembly window while debugging and turn on "Show code bytes"):
4D 8B C8                       mov r9,r8
4C 8B C2                       mov r8,rdx
8B D1                          mov edx,ecx
48 B9 22 11 22 11 22 11 22 11  mov rcx,1122112211221122h
48 B8 44 33 44 33 44 33 44 33  mov rax,3344334433443344h
48 FF E0                       jmp rax
This is going to be your thunk, with 44 33 44 33 44 33 44 33 replaced with the pointer to your member (&YourClass::YourMemberFunc) and 22 11 22 11 22 11 22 11 replaced with the pointer to the actual object instance, at run-time.
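To make the patching step concrete, here is a minimal sketch that builds the 31 bytes from the listing above and overwrites the two placeholders. The names MakeThunkBytes, StoreLE64 and the offset constants are mine, not part of any API; the offsets (10 and 20) are simply counted from the listing. The sketch only produces the bytes. To actually run them you would still have to copy them into executable memory (for example memory obtained from VirtualAlloc with PAGE_EXECUTE_READWRITE), which is not shown here.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Layout of the thunk, counted from the disassembly listing (assumed names).
constexpr std::size_t kThunkSize    = 31;
constexpr std::size_t kThisOffset   = 10;  // imm64 of "mov rcx, ..."
constexpr std::size_t kTargetOffset = 20;  // imm64 of "mov rax, ..."

// Write a 64-bit value in little-endian byte order, as x64 expects.
static void StoreLE64(std::uint8_t* dst, std::uint64_t value) {
    for (int i = 0; i < 8; ++i)
        dst[i] = static_cast<std::uint8_t>(value >> (8 * i));
}

// Return the thunk bytes with both placeholders replaced.
std::array<std::uint8_t, kThunkSize> MakeThunkBytes(std::uint64_t thisAddr,
                                                    std::uint64_t targetAddr) {
    std::array<std::uint8_t, kThunkSize> code = {
        0x4D, 0x8B, 0xC8,                                            // mov r9,r8
        0x4C, 0x8B, 0xC2,                                            // mov r8,rdx
        0x8B, 0xD1,                                                  // mov edx,ecx
        0x48, 0xB9, 0x22, 0x11, 0x22, 0x11, 0x22, 0x11, 0x22, 0x11,  // mov rcx,imm64
        0x48, 0xB8, 0x44, 0x33, 0x44, 0x33, 0x44, 0x33, 0x44, 0x33,  // mov rax,imm64
        0x48, 0xFF, 0xE0                                             // jmp rax
    };
    StoreLE64(code.data() + kThisOffset, thisAddr);
    StoreLE64(code.data() + kTargetOffset, targetAddr);
    return code;
}
```

In real use thisAddr would be the object pointer cast to an integer, and targetAddr the code address of the member function, obtained the same way the placeholder code above reinterprets it.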
Explanation of what's going on in the thunk
In the x64 calling convention (of which there is only one on Windows), the first four integer or pointer parameters are passed in the rcx, rdx, r8, r9 registers, in this order, from left to right. So when our thunk gets called, we have
rcx = nCode, rdx = wParam, r8 = lParam
For member functions there is an implicit first parameter holding the this pointer, so when entering YourMemberFunc we must have
rcx = this, rdx = nCode, r8 = wParam, r9 = lParam
The compiler-generated code does exactly this adjustment: it shifts r8 -> r9, rdx -> r8, ecx -> edx, and then assigns our placeholder this = 1122112211221122 to rcx. Now it has the parameters set up, and it can continue with an indirect jump to the function itself. rax is used to hold the return value, so it does not have to be preserved across function calls. This is why it is used here to temporarily hold the destination address, which gives an opportunity for tail call optimization (a call/return pair replaced with a single jump).
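If it helps, the same adjustment can be written out in plain C++. This is only an illustration of what the thunk does, not the thunk itself: the type aliases are portable stand-ins for the Windows types, the member function body is made up, and g_bakedThis, MemberAsFree and ThunkEquivalent are hypothetical names. The global plays the role of the address that the real thunk carries inside its own code bytes.

```cpp
#include <cstdint>

// Stand-ins for the Windows types, so the sketch compiles anywhere (assumption).
using WPARAM  = std::uintptr_t;
using LPARAM  = std::intptr_t;
using LRESULT = std::intptr_t;

struct YourClass {
    int bias;
    // Made-up body, just so the call has an observable result.
    LRESULT YourMemberFunc(int nCode, WPARAM wParam, LPARAM lParam) {
        return bias + nCode + static_cast<LRESULT>(wParam) + lParam;
    }
};

// How the member function looks at the calling-convention level:
// the this pointer is simply an extra first parameter.
LRESULT MemberAsFree(YourClass* self, int nCode, WPARAM wParam, LPARAM lParam) {
    return self->YourMemberFunc(nCode, wParam, lParam);
}

// Stands in for the object address baked into the thunk bytes.
static YourClass* g_bakedThis = nullptr;

// What the thunk does: take three arguments, move each one slot to the
// right, put the baked-in this pointer in the first slot, and forward.
LRESULT ThunkEquivalent(int nCode, WPARAM wParam, LPARAM lParam) {
    return MemberAsFree(g_bakedThis, nCode, wParam, lParam);
}
```

The obvious limitation of this version is the single global: it supports one object only, which is exactly what the generated thunk avoids by having a separate copy of the code per object.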
Why do we have to use an indirect jump? Because otherwise we would get a relative jump, and we cannot use a hard-coded relative jump: the thunk is going to be copied to different addresses in memory, so the distance to the target changes with every copy. Therefore, we resort to setting the absolute address at run-time and doing an indirect jump instead.
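A small sketch of the arithmetic behind this, assuming the usual 5-byte near jump (opcode E9 followed by a 32-bit displacement measured from the end of the instruction). The function names are mine. It shows that the displacement depends on where the jump itself is located, and that it may not even fit in 32 bits when the thunk lands far from the target.

```cpp
#include <cstdint>
#include <limits>

// Displacement a 5-byte "jmp rel32" located at jmpAddr would need
// in order to reach target: measured from the next instruction.
std::int64_t RelJmpDisplacement(std::uint64_t jmpAddr, std::uint64_t target) {
    return static_cast<std::int64_t>(target - (jmpAddr + 5));
}

// A rel32 jump can only encode displacements that fit in a signed 32-bit value.
bool FitsInRel32(std::int64_t displacement) {
    return displacement >= std::numeric_limits<std::int32_t>::min() &&
           displacement <= std::numeric_limits<std::int32_t>::max();
}
```

For the same target, a copy of the thunk at 0x1000 and a copy at 0x3000 would need different displacement bytes, so the code could not be copied verbatim. The mov rax, imm64 / jmp rax pair has no such dependency.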
HTH