
I'm trying to port a wrapper function from 32bit to x86-64 asm for the Windows ABI. The function depends on indexing into its arguments as an array.

I know that MSVC cannot do inline assembly in x64 projects, but I am interested in building the equivalent function in an x64 .asm file.

The function sets up the stack frame for the API to be called.

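__declspec( naked ) PVOID WINAPIV CGX86( FARPROC lpPtr, UINT lpSize, ... )
{
    __asm {
        push ebp;
        mov ebp, esp;                           // [ebp+4] = return address, [ebp+8] = lpPtr, [ebp+0x0C] = lpSize
        lea eax, [ ebp + 0x04 ];                // eax = base of the "argument array" (slot 0 is the return address)
        mov [ ebp - 0x04 ], eax;                // redundant store ...
        mov eax, [ ebp - 0x04 ];                // ... and reload of the same value
        mov ecx, [ ebp + 0x0C ];                // ecx = lpSize
        add ecx, 2;                             // ecx = index of the last variadic argument
ParseArgs:
        cmp ecx, 2;                             // index 2 is lpSize itself: all arguments copied
        jz short MoveFinal;
        push dword ptr [ eax + ecx * 0x04 ];    // copy arguments from last to first
        sub ecx, 1;
        jmp short ParseArgs;
MoveFinal:
        call [ ebp + 0x08 ];                    // call lpPtr
        mov esp, ebp;                           // restore esp whether or not the callee popped its arguments
        pop ebp;
        retn;
    }
}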

Example use:

CGX86( ( FARPROC )MessageBoxA, 4, GetForegroundWindow( ), "BODY", "TITLE", MB_OK );
  • And by "help" you mean you want us to write it for you? – Jester Apr 25 '16 at 21:20
  • Unfortunately this is a bigger task than you might think because the calling convention used on X64 is much different. It would have to be completely rewritten from scratch. – Ross Ridge Apr 25 '16 at 21:28
  • @MichaelPetch yeah it works, because eax points to the return address as you said and it uses 1 based indexing for arguments, so the last argument it copies will be `ecx=3` which gives `eax+3*4` and that's `ebp+16`. I don't understand the purpose of `push ecx` though. – Jester Apr 25 '16 at 21:38
  • @Jester: I removed my comment shortly after I put it in. It wasn't until I saw what the function did in the body that I realized it was okay. I was going to post a followup about the useless duplication here `mov [ ebp - 0x04 ], eax;` `mov eax, [ ebp - 0x04 ];` but decided to remove the entire thing. – Michael Petch Apr 25 '16 at 21:44
  • The push ecx was a mistake; I needed to save the register to restore it later. But I now see that Ross Ridge is right: it's not possible to index arguments like an array on x64, so this is pointless. Thanks anyway, guys. http://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/ – RaulFernando Apr 25 '16 at 21:46
  • You can nevertheless write the equivalent code of course. Also, if you know you will need at most, say, 6 arguments a simple C `switch` would get rid of the assembly code. – Jester Apr 25 '16 at 21:47
  • Brilliant idea, Jester. I will create 6 dummy args and then put the gold on rbp, thx sir :) – RaulFernando Apr 25 '16 at 21:55
  • @RossRidge and other close voters: I made the question specific to the part that Raul actually needed help with. (@ Raul: edit again yourself if you want to further improve your question to help other people in the future, or roll-back if you don't like my changes.) – Peter Cordes Apr 25 '16 at 22:21
  • @PeterCordes I don't think your edits improve the question much. A good answer to the question would still have to detail a complete solution using the X64 calling convention, including how to handle the registers used for the arguments, aligning the stack correctly and creating unwind info for SEH. Since the code now needs to be written in MASM or some other assembler instead of inline assembly that needs to be covered as well. Then there's the problem that this function seems to be useless and unnecessary, so it's not clear what its real requirements are. – Ross Ridge Apr 26 '16 at 16:18
  • @RossRidge: great point on unwind info. I agree that a pure C++ wrapper is a much better choice, since the args are usually compile-time constants. As far as handling regs used for args, I thought it was obvious: store args 3 and 4 into their slots in the shadow space, then use the same indexing (with a scale factor of 8). The OP already said he knew he'd need to write it in a separate `.asm`, not inline. – Peter Cordes Apr 26 '16 at 16:26
  • @PeterCordes Actually, you need to take the incoming args 5 and 6 off the stack and into registers. You don't need to put anything into the shadow space. You can maybe simplify things by copying MAX(2, lpSize - 2) stack slots starting at third shadow slot, but the need to align the stack before this means it's probably simpler to just copy actual stack slots used. What makes this function seem to be useless is that calling the function directly (eg. `MessageBoxA(GetForegroundWindow(), "BODY", "TITLE", MB_OK)`) is the best choice. So the question is what problem is this function really solving? – Ross Ridge Apr 26 '16 at 17:02
  • First, thanks for the feedback, guys. This function makes it easy to implement AngelScript functions without having to prototype everything and bind a huge number of callbacks to the app. It also disguises the return address of the API's real caller, which makes it a bit harder to reverse. Using GetProcAddress or other methods to retrieve the API offset will also make it hard to understand and follow in IDA. – RaulFernando Apr 26 '16 at 18:43

1 Answer

Jester's suggestion to write it in C is probably a good one, esp. if it can be inlined into calls where some of the args are compile-time constants. Your example use-case passes mostly compile-time-constant args, including the function pointer. Any decent optimizing compiler will inline this and optimize away the indirection into just a normal function call with the right args. Make sure you put the definition somewhere it can be inlined.
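
As a rough sketch of that pure C++ route: a small forwarding template that the compiler can inline at each call site. The names `call_through` and `show_message` are hypothetical (the latter only stands in for an API such as `MessageBoxA`), and the sketch assumes an x64 build, where there is a single calling convention so a plain function-pointer type matches the API's.

```cpp
#include <utility>

// Hypothetical replacement for CGX86: forwards any argument list to fn.
// Kept in a header (inline template) so the call can be inlined and the
// indirection optimized away when fn is a compile-time constant.
template <typename R, typename... Params, typename... Args>
inline R call_through(R (*fn)(Params...), Args&&... args) {
    return fn(std::forward<Args>(args)...);
}

// Hypothetical stand-in for a real API, used only to illustrate a call.
inline int show_message(const char* body, const char* title, unsigned flags) {
    return (body != nullptr && title != nullptr) ? static_cast<int>(flags) + 1 : 0;
}
```

A call then looks like `call_through(&show_message, "BODY", "TITLE", 4u)`. Because the argument types are checked against the target's signature at compile time, there is no argument count to pass and nothing to index.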


However, if you can't get the compiler to make nice code:

Indexing the arguments as an array is the only piece of functionality whose implementation is non-obvious in the 64-bit ABI, where some args are passed in registers.

The Windows 64-bit calling convention provides space for storing the 4 register args right below the stack args (the shadow space), so you can actually create an array of args you can index with at most 4 instructions (store the register args into those slots). You don't need to special-case the first 4 args.

Don't forget to put the first four outgoing args into registers (RCX, RDX, R8, R9) instead of the stack, too, and leave 32 bytes of shadow space for the function you call.

As Ross Ridge points out, make sure you include directives to create SEH unwind info in the stand-alone asm. This is another good reason to favour a pure C++ solution, esp. if the number of args is limited to a small number.
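
A minimal sketch of the `switch` approach Jester suggested, keeping the question's `(pointer, count, ...)` shape. Everything here is an assumption for illustration: the names `call_n`, `Word`, `AnyFn` and `add3` are hypothetical, every argument and the return value are treated as pointer-sized integers, and the limit is 4 arguments only to keep the sketch short.

```cpp
#include <cstdarg>
#include <cstdint>

using Word  = std::uintptr_t;   // one pointer-sized argument slot
using AnyFn = void (*)();       // opaque function pointer, playing the role of FARPROC

constexpr unsigned kMaxArgs = 4;    // hypothetical limit for this sketch

// Calls fn with `count` pointer-sized arguments taken from the variadic list.
// Returns 0 without calling fn if count exceeds kMaxArgs.
inline Word call_n(AnyFn fn, unsigned count, ...) {
    if (count > kMaxArgs) {
        return 0;
    }
    Word a[kMaxArgs] = {};
    va_list ap;
    va_start(ap, count);
    for (unsigned i = 0; i < count; ++i) {
        a[i] = va_arg(ap, Word);    // the compiler does the "index args as an array" work
    }
    va_end(ap);

    switch (count) {
    case 0:  return reinterpret_cast<Word (*)()>(fn)();
    case 1:  return reinterpret_cast<Word (*)(Word)>(fn)(a[0]);
    case 2:  return reinterpret_cast<Word (*)(Word, Word)>(fn)(a[0], a[1]);
    case 3:  return reinterpret_cast<Word (*)(Word, Word, Word)>(fn)(a[0], a[1], a[2]);
    default: return reinterpret_cast<Word (*)(Word, Word, Word, Word)>(fn)(a[0], a[1], a[2], a[3]);
    }
}

// Hypothetical target used only to illustrate a call.
inline Word add3(Word a, Word b, Word c) { return a + b + c; }
```

For example, `call_n(reinterpret_cast<AnyFn>(&add3), 3u, Word{1}, Word{2}, Word{3})` calls `add3(1, 2, 3)`. Callers have to pass each argument as a `Word` (casting pointers and widening smaller integers), since `va_arg` reads a full pointer-sized slot; the compiler handles stack alignment, shadow space and unwind info for you.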

See the x86 tag wiki for links to calling conventions and so on.

I'm not a fan of the Windows calling convention in general, but it does make implementing var-args functions simple. I'm pretty sure that's why the "shadow space" exists in the ABI.

Peter Cordes
  • Thanks for the help, I understand now. I voted you up and gave you the tick; I don't know who voted or why you got a downvote. – RaulFernando Apr 25 '16 at 22:06
  • @RaulFernando looks like you pressed the wrong button, you downvoted not up (there is only a single vote on it and that's down). – Jester Apr 25 '16 at 22:08
  • @Jester: It was downvoted for a while before Raul's comment. And now the downvote is removed, after I edited to clarify why this *is* an answer. :) – Peter Cordes Apr 25 '16 at 22:09
  • Yeah I was confused because Raul said he "voted up". – Jester Apr 25 '16 at 22:16
  • The 2 reasons given for the *shadow space* are mentioned in the Windows [x64 calling convention](https://msdn.microsoft.com/en-us/library/ms235286.aspx) page in the MSDN documentation: _C_ varargs and unprototyped _C_ functions. – Michael Petch Apr 25 '16 at 22:26
  • @MichaelPetch: Do you have a canonical link for the `__vectorcall` ABI that I could add to the x86 tag wiki at the same time? It only has good links to official docs for non-Windows ABIs because I never took the time to dig up good links. Ideally also some Microsoft page about 32bit ABIs. – Peter Cordes Apr 25 '16 at 22:30
  • __vectorcall : https://msdn.microsoft.com/en-us/library/dn375768.aspx – Michael Petch Apr 25 '16 at 22:32
  • @MichaelPetch: Thanks, updated http://stackoverflow.com/tags/x86/info with those and 32bit `__stdcall` / `__cdecl`. – Peter Cordes Apr 25 '16 at 22:46