I'm trying to write trampolines for x86 and amd64 so that a given function invocation is immediately vectored to an address stored at a known memory location. The purpose is to ensure that the first target address lives within a given DLL (on Windows).

The following code attempts to use _fn as a memory location (or a group of them) that stores the actual target addresses:

(*_fn[IDX])(); // rough equivalent in C

.globl _asmfn
_asmfn:
  jmp *_fn+8*IDX(%rip)

The IDX is intended to be constructed using some CPP macros to provide a range of embedded DLL vectors each uniquely mapped to a slot in the _fn array of function pointers. This works in a simple test program, but when I actually put it into a shared library (for the moment testing on OSX), I get a bus error when attempting to vector to the _asmfn code:
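As a rough illustration of the macro idea described above, here is a minimal sketch in plain C rather than assembly. The names `fn_table`, `DEFINE_STUB`, `SLOT_COUNT`, `answer_a` and `answer_b` are hypothetical and only stand in for the `_fn` array and the per-slot entry points. Note that a C stub like this performs a call through the slot; whether the compiler turns it into a bare `jmp` is not guaranteed, which is presumably why the question uses assembly.

```c
#define SLOT_COUNT 4   /* hypothetical number of embedded vectors */

typedef int (*target_fn)(void);

/* Stand-in for the _fn array: one function pointer per slot. */
static target_fn fn_table[SLOT_COUNT];

/* Each expansion defines one stub that is permanently bound to slot IDX. */
#define DEFINE_STUB(IDX) \
    int stub_##IDX(void) { return (*fn_table[IDX])(); }

DEFINE_STUB(0)
DEFINE_STUB(1)
DEFINE_STUB(2)
DEFINE_STUB(3)

/* Two example targets that could be stored into slots at run time. */
static int answer_a(void) { return 11; }
static int answer_b(void) { return 22; }
```

After storing `answer_a` into `fn_table[0]`, a call to `stub_0()` is forwarded to it; changing the slot later redirects the same stub without touching its code.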

Invalid memory access of location 0x10aa1f320 rip=0x10aa1f320

The final target of this code is Windows, though I haven't tried it there yet (I figured I could at least prove out the assembly in a test case on OSX/intel first). Is the amd64 jump at least nominally correct, or have I missed something?

A good reference on trampolines on amd64.

EDIT

The jump does work properly on Windows 7 (I finally got a chance to test). However, I'm still curious to know why it fails on OSX. The bus error is caused by a KERN_PROTECTION_FAILURE, which would appear to indicate that OS protections are preventing execution of that code. The target address is allocated memory (it's a trampoline generated by libffi), but I believe it to be properly marked as executable memory. If it's an executable memory issue, that would explain why my standalone test code works (there the callback trampoline is compiled, not allocated).

technomage
  • I think you're asking for trouble by testing on an OS different to the one you're actually developing for. The problem you're experiencing could be OSX specific, for all you know the code will work perfectly on Windows exactly as it is! – Harry Johnston Aug 13 '12 at 22:18
  • The technique you are going to implement is similar to what [Detours system](http://research.microsoft.com/en-us/projects/detours/) does. Perhaps, its description could give you some hints. – Eugene Aug 14 '12 at 06:24
  • To check if the jump is correct, perhaps, you could disassemble the resulting binary code and check the addresses and offsets, etc. manually. It sometimes helps when you see which code the assembler/compiler/whatever has generated. I found several subtle bugs in my projects this way. – Eugene Aug 14 '12 at 06:30
  • The answers to this question (the implementation of the tools mentioned there) may also be helpful although they do not solve your problem directly: http://stackoverflow.com/questions/4507581/ – Eugene Aug 14 '12 at 06:34
  • http://stackoverflow.com/questions/5000529/directly-call-jump-in-asm-without-using-relevancex86/5006231 gives an example how to do this. – FrankH. Aug 14 '12 at 10:11
  • @HarryJohnston exactly right! – technomage Aug 14 '12 at 11:12
  • Strange how these old names are preserved, a "bus error" on an Intel processor?? This otherwise sounds like landing on a memory page that has the no-execute bit turned on. – Hans Passant Jan 01 '13 at 22:57

2 Answers

When using PC-relative addressing, keep in mind that the displacement must be within ±2 GB. That means your jump table and trampoline can't be too far away from each other. As for trampolines as such, here is what can be done on Windows x64 to transfer control without clobbering any registers:

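  1. a sequence:
    PUSH <low32>
    MOV DWORD PTR [ RSP + 4 ], <high32>
    RET
    (In 64-bit mode PUSH with a 32-bit immediate stores a sign-extended 64-bit value, so the MOV then overwrites its upper half.) This works both on Win64 and UN*X x86_64. Although on UN*X, if the function uses the redzone then you're clobbering ...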

  2. a sequence:
    JMP [ RIP ]
    .L: <tgtaddr64>
    again, applicable to both Win64 and UN*X x86_64.

  3. a sequence:
    MOV DWORD PTR [ RSP + 8 ], <low32>
    MOV DWORD PTR [ RSP + 0xc ], <high32>
    JMP [ RSP + 8 ]
    This is Win64-specific, as it (ab)uses part of the 32-byte "argument space" reserved by the Win64 ABI (just above the return address on the stack). The UN*X x86_64 equivalent would be to (ab)use part of the 128-byte "red zone" reserved there (just below the return address on the stack):
    MOV DWORD PTR [ RSP - 8 ], <low32>
    MOV DWORD PTR [ RSP - 4 ], <high32>
    JMP [ RSP - 8 ]
    In both cases the low half goes to the lower address, because x86 is little-endian. Both are only usable if it's acceptable to clobber (overwrite) what's in there at the point of invoking the trampoline.
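To make the ±2 GB constraint and method 2 concrete, here is a minimal sketch that checks whether a target is reachable with a signed 32-bit displacement and that writes the 14-byte `JMP [ RIP ]` form into a caller-supplied buffer. The helper names `fits_rel32`, `encode_abs_jmp` and `ABS_JMP_SIZE` are made up for this illustration. It only builds bytes; it doesn't execute them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ABS_JMP_SIZE 14  /* 6 bytes of instruction + 8 bytes of address */

/* True if `target` is reachable from the address of the NEXT instruction
   (`next_insn`) using a signed 32-bit displacement, i.e. within about 2 GB. */
bool fits_rel32(uint64_t next_insn, uint64_t target)
{
    int64_t delta = (int64_t)(target - next_insn);
    return delta >= INT32_MIN && delta <= INT32_MAX;
}

/* Writes FF 25 00 00 00 00 (jmp qword ptr [rip+0]) followed by the
   64-bit target in little-endian order. Returns the number of bytes written.
   `out` must have room for ABS_JMP_SIZE bytes. */
size_t encode_abs_jmp(uint8_t *out, uint64_t target)
{
    static const uint8_t jmp_rip0[6] = { 0xFF, 0x25, 0x00, 0x00, 0x00, 0x00 };
    memcpy(out, jmp_rip0, sizeof jmp_rip0);
    for (int i = 0; i < 8; i++)
        out[6 + i] = (uint8_t)(target >> (8 * i));  /* least significant byte first */
    return ABS_JMP_SIZE;
}
```

Because the address immediately follows the instruction, this form has no range limit. The range check matters when the slot holding the address lives elsewhere, as with a separate `_fn` table.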

It is possible to directly construct such a position-independent, register-neutral trampoline in memory, like this (for method 1):

#include <stdint.h>
#include <stdio.h>
#include <string.h>    /* memcpy */

char *mystr = "Hello, World!\n";

int main(int argc, char **argv)
{
    /* Encodes: push imm32; mov dword ptr [rsp+4], imm32; ret */
    struct __attribute__((packed)) {
        uint8_t  PUSH;
        uint32_t CONST_TO_PUSH;
        uint32_t MOV_TO_4PLUS_RSP;
        uint32_t CONST_TO_MOV;
        uint8_t  RET;
    } mycode = {
        0x68, (uint32_t)(uintptr_t)printf,                /* low 32 bits of the target */
        0x042444c7, (uint32_t)((uintptr_t)printf >> 32),  /* high 32 bits of the target */
        0xc3
    };
    void *buf = /* fill in an OS-specific way to get an executable buffer */;
    memcpy(buf, &mycode, sizeof(mycode));

    __asm__ __volatile__(
        "push $0f\n\t"         // this is to make the "jmp" return
        "jmp *%0\n\t"
        "0:\n\t" : : "r"(buf), "D"(mystr), "a"(0));

    return 0;
}

Note that this doesn't take into account whether any registers are being clobbered by the function "invoked"; I've also left out how to make the trampoline buffer executable (the stack ordinarily isn't executable on Win64/x86_64).

FrankH.
  • The intent is to have a statically allocated trampoline embedded in a DLL (and thus usable by certain w32 API functions like normal keyboard hooks). That trampoline loads (from a fixed location) the address of a dynamically allocated trampoline, and jumps to it. – technomage Aug 14 '12 at 19:37

@HarryJohnston had the right of it: the permissions issue was encountered on OS X only. The code runs fine in its target Windows environment.

technomage