(This question refers specifically to x86/x86_64)

I'm working on an application that needs to insert a small block of instructions at specific points within another application (both in userspace) at runtime. The general process currently looks like this:

  • Find a suitable replacement point
  • Copy the instruction being replaced (plus the added ones) to an unused area of memory
  • Overwrite the original instruction with a jmp (can't use call for fear of corrupting the stack) pointing to the moved code

The redirected block of instructions carries out some basic operations, restores the processor back to its pre-jump state, runs the replaced instruction, then jumps back to the original code.
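For reference, both jumps in this scheme can be a 5-byte `jmp rel32` (opcode `E9` followed by a 32-bit displacement measured from the end of the jmp). The C below is a minimal sketch under stated assumptions, not a finished implementation: `emit_jmp_rel32`, `build_patch` and `build_trampoline` are hypothetical helper names, the state save/instrument/restore code is omitted, the copied instruction is assumed to be position-independent, and actually writing the bytes into the target process is out of scope.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JMP_REL32_LEN 5 /* E9 + 4-byte displacement */

/* Encode `jmp rel32` as it would appear at address `insn_addr`, jumping to
   `target`. Returns false if the target is outside the +/-2 GiB rel32 range. */
static bool emit_jmp_rel32(uint8_t out[JMP_REL32_LEN],
                           uint64_t insn_addr, uint64_t target)
{
    int64_t disp = (int64_t)(target - (insn_addr + JMP_REL32_LEN));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return false;
    uint32_t u = (uint32_t)(int32_t)disp;
    out[0] = 0xE9;
    for (int i = 0; i < 4; i++)          /* little-endian displacement */
        out[1 + i] = (uint8_t)(u >> (8 * i));
    return true;
}

/* Bytes that overwrite the original instruction at `site_addr`: a jmp to the
   trampoline, with leftover bytes of the replaced instruction set to int3. */
static bool build_patch(uint8_t *patch, size_t insn_len,
                        uint64_t site_addr, uint64_t tramp_addr)
{
    if (insn_len < JMP_REL32_LEN)
        return false;
    if (!emit_jmp_rel32(patch, site_addr, tramp_addr))
        return false;
    memset(patch + JMP_REL32_LEN, 0xCC, insn_len - JMP_REL32_LEN);
    return true;
}

/* Trampoline tail: the copied instruction, then a jmp back to the instruction
   after the patch site. `tramp_addr` is where `tramp` will live in the target.
   Returns the number of bytes written, or 0 on failure. */
static size_t build_trampoline(uint8_t *tramp, uint64_t tramp_addr,
                               const uint8_t *orig_insn, size_t insn_len,
                               uint64_t site_addr)
{
    memcpy(tramp, orig_insn, insn_len);
    if (!emit_jmp_rel32(tramp + insn_len, tramp_addr + insn_len,
                        site_addr + insn_len))
        return 0;
    return insn_len + JMP_REL32_LEN;
}
```

Filling the leftover bytes with `int3` (`0xCC`) instead of NOPs is one possible choice: those bytes should never execute, because the trampoline jumps back to the end of the replaced instruction, so a trap makes any mistake obvious.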

It is critically important that the original functionality of the modified application is preserved exactly. With the current setup, the target application could of course detect the replacement and change its behavior, or otherwise attempt to defend against such modification, but given the intended use of this program, it is very unlikely that the target would try to defend itself in any serious way, so this is not a concern. Some basic steps are already taken to preserve the original behavior:

  • The replacement point is checked to be a single instruction at least as long as the replacing jmp
  • The moved block is checked to ensure that the replacing jmp can actually reach it
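The two checks above fit in a few lines of C. This is a sketch, assuming a 5-byte `jmp rel32` is used in both directions; `rel32_reaches` and `patch_site_ok` are hypothetical names. One detail worth adding: the trampoline's final jump back must also reach the instruction after the patch site, so the reach check applies to the return trip too.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define JMP_REL32_LEN 5 /* E9 + 4-byte displacement */

/* True if a rel32 displacement can express `target` when measured from
   `next`, the address of the instruction following the jump. */
static bool rel32_reaches(uint64_t next, uint64_t target)
{
    int64_t disp = (int64_t)(target - next);
    return disp >= INT32_MIN && disp <= INT32_MAX;
}

/* The question's two checks, plus the return trip:
   - the replaced instruction is at least as long as the 5-byte jmp,
   - the jmp written at `site` reaches `tramp_start`,
   - the trampoline's final jmp (at `tramp_jmp_back`) reaches the
     instruction after the replaced one. */
static bool patch_site_ok(uint64_t site, size_t insn_len,
                          uint64_t tramp_start, uint64_t tramp_jmp_back)
{
    if (insn_len < JMP_REL32_LEN)
        return false;
    if (!rel32_reaches(site + JMP_REL32_LEN, tramp_start))
        return false;
    return rel32_reaches(tramp_jmp_back + JMP_REL32_LEN, site + insn_len);
}
```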

Obviously, however, some instructions may not execute correctly after being moved (in other words, they are position-dependent). Off the top of my head, the only such instructions I can think of are ones that use RIP-relative addressing: relative calls and jumps such as `jmp`, `jcc` and `call`, and any instruction with a `[rip+rel32]` memory operand. These are easy to detect, and such instructions can be modified so that the computed addresses still refer to their original targets.
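The detection and fix-up step might look roughly like the following C sketch. It is deliberately incomplete: it assumes a real instruction decoder has already separated prefixes, opcode and ModRM and located the displacement field, and it covers only the common unprefixed relative-branch opcodes plus the `[rip+disp32]` ModRM form (XBEGIN, for example, also takes a relative operand and is not handled). The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode-level check for relative branches, assuming `insn` points at the
   opcode itself (no legacy or REX prefixes in front of it). */
static bool is_rel_branch(const uint8_t *insn)
{
    uint8_t op = insn[0];
    if (op >= 0x70 && op <= 0x7F) return true;   /* jcc rel8 */
    if (op >= 0xE0 && op <= 0xE3) return true;   /* loopne/loope/loop/jrcxz rel8 */
    if (op == 0xE8 || op == 0xE9) return true;   /* call rel32, jmp rel32 */
    if (op == 0xEB) return true;                 /* jmp rel8 */
    if (op == 0x0F && insn[1] >= 0x80 && insn[1] <= 0x8F)
        return true;                             /* jcc rel32 */
    return false;
}

/* In 64-bit mode, ModRM mod=00 with rm=101 means [rip + disp32]. The caller
   must already know that this byte really is a ModRM byte. */
static bool modrm_is_rip_relative(uint8_t modrm)
{
    return (modrm & 0xC7) == 0x05;
}

/* Rewrite a rel32 field when an instruction of total length `len` moves from
   old_addr to new_addr. Branch displacements and [rip+disp32] are both
   relative to the END of the whole instruction, so the absolute target is
   old_addr + len + old_disp. Returns false if the copy is out of range. */
static bool relocate_rel32(int32_t old_disp, uint64_t old_addr,
                           uint64_t new_addr, size_t len, int32_t *new_disp)
{
    uint64_t target = old_addr + len + (uint64_t)(int64_t)old_disp;
    int64_t d = (int64_t)(target - (new_addr + len));
    if (d < INT32_MIN || d > INT32_MAX)
        return false;
    *new_disp = (int32_t)d;
    return true;
}
```

Note that in this scheme the rel8 branches are only 2 bytes long, so they already fail the "at least as long as the jmp" check; the forms that actually need fixing up are the rel32 ones (`E8`, `E9`, `0F 8x`) and instructions with a `[rip+disp32]` operand.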

Are there any other instructions (or encodings) that could pose a problem here?

the4naves
  • FYI, 32-bit code *could* use `call`, at least for safely written code. Without a red zone, data below ESP can be asynchronously clobbered at any point, including by a debugger (e.g. if you ran `print foo(123)`, it would use your process's stack to call the `foo` function in your code; otherwise, debuggers under a multitasking OS aren't intrusive). Of course, in real life there could be hand-written asm in user-space programs that depends on this and in practice doesn't break, because normally only signal handlers would cause a problem, and some programs don't install any signal handlers. – Peter Cordes May 12 '22 at 23:23
  • @PeterCordes Using `call` would definitely simplify a large part of the program. That being said, I'd like this setup to work with as many different target applications as possible, so the assumption is that the code being modified is terribly written, with basically every other worst-case scenario possible (short of those making this project infeasible). Good to know either way though! – the4naves May 12 '22 at 23:29
  • Yes, I think this sounds right, anything with a rel32 or rel8 depends on surrounding code, such as `jge foo` or `jrcxz skip_loop`, or `mov eax, [rip + foo]`. Other basic x86-64 instructions aren't, but maybe some new extensions could be. e.g. the CET extensions with `bnd` registers might be relevant, `endbr64` and so on ([What does the endbr64 instruction actually do?](https://stackoverflow.com/q/56905811)). I don't know the details of how it works so I'm not sure, but something to check on. – Peter Cordes May 12 '22 at 23:29
  • Keep in mind that the function may contain a jump *to* this instruction, so it could be executed multiple times. – Peter Cordes May 12 '22 at 23:32
  • Moving a `call` instruction will change the return address. If the called function uses the return address for anything beyond just returning to it (e.g., using it as a key in a lookup table, or to locate other data), then moving the instruction will break the called function. A relocated `syscall` may not work because the syscall trap could use the address of the syscall itself to decide what to do next. There are some kernel tricks that use instruction addresses to alter behavior, even if the instruction is the same. – Raymond Chen May 12 '22 at 23:35
  • @PeterCordes The intended purpose of the inserted instructions is to collect information on the running of the target program, so that has definitely been accounted for. I'd forgotten about rel operands; thanks for reminding me. I don't quite get the bit about "Other basic x86-64 instructions aren't...". I assume you mean that you're only referring to the base x86/x86_64 instruction set? If so, that should be ok as well. There's a fair bit of leeway in exactly where these replacements happen, so I can just ignore extensions for now. – the4naves May 12 '22 at 23:39
  • An instruction can use *absolute* addressing to access the bytes of its own machine code. An unlikely silly example from [How many ways to set a register to zero?](https://stackoverflow.com/q/4829937) is `@movzx:` `movzx eax, byte ptr[@movzx + 6]`. But that's a special case of the program checksumming its machine code (e.g. with a pointer in a reg looping over the text section). Extreme code-golf in 16-bit code sometimes looks for usable bytes of machine code to double as constants, but that works with addresses being small known offsets, and is unlikely in 32-bit code. – Peter Cordes May 12 '22 at 23:41
  • @RaymondChen Good point. For the reason mentioned in my previous comment, I should be able to ignore `call`s and `syscall`s for now. To be clear: by 'ignore', I mean classify as an invalid replacement point. – the4naves May 12 '22 at 23:43
  • With *Other basic x86-64 instructions aren't*, I meant that none of the baseline x86-64 instructions are a problem, although Raymond points out that the return address pushed by an indirect `call` could be a problem, and `syscall` is like `call` in potentially being special (if you can replace it with a jmp rel8). I mostly mean that instructions compilers use in user-space should all be fine, except for the "obvious" cases of anything with a rel8 or rel32 like `loop` or `call`. None of the fancy SIMD instructions have relative branches or data, other than the usual `[rip+rel32]`. – Peter Cordes May 12 '22 at 23:47
  • @PeterCordes As to the absolute addressing bit: wow. That's... certainly not something I had considered. I'm not sure there's really anything I can do about that, short of maybe doing something crazy with memory mapping (a kernel-mode component that unmaps the page of the replaced instruction, then handles page faults, detecting whether the access is due to an instruction or control flow?). Just ever so slightly out of scope, but maybe a long-term project idea :). Shouldn't be a problem due to the unlikeliness mentioned, but good to know. – the4naves May 12 '22 at 23:55
  • There are cases where [the kernel treats certain instructions differently if they are at particular addresses](http://web.archive.org/web/20190819142621/http://discolab.rutgers.edu/classes/cs519/papers/fast-mutex.pdf). (Windows does this for certain complex atomic operations.) Not likely to be a problem in compiler-generated application-level code, but if you're going to be patching operating system code, it could become a problem. – Raymond Chen May 12 '22 at 23:58
  • @RaymondChen From my skim of the paper: on 'typical' OSes (within user code), it sounds like this shouldn't be a problem. At least on Windows, entering and exiting critical sections is done via OS calls (from my understanding), so behavior should be the same. On others, I assume it would depend on the implementation. If it were possible to query the locations of such sections (assuming they have specific memory ranges), it would be acceptable to pass over the code within them. If not, any differences in behavior *should* (?) be similar to normal variations in runtime (in most cases). – the4naves May 13 '22 at 00:20
  • (cont.) With a kernel-space component, I'd imagine it would be possible to mitigate this. Otherwise, as with the machine-code access mentioned earlier, this should be an acceptable risk for now. Anyway, I'll stick to userspace to avoid horrors like that. – the4naves May 13 '22 at 00:23
  • I should note that my previous comment refers specifically to the paper mentioned; if the kernel chooses to interpret instruction addresses in some other way, that may not be valid. Also, how should this question be answered? Would anyone like to write one up, compiling the comments here (or simply referring to them)? Should I? Should no answer be written? – the4naves May 13 '22 at 00:25
  • If you want to take the time to write up an answer collecting the random thoughts Raymond and I left in comments, plus anything more you have to add on your own, that's great. I'd be happy to look over it and maybe edit, but I'm not feeling inspired at the moment to write one from scratch myself. It's a reasonable question, and since the specific use case of hooking functions for profiling defines what you mean, it's answerable and worth having an answer. Without that context, an answer might get bogged down in semantics or terminology. – Peter Cordes May 13 '22 at 01:06
  • Regarding @RaymondChen's comment about `call`, note in particular that 32-bit position-independent code (`gcc -m32 -fPIC`) uses this technique extensively. Since there is no `rip`-relative addressing in 32-bit mode, in order to access a global variable, code determines its own address with a `call thunk`, where `thunk` does `mov eax, [esp] ; ret`. Then the appropriate displacement is added to `eax` to find the GOT. [See on godbolt](https://godbolt.org/z/zfEfhz4z6). – Nate Eldredge May 13 '22 at 05:36
  • So if you try to relocate such a `call thunk` instruction, everything will break. Unfortunately this would be hard to fix, because there is no obvious way to distinguish a thunk call from any other call. I guess you can try to decode the instruction at the other end of the call and see if it is of the form `mov reg, [esp]`; note it isn't always the same `reg`. – Nate Eldredge May 13 '22 at 05:38
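Nate's suggestion of decoding the instruction at the other end of the call could be sketched like this in C. Assumptions: 32-bit code, the caller can read the bytes at the call target, and only the exact `mov r32, [esp]` + `ret` shape is recognized (GCC's `__x86.get_pc_thunk.bx` helper, for instance, is `8B 1C 24 C3`). All helper names are hypothetical. The related `call next; pop reg` idiom is only caught via its zero displacement, and thunks of any other shape are missed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Absolute target of a 32-bit `call rel32` (E8 + 4-byte little-endian
   displacement) located at `call_addr`. */
static uint32_t call_rel32_target(const uint8_t *call, uint32_t call_addr)
{
    uint32_t disp = (uint32_t)call[1] | ((uint32_t)call[2] << 8) |
                    ((uint32_t)call[3] << 16) | ((uint32_t)call[4] << 24);
    return call_addr + 5u + disp; /* wraps mod 2^32, like EIP */
}

/* True if `p` starts with `mov r32, [esp]` followed by `ret`:
   8B /r with ModRM mod=00, rm=100 (SIB follows), SIB 0x24 (base=esp,
   no index), then C3. The reg field may be any register. */
static bool looks_like_pc_thunk(const uint8_t *p)
{
    return p[0] == 0x8B && (p[1] & 0xC7) == 0x04 && p[2] == 0x24 &&
           p[3] == 0xC3;
}

/* Register receiving the return address: 0=eax, 1=ecx, 2=edx, 3=ebx, ... */
static int pc_thunk_reg(const uint8_t *p)
{
    return (p[1] >> 3) & 7;
}

/* `call next; pop reg`: a call rel32 with displacement 0 targets the
   very next instruction. */
static bool is_call_to_next(const uint8_t *call)
{
    return call[0] == 0xE8 && call[1] == 0 && call[2] == 0 &&
           call[3] == 0 && call[4] == 0;
}

/* Hypothetical policy: treat a call as unsafe to relocate if its target
   looks like a PC thunk. `target_bytes` points at the bytes at the target. */
static bool call_is_pc_thunk_call(const uint8_t *call,
                                  const uint8_t *target_bytes)
{
    return call[0] == 0xE8 && looks_like_pc_thunk(target_bytes);
}
```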
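For the kernel-address-range issue Raymond raised, the asker's idea of passing over code inside such sections (if their locations can be queried) reduces to a range-overlap test. Linux's restartable sequences (rseq) are one concrete case of this kind of address-keyed behavior: a critical section is described by a start address and a post-commit offset, so an instruction copied out of that range would no longer be recognized as part of it. The C below is only a sketch: `struct code_range` and `overlaps_any` are hypothetical, and how the ranges are obtained is left open.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for an address range the kernel (or a runtime)
   treats specially: [start, start + len). */
struct code_range {
    uint64_t start;
    uint64_t len;
};

/* True if the instruction occupying [site, site + insn_len) overlaps any of
   the given ranges; such sites would be classified as invalid replacement
   points. */
static bool overlaps_any(uint64_t site, size_t insn_len,
                         const struct code_range *ranges, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t lo = ranges[i].start;
        uint64_t hi = ranges[i].start + ranges[i].len;
        if (site < hi && lo < site + insn_len)
            return true;
    }
    return false;
}
```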

0 Answers