1

This question sounds odd, I know, but without going into detail on why it might be advantageous to step backwards through x86 instructions (it has to do with malware analysis research):

  • I'm developing an x86-64 emulator from scratch.
  • I'm reading through Intel Software Developer's Manual Volumes 2A, 2B, 2C, and 2D: Instruction Set Reference, A-Z to implement a fetch decode execute cycle

But I have a fundamental question I'm really curious about for my future research and I'd like to ask rather than wait to discover it for myself as I better understand the parsing of the instruction set to determine their components and boundaries.

Normally in an x86-64 instruction fetch task

  • You start off knowing the starting boundary of the instruction to be parsed.
  • You must parse the next bytes and determine boundaries: In other words you must look for each optional field of the instruction and determine the boundaries of each field, as well as which byte the instruction ends on.

What if, instead we wanted for some odd reason to do the same thing in the reverse direction:

  • You start off knowing the location of an instruction boundary.
  • You want to parse the previous bytes and determine boundaries, learning the beginning of the previous instruction.

Now at this point, anyone who knows this subject matter is pointing out that you would not even know if the previous instruction is intended to be executed, simply because a jump instruction could've branched the program to this address. That's a given, I'm aware, I'm still interested in knowing if it's possible to fetch backwards.

Looking at the fetch process, we have the following fields to identify:

enter image description here

Normally we look for the existence of Instruction Prefixes, Opcode, ModR/M, SIB, Displacement, Immediate fields, in that order, given a starting address. What I'm asking is essentially if there's a fundamental reason that we couldn't look for the existence of Immediate fields, Displacement, SIB, ModR/M, Opcode, Instruction Prefixes, in that order, given an ending address.

I have not finished fully understanding the process required to implement the fetching of an instruction, so I'm still naïve on the process. So far, the answer to my question doesn't seem obvious to me. I suspect there may likely be some trait of the structure of the x86-64 instruction set that makes fetching only possible in a forward direction, but I couldn't yet point to a specific reason.

Could someone please confirm whether or not there exists a fundamental reason (aside from the branching problem) that an x86-64 program could not be fetched in reverse order, even if not decoded / executed? This is assuming the program isn't doing ROP, in which case those familiar with the topic know the obvious problem (jumping into the middle of an instruction). Could we at least determine the instruction boundaries fetching backwards? Why or why not?

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
J.Todd
  • 707
  • 1
  • 12
  • 34
  • 4
    If you look backwards at a particular byte in memory, and it's ```0x90```, how could you possibly know if it's ```NOP``` or the second half of ```INT 90h```? Between variable length instructions and every byte sequence already used by some opcode or another, it seems almost impossible. – sj95126 Jul 27 '21 at 21:46
  • 3
    Terminology: You can fetch backwards but you can't unambiguously *decode*. – Peter Cordes Jul 27 '21 at 21:49
  • Also related: code golf answer: [shortest byte sequence that will definitely fault if executed](https://codegolf.stackexchange.com/questions/133486/find-an-illegal-string/133622#133622) - the possibility of an 8-byte immediate, or disp32+imm32, means we need at least 9 NOPs to make sure that previous bytes couldn't consume the whole thing as part of an immediate. – Peter Cordes Jul 27 '21 at 22:16
  • @PeterCordes could there be any practical use for that in exploitation or malware analysis? As in maybe you could insert it at any point in an executable and guarantee an error is thrown no matter the context? But I can't think of a reason to do that rather than using a debugger. – J.Todd Jul 27 '21 at 23:32
  • 1
    I don't think so; I mention it as an exercise in thinking about how x86 machine code can be ambiguous / is not self-synchronizing. – Peter Cordes Jul 27 '21 at 23:33
  • @sj95126 Yeah I figured as much but some part of me was curious whether we're collectively sure it's not "fundamentally" possible. I believe I recall in a Chris Domas video that many byte sequences are invalid as instructions, so I wasn't prepared to rule out that some brilliant person couldn't devise some method of multi-stepping backwards and starting over from the origin if an invalid byte sequence was encountered. Idk, maybe you'd statistically encounter a bad byte sequence eventually if you stepped backwards enough. But finding the correct boundary would still seem nearly impossible. – J.Todd Jul 29 '21 at 01:25
  • @sj95126 Even if unlikely, it's an enticing idea because a big problem in malware analysis is evasion where the program avoids executing the malicious branch of code when it detects an emulator or VM or non-target environment (fairly trivial to do) so you want to force emulation down the alternative branches but it's statistically non-feasible to test all branching outcomes in a program of significant size. So my thought was to look for all ops that might be a system call and analyze only the branches that lead to I/O events, avoiding dead code. RISC perhaps has security advantages in that way – J.Todd Jul 29 '21 at 01:30
  • @J.Todd I believe the fundamental problem with stepping backwards is jumps. You don't know whether the preceding code was executed. The current instruction could have been jumped into and some preceding bytes could even be completely unreachable by any path through the program. – Petr Skocik Dec 20 '22 at 12:15

0 Answers0