I know this question sounds odd. Without going into detail on why it might be advantageous to step backwards through x86 instructions (it has to do with malware analysis research), here is my situation:
- I'm developing an x86-64 emulator from scratch.
- I'm reading through the Intel Software Developer's Manual, Volumes 2A, 2B, 2C, and 2D (Instruction Set Reference, A-Z) to implement a fetch-decode-execute cycle.
But I have a fundamental question that I'm curious about for my future research, and I'd rather ask now than wait to discover the answer myself as I get better at parsing instructions into their components and boundaries.
Normally, in an x86-64 instruction fetch task:
- You start off knowing the starting boundary of the instruction to be parsed.
- You must parse the next bytes and determine boundaries. In other words, you must look for each optional field of the instruction and determine where each field begins and ends, as well as which byte the instruction ends on.
What if, instead, we wanted for some odd reason to do the same thing in the reverse direction:
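To make the forward direction concrete, here is a minimal sketch of a forward length decoder in Python. It assumes 64-bit mode and covers only a handful of opcodes I picked by hand; the names `instruction_length`, `modrm_tail_length`, and `instruction_starts` are my own, and the 0x66/0x67 prefixes are deliberately left out because they can change operand and address sizes. It is nowhere near a complete decoder.

```python
# Toy forward length decoder for a small, hand-picked subset of x86-64
# (64-bit mode assumed). Illustrative only; not a complete decoder.

# Legacy prefixes that do not change the length of the opcodes handled below.
# 0x66 / 0x67 are deliberately omitted (they can change operand/address sizes).
LEGACY_PREFIXES = {0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65, 0xF0, 0xF2, 0xF3}
NO_OPERAND = {0x90, 0xC3}        # NOP, RET
REL8 = {0xEB}                    # JMP rel8
REL32 = {0xE9}                   # JMP rel32
MODRM_ONLY = {0x01, 0x89, 0x8B}  # ADD r/m,r ; MOV r/m,r ; MOV r,r/m
MODRM_IMM8 = {0x83}              # group 1: op r/m, imm8


def modrm_tail_length(code, pos):
    """Bytes used by ModR/M plus optional SIB and displacement at `pos`."""
    modrm = code[pos]
    mod, rm = modrm >> 6, modrm & 7
    length = 1
    if mod == 3:                 # register-direct: no SIB, no displacement
        return length
    base = rm
    if rm == 4:                  # a SIB byte follows the ModR/M byte
        length += 1
        base = code[pos + 1] & 7
    if mod == 0 and base == 5:   # RIP-relative, or SIB with no base: disp32
        length += 4
    elif mod == 1:               # disp8
        length += 1
    elif mod == 2:               # disp32
        length += 4
    return length


def instruction_length(code, start=0):
    """Length in bytes of the instruction that begins at `start`."""
    pos = start
    while code[pos] in LEGACY_PREFIXES:
        pos += 1
    rex_w = False
    if 0x40 <= code[pos] <= 0x4F:          # REX prefix
        rex_w = bool(code[pos] & 0x08)
        pos += 1
    opcode = code[pos]
    pos += 1
    if opcode in NO_OPERAND:
        pass
    elif opcode in REL8:
        pos += 1
    elif opcode in REL32:
        pos += 4
    elif 0xB8 <= opcode <= 0xBF:           # MOV reg, imm32 (imm64 with REX.W)
        pos += 8 if rex_w else 4
    elif opcode in MODRM_ONLY:
        pos += modrm_tail_length(code, pos)
    elif opcode in MODRM_IMM8:
        pos += modrm_tail_length(code, pos) + 1
    else:
        raise ValueError(f"opcode {opcode:#04x} is outside this toy subset")
    return pos - start


def instruction_starts(code):
    """Walk forward from offset 0 and collect every instruction start."""
    starts, pos = [], 0
    while pos < len(code):
        starts.append(pos)
        pos += instruction_length(code, pos)
    return starts


# mov rbp, rsp ; mov rax, [rbp-8] ; sub rsp, 0x10 ; ret
code = bytes.fromhex("4889e5 488b45f8 4883ec10 c3")
print(instruction_starts(code))  # [0, 3, 7, 11]
```

Note how every decision (is there a ModR/M byte, a SIB byte, how big is the immediate) is driven by bytes that were already read earlier in the instruction.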
- You start off knowing the location of an instruction boundary.
- You want to parse the previous bytes and determine boundaries, learning the beginning of the previous instruction.
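One naive way to make the reverse direction concrete is to try every possible start offset before the known boundary and keep those from which a forward decode ends exactly on that boundary. Below is a rough sketch of that idea. The opcode table is an even smaller toy subset of my own choosing (no prefixes, no ModR/M), and `toy_length` and `candidate_previous_starts` are hypothetical names. Whether the reverse fetch can be done without this kind of guessing is what I'm asking about.

```python
# Naive "backward fetch": try every start offset before a known boundary and
# keep those whose forward-decoded length lands exactly on that boundary.
# The opcode table is a tiny hand-picked subset (64-bit mode, no prefixes).

MAX_INSTRUCTION_LENGTH = 15  # architectural limit on x86 instruction length


def toy_length(code, start):
    """Forward-decode one instruction at `start`; return its length or None."""
    if start >= len(code):
        return None
    opcode = code[start]
    if opcode in (0x90, 0xC3):                       # NOP, RET
        length = 1
    elif opcode == 0xEB:                             # JMP rel8
        length = 2
    elif opcode == 0xE9 or 0xB8 <= opcode <= 0xBF:   # JMP rel32 / MOV r32, imm32
        length = 5
    else:
        return None                                  # outside the toy subset
    # Reject instructions that would run past the end of the buffer.
    return length if start + length <= len(code) else None


def candidate_previous_starts(code, boundary):
    """Offsets from which one instruction decodes to end exactly at `boundary`."""
    lowest = max(0, boundary - MAX_INSTRUCTION_LENGTH)
    candidates = []
    for start in range(lowest, boundary):
        length = toy_length(code, start)
        if length is not None and start + length == boundary:
            candidates.append(start)
    return candidates


# B8 90 90 90 90 is "mov eax, 0x90909090", but its last byte is also a NOP.
code = bytes.fromhex("b890909090 c3")
print(candidate_previous_starts(code, 5))  # [0, 4]
```

In this toy case the byte just before the boundary can be read either as a one-byte NOP or as the tail of a five-byte MOV, so the brute-force search returns two candidates.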
Now at this point, anyone who knows this subject matter is pointing out that you would not even know whether the previous instruction is intended to be executed, simply because a jump instruction could've branched the program to this address. That's a given, and I'm aware of it; I'm still interested in knowing whether it's possible to fetch backwards.
Looking at the fetch process, we have the following fields to identify:
Normally we look for the Instruction Prefixes, Opcode, ModR/M, SIB, Displacement, and Immediate fields, in that order, given a starting address. What I'm essentially asking is whether there's a fundamental reason we couldn't look for the Immediate, Displacement, SIB, ModR/M, Opcode, and Instruction Prefixes fields, in that order, given an ending address.
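For reference, this is how I picture the field layout, sketched as a small Python data structure. The class name `InstructionFields` and its methods are my own, and for simplicity I store the REX prefix together with the other prefixes. It only models the ordering of the fields, not how a decoder discovers them.

```python
from dataclasses import dataclass

# Field order as the bytes appear in memory, lowest address first.
FIELD_ORDER = ("prefixes", "opcode", "modrm", "sib", "displacement", "immediate")


@dataclass
class InstructionFields:
    """Raw bytes of each field; an empty value means the field is absent."""
    prefixes: bytes = b""       # legacy prefixes and REX (grouped for simplicity)
    opcode: bytes = b""
    modrm: bytes = b""
    sib: bytes = b""
    displacement: bytes = b""
    immediate: bytes = b""

    def encode(self):
        """Concatenate the fields in forward (fetch) order."""
        return b"".join(getattr(self, name) for name in FIELD_ORDER)

    def present_fields(self, reverse=False):
        """Names of the non-empty fields, in forward or reverse order."""
        order = reversed(FIELD_ORDER) if reverse else FIELD_ORDER
        return [name for name in order if getattr(self, name)]


# mov rax, [rbp-8]  ->  48 8B 45 F8
mov = InstructionFields(prefixes=b"\x48", opcode=b"\x8b",
                        modrm=b"\x45", displacement=b"\xf8")
print(mov.encode().hex())                # 488b45f8
print(mov.present_fields())              # ['prefixes', 'opcode', 'modrm', 'displacement']
print(mov.present_fields(reverse=True))  # ['displacement', 'modrm', 'opcode', 'prefixes']
```

A forward fetch sees the fields in the first order; the reverse fetch I'm describing would have to recognize them in the second.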
I have not yet fully understood the process required to implement instruction fetching, so I'm still naive about it. So far, the answer to my question doesn't seem obvious to me. I suspect there is some trait of the structure of the x86-64 instruction set that makes fetching possible only in the forward direction, but I can't yet point to a specific reason.
Could someone please confirm whether or not there exists a fundamental reason (aside from the branching problem) that an x86-64 program could not be fetched in reverse order, even if not decoded / executed? This is assuming the program isn't doing ROP, in which case those familiar with the topic know the obvious problem (jumping into the middle of an instruction). Could we at least determine the instruction boundaries fetching backwards? Why or why not?