What happens to this instruction: mov edi, dword ptr [ebx + offset32]

Question

This 32-bit Windows program crashes randomly. I debugged it with Visual Studio 2019. Here is what I see.

Before execution,

After execution by clicking "Step Into":

It seems that the CPU breaks the instruction into 3 parts: 8B BB, 4C, and F6 07 00. What is 8B BB? I confirmed that the address [ebx + 7F64Ch] is valid and accessible.

Edit: If I click "Step Over", EDI does not change as expected. I added screenshots of the registers.

To be closed: I realized that this problem is specific to the debugger: the breakpoint was set in the middle of the instruction. The random crash is not related to this anyway.

UPDATE: The problem is caused by the Visual Studio debugger. The debugger does not actually execute the code but emulates it. When a breakpoint is set in the middle of an instruction, it gets confused and interprets the misaligned instruction.

8B BB is the opcode and modrm for `mov edi, [ebx + disp32]`, just like your disassembler is showing you in the first view. Looks like you have a bug that resulted in jumping into the middle of an instruction. x86 machine code is a byte stream that's not self-synchronizing; decoding *can* start anywhere, but it's usually only useful to start at the boundaries of instructions the compiler intended. (Decoding backwards isn't possible without ambiguity, hence the ?? I guess) — Peter Cordes, Dec 15 '22 at 03:35
It's the same bytes as before. It's just that you seem to have jumped into the middle of an instruction, so the debugger disassembles starting at the program counter. — Nate Eldredge, Dec 15 '22 at 03:36
See [Is it possible to decode x86-64 instructions in reverse?](https://stackoverflow.com/q/52415761) and the various links in phuchlv's answer. — Peter Cordes, Dec 15 '22 at 03:37
@NateEldredge Thanks for the quick response. I just clicked "Step Into" but not anything else. Why does the CPU break the instruction? — albert, Dec 15 '22 at 03:54
@PeterCordes Thanks! I went through the post and I don't see a problem of instruction decoding. I added screenshots for the registers of before and after execution. Any idea? — albert, Dec 15 '22 at 04:08
Normally you would only use "step into" when at a call instruction. I don't know if it is supposed to work otherwise, you'd have to check the documentation. — Nate Eldredge, Dec 15 '22 at 04:15
Hi Nate and Peter, thanks for your comments! I realized that this problem is specific to the debugger: the breakpoint was set in the middle of the instruction. The random crash is not related to this anyway. — albert, Dec 15 '22 at 11:48
I'm pretty sure Visual Studio's debugger does let the CPU execute an instruction. That's separate from disassembly; disassembly is always a software thing. And yeah, it makes sense that it would assume a breakpoint is the start of an instruction, since anti-debugging obfuscation techniques can throw off disassembly. So this debugger behaviour is desirable; don't set breakpoints in the middle of instructions: they won't work there and will mess up execution of the instruction that contains it. — Peter Cordes, Mar 14 '23 at 00:52

puppydrum64 · Answer 1 · 2022-12-20T13:24:37.390

As Peter Cordes explained, it's possible on the x86 architecture to jump into the middle of an instruction. The CPU doesn't take into account what lies behind the instruction pointer (EIP in 32-bit code, RIP in 64-bit code) when decoding; it only looks at the bytes from where it is now, going forward. As a result, depending on where you start looking, the code can be completely different. The bytes themselves haven't changed at all; only the interpretation of them has.
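To make this concrete, here is a minimal Python sketch that decodes the six bytes from the question (`8B BB 4C F6 07 00`) twice: once from the real start of the instruction and once from two bytes in. The names `decode_one` and `disassemble` are made up for this illustration, and the decoder is deliberately tiny: it assumes 32-bit mode and understands only the three encodings that occur here, so it is not a general disassembler.

```python
import struct

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_one(code, pos):
    """Decode one instruction at code[pos]; return (text, length in bytes).

    Hypothetical helper: handles only the three encodings in this example.
    """
    op = code[pos]
    if 0x48 <= op <= 0x4F:  # dec r32 (one-byte form, 32-bit mode only)
        return f"dec {REG32[op - 0x48]}", 1
    modrm = code[pos + 1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if op == 0x8B and mod == 2 and rm != 4:
        # mov r32, [r32 + disp32]; the displacement is little-endian
        # (read as unsigned here to keep the printout simple)
        (disp,) = struct.unpack_from("<I", code, pos + 2)
        return f"mov {REG32[reg]}, [{REG32[rm]} + 0x{disp:X}]", 6
    if op == 0xF6 and mod == 0 and reg == 0 and rm not in (4, 5):
        # test byte ptr [r32], imm8
        return f"test byte ptr [{REG32[rm]}], 0x{code[pos + 2]:X}", 3
    raise ValueError(f"encoding not handled by this sketch: {op:02X}")

def disassemble(code, start):
    """Decode forward from `start`, ignoring everything before it."""
    out, pos = [], start
    while pos < len(code):
        text, length = decode_one(code, pos)
        out.append(text)
        pos += length
    return out

code = bytes.fromhex("8BBB4CF60700")
print(disassemble(code, 0))  # start at the real instruction boundary
print(disassemble(code, 2))  # start two bytes in, after 8B BB
```

Starting at offset 0 gives the single `mov` the compiler intended. Starting at offset 2 gives `dec esp` followed by `test byte ptr [edi], 0x0`, which matches the 4C and F6 07 00 pieces described in the question.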

Funnily enough, this can be abused on many CISC architectures to do things more efficiently. As an example, I'll share some code in a game I'm writing for DOSBOX:

    cmp al,0C0h
    jc IsOneByteOperand
    lodsw
    byte 0A8h   ;opcode for "test al, imm8", eats the "lodsb" below.
IsOneByteOperand:
    lodsb       ;only executed if we branched here. Otherwise skipped.
    mov dx,bx   ;program continues as normal from here on out.

When the IsOneByteOperand branch is taken, the CPU executes lodsb, and what the CPU executes matches what the reader sees in the source. However, if the branch is not taken, this is what the CPU sees:

lodsw
test al,0ACh   ;0ACh is the opcode for "lodsb". We don't care about the flags.
mov dx,bx

If you don't mind the flags being modified, you can use this to skip over an instruction with only one branch instead of two, while using fewer bytes: a relative JMP takes one byte for the JMP opcode and one more for the relative offset, whereas this method uses a single byte for the TEST opcode, which treats the lodsb as its "operand".
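The two views of the snippet can be checked with a small Python sketch. It assumes the three bytes after the `jc` assemble to `AD A8 AC` (LODSW is 0ADh, TEST AL,imm8 is 0A8h, LODSB is 0ACh); the `OPCODES` table and the `decode_from` name are hypothetical, written only for this illustration.

```python
# One-byte opcodes used in the snippet: opcode -> (mnemonic, total length).
OPCODES = {
    0xAC: ("lodsb", 1),
    0xAD: ("lodsw", 1),
    0xA8: ("test al, imm8", 2),  # swallows the following byte as imm8
}

def decode_from(code, start):
    """Decode forward from `start` using only the table above."""
    out, pos = [], start
    while pos < len(code):
        name, length = OPCODES[code[pos]]
        if length == 2:
            name = name.replace("imm8", f"0{code[pos + 1]:02X}h")
        out.append(name)
        pos += length
    return out

code = bytes([0xAD, 0xA8, 0xAC])     # lodsw / byte 0A8h / lodsb

fall_through = decode_from(code, 0)  # jc not taken: start at lodsw
branch_taken = decode_from(code, 2)  # jc taken: start at IsOneByteOperand

print(fall_through)
print(branch_taken)
```

On the fall-through path the 0A8h byte turns the `lodsb` into the immediate of a `test`, so only `lodsw` loads anything. On the branch-taken path decoding starts at the `lodsb` byte itself. The skip costs one byte (0A8h) instead of the two a short `jmp` (opcode plus rel8) would need.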

EDIT: Used an example from x86 Assembly rather than 6502 since the topic was x86 Assembly.

[Tips for golfing in x86/x64 machine code](https://codegolf.stackexchange.com/a/235553) has some x86 examples of this, also Ira Baxter's answer on [Can assembled ASM code result in more than a single possible way (except for offset values)?](https://stackoverflow.com/a/10766612) — Peter Cordes, Dec 15 '22 at 16:48
