"eventually" print the correct disassembly code?
Nitpick: this already *is* correct disassembly for that starting point; it's what the CPU would execute if you jumped there. x86 machine code is a byte stream that is not self-synchronizing (unlike UTF-8): nothing marks a byte as *not* being the start of an instruction. This is the same reason GDB won't let you scroll backwards in `layout asm` TUI mode if it doesn't have a nearby symbol to disassemble from.
But what you're really asking: yes, decoding will typically get back in sync with the instruction boundaries the compiler intended within a few instructions, in a non-obfuscated executable where objdump works in the first place¹. Many byte sequences form short instructions, and there are quite a few 1-byte opcodes, some of which you find as immediates or ModRM/SIB bytes inside other instructions. (e.g. `00 00` is `add [rax], al`.) In x86-64, some bytes are currently undefined as opcodes, so disassemblers often print them as `.byte xyz` and consider the next byte as a possible instruction start. In 32-bit mode, most of those are 1-byte instructions, a few of them 2 bytes (push/pop of segment regs, and BCD instructions like AAA or AAM).
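To watch the resync happen, here's a sketch using the Capstone disassembly library's Python bindings (`pip install capstone`); the bytes are a made-up example (`mov eax,0` / `mov edi,42` / `ret`), decoded once from offset 0 and once from offset 1:

```python
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

code = bytes.fromhex("b800000000" "bf2a000000" "c3")  # mov eax,0 ; mov edi,42 ; ret
md = Cs(CS_ARCH_X86, CS_MODE_64)
for start in (0, 1):
    print(f"--- decoding from offset {start}")
    for insn in md.disasm(code[start:], start):
        print(f"{insn.address:3}: {insn.bytes.hex():<12} {insn.mnemonic} {insn.op_str}")
```

Starting at offset 1, the four leftover `00` bytes of the `mov eax,0` immediate decode as two `add byte ptr [rax], al` instructions, and decoding is back in sync by the `mov edi, 0x2a`.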
Hand-crafted sequences of machine code might sustain different decoding for longer, overlaying two different instruction sequences in the same bytes of memory.
The same block of bytes decoding different ways from different start points is sometimes still used as an obfuscation technique: e.g. jump forwards 1 byte, or jump backwards into bytes that already ran decoded a different way. Often one of the executions isn't really useful, just there to confuse disassemblers. (It's bad for performance, especially on CPUs that mark instruction boundaries in L1i cache, and on CPUs with a uop cache.)
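For instance, this hypothetical 5-byte sequence overlays two decodings: `eb ff` is a `jmp` forward 1 byte, into the middle of itself, where `ff c0` decodes as `inc eax`. (A Capstone sketch again; the bytes are mine, not from any real binary.)

```python
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

code = bytes.fromhex("ebff" "c0c3" "90")  # actually runs as: jmp 0x1 ; inc eax ; ret
md = Cs(CS_ARCH_X86, CS_MODE_64)
for start in (0, 1):
    print(f"--- decoding from offset {start}")
    for insn in md.disasm(code[start:], start):
        print(f"{insn.address:3}: {insn.mnemonic} {insn.op_str}")
```

Linear disassembly from offset 0 prints `jmp 0x1` followed by `rol bl, 0x90`, never showing the `inc eax` / `ret` that actually execute: the `ret` byte is hidden inside the bogus `rol`'s ModRM.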
In extreme code-size optimization (e.g. code golf or the demoscene), stuff like skipping 4 bytes on entry to the first iteration of a loop can be done with the 1-byte opcode that starts a `test eax, imm32`, as in Tips for golfing in x86/x64 machine code.
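A sketch of that trick (Capstone, with made-up bytes): entering at the `a9` opcode byte swallows the next 4 bytes as the `test` immediate; entering one byte later executes them.

```python
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

# a9 = opcode of test eax, imm32; the "skipped" 4 bytes are xor eax,eax / inc ecx
code = bytes.fromhex("a9" "31c0" "ffc1" "c3")
md = Cs(CS_ARCH_X86, CS_MODE_64)
for start in (0, 1):
    print(f"--- entering at offset {start}")
    for insn in md.disasm(code[start:], start):
        print(f"{insn.address:3}: {insn.mnemonic} {insn.op_str}")
```

From offset 0 you get `test eax, 0xc1ffc031` then `ret`; from offset 1, `xor eax, eax` / `inc ecx` / `ret`.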
Footnote 1: More sophisticated disassemblers designed to handle potentially-obfuscated binaries will start at some entry point and follow direct jumps to find other start points to disassemble from, hoping to cover all bytes of the `.text` section from valid starting points for execution. Indirect jumps and never-taken conditional branches can still fool them.
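A minimal sketch of that recursive-descent idea with Capstone; `recursive_descent` is a hypothetical helper of my own, and real tools are far more thorough (data-flow analysis for indirect targets, etc.):

```python
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

def recursive_descent(code: bytes, base: int, entry: int) -> set[int]:
    """Return addresses believed to start instructions, following direct
    jumps/calls. Indirect jumps and never-taken branches defeat this."""
    md = Cs(CS_ARCH_X86, CS_MODE_64)
    starts, worklist = set(), [entry]
    while worklist:
        addr = worklist.pop()
        if addr in starts or not base <= addr < base + len(code):
            continue
        # disasm() stops silently at bytes it can't decode
        for insn in md.disasm(code[addr - base:], addr):
            if insn.address in starts:
                break                        # merged with a known path
            starts.add(insn.address)
            m = insn.mnemonic
            # direct jump/call targets print as a bare immediate like "0x401007"
            if (m == 'call' or m.startswith('j')) and insn.op_str.startswith('0x'):
                worklist.append(int(insn.op_str, 16))
            if m in ('jmp', 'ret'):          # no fall-through past these
                break
    return starts
```

Anything reached only through `jmp rax` or a statically invisible branch target stays uncovered, as the footnote says.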
GCC and clang make x86 executables that are trivial to disassemble, in GCC's case from literally printing asm text mixed with `.p2align` directives and assembling it. I've heard that MSVC at some point was (or is) slightly less trivial, but I forget the details. These days MSVC just uses `int3` (0xCC) padding between functions.
See also Why do Compilers put data inside .text(code) section of the PE and ELF files and how does the CPU distinguish between data and code? - they don't; see my answer there for reasons why there's no benefit on x86. `.rodata` might be contiguous with `.text`, but it's grouped separately. A proposed binary randomizer would need to handle data mixed into code because obfuscated executables might do it, not because of normal compiler output.
Other ISAs
ARM "literal pools" do stuff data in between functions (because there's a PC-relative addressing mode with a limited displacement so it's useful for constants). With Thumb mode variable-length instructions there can potentially be a tiny bit of ambiguity about disassembly, if the data happened to have the bit set that signals it being the first of two 16-bit chunks of a 32-bit instruction, but it came right before the start of a function.
Most modern ISAs with variable-length instructions are better designed than x86 in this respect, being much easier to decode: an initial chunk signals in a consistent way that it's the start of a longer instruction (e.g. 1 bit like Thumb, or I think multiple bits for RV32C and Agner Fog's on-paper ISA ForwardCom, if I'm remembering this correctly).
So it's easy to get back in sync, but the very first decode might still be "weird" if you start at the 2nd or later chunk of a longer instruction. Unlike UTF-8, machine code doesn't spend a bit per chunk to signal that it's a continuation rather than the start of a new instruction. Searching and seeking in UTF-8 text is important enough to justify that cost; machine code is normally just executed from known start points, hence the different design choice.
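To make that contrast concrete, a small Python sketch (the helper name is mine, not a standard API): because every UTF-8 continuation byte is tagged `0b10xxxxxx`, you can find the next character boundary from any byte offset without decoding anything that came before.

```python
def next_utf8_boundary(buf: bytes, pos: int) -> int:
    """Skip UTF-8 continuation bytes (0b10xxxxxx) to reach the next
    character boundary -- the self-synchronization x86 code lacks."""
    while pos < len(buf) and (buf[pos] & 0xC0) == 0x80:
        pos += 1
    return pos

text = "naïve café".encode()
for start in range(len(text)):
    b = next_utf8_boundary(text, start)
    print(f"byte {start:2} -> boundary {b:2}: {text[b:].decode()!r}")
```

Every starting offset lands on a valid decode within at most 3 bytes, by construction of the encoding rather than by statistical luck.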