"eventually" print the correct disassembly code?
Nitpick: this already *is* correct disassembly for that starting point; it's what the CPU would execute if you jumped there. x86 machine code is a byte stream that is not self-synchronizing (unlike UTF-8): nothing marks a byte as *not* being the start of an instruction. This is the same reason GDB won't let you scroll backwards in `layout asm` TUI mode if it doesn't have a nearby symbol to disassemble from.
But what you're really asking: yes, decoding will typically get back in sync with the instruction boundaries the compiler intended within a few instructions, in a non-obfuscated executable where objdump works in the first place¹. Many byte sequences form short instructions, and there are quite a few 1-byte opcodes, some of which you find as immediates or ModRM/SIB bytes inside other instructions. (e.g. `00 00` is `add [rax], al`.) In x86-64, some bytes are currently undefined as opcodes, so disassemblers often print them as `.byte xyz` and consider the next byte as a possible instruction start. In 32-bit mode, most of those are 1-byte instructions, a few of them 2 bytes (push/pop of segment regs, and BCD instructions like AAA or AAM).
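To watch the resync happen, here's a sketch using the Capstone disassembly library's Python bindings (`pip install capstone`); the bytes are a made-up example (`mov eax,0` / `mov edi,42` / `ret`), decoded once from offset 0 and once from offset 1:

```python
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

code = bytes.fromhex("b800000000" "bf2a000000" "c3")  # mov eax,0 ; mov edi,42 ; ret
md = Cs(CS_ARCH_X86, CS_MODE_64)
for start in (0, 1):
    print(f"--- decoding from offset {start}")
    for insn in md.disasm(code[start:], start):
        print(f"{insn.address:3}: {insn.bytes.hex():<12} {insn.mnemonic} {insn.op_str}")
```

Starting at offset 1, the four leftover `00` bytes of the `mov eax,0` immediate decode as two `add byte ptr [rax], al` instructions, and decoding is back in sync by the `mov edi, 0x2a`.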
Hand-crafted sequences of machine code might sustain different decoding for longer, overlaying two different instruction sequences in the same bytes of memory.
The same block of bytes decoding different ways from different start points is sometimes still used as an obfuscation technique: e.g. jump forwards 1 byte, or jump backwards into bytes that already ran decoded a different way. Often one of the executions isn't really useful, just there to confuse disassemblers. (It's bad for performance, especially on CPUs that mark instruction boundaries in L1i cache, and on CPUs with a uop cache.)
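For instance, this hypothetical 5-byte sequence overlays two decodings: `eb ff` is a `jmp` forward 1 byte, into the middle of itself, where `ff c0` decodes as `inc eax`. (A Capstone sketch again; the bytes are mine, not from any real binary.)

```python
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

code = bytes.fromhex("ebff" "c0c3" "90")  # actually runs as: jmp 0x1 ; inc eax ; ret
md = Cs(CS_ARCH_X86, CS_MODE_64)
for start in (0, 1):
    print(f"--- decoding from offset {start}")
    for insn in md.disasm(code[start:], start):
        print(f"{insn.address:3}: {insn.mnemonic} {insn.op_str}")
```

Linear disassembly from offset 0 prints `jmp 0x1` followed by `rol bl, 0x90`, never showing the `inc eax` / `ret` that actually execute: the `ret` byte is hidden inside the bogus `rol`'s ModRM.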
In extreme code-size optimization (e.g. code golf or the demoscene), stuff like skipping 4 bytes on entry to the first iteration of a loop can be done with the 1-byte opcode that starts a `test eax, imm32`, as in Tips for golfing in x86/x64 machine code.
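A sketch of that trick (Capstone, with made-up bytes): entering at the `a9` opcode byte swallows the next 4 bytes as the `test` immediate; entering one byte later executes them.

```python
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

# a9 = opcode of test eax, imm32; the "skipped" 4 bytes are xor eax,eax / inc ecx
code = bytes.fromhex("a9" "31c0" "ffc1" "c3")
md = Cs(CS_ARCH_X86, CS_MODE_64)
for start in (0, 1):
    print(f"--- entering at offset {start}")
    for insn in md.disasm(code[start:], start):
        print(f"{insn.address:3}: {insn.mnemonic} {insn.op_str}")
```

From offset 0 you get `test eax, 0xc1ffc031` then `ret`; from offset 1, `xor eax, eax` / `inc ecx` / `ret`.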
Footnote 1: More sophisticated disassemblers designed to handle potentially-obfuscated binaries will start at some entry point and follow direct jumps to find other start points to disassemble from, hoping to cover all bytes of the `.text` section from valid starting points for execution. Indirect jumps and never-taken conditional branches can still fool them.
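A minimal sketch of that recursive-descent idea with Capstone; `recursive_descent` is a hypothetical helper of my own, and real tools are far more thorough (data-flow analysis for indirect targets, etc.):

```python
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

def recursive_descent(code: bytes, base: int, entry: int) -> set[int]:
    """Return addresses believed to start instructions, following direct
    jumps/calls. Indirect jumps and never-taken branches defeat this."""
    md = Cs(CS_ARCH_X86, CS_MODE_64)
    starts, worklist = set(), [entry]
    while worklist:
        addr = worklist.pop()
        if addr in starts or not base <= addr < base + len(code):
            continue
        # disasm() stops silently at bytes it can't decode
        for insn in md.disasm(code[addr - base:], addr):
            if insn.address in starts:
                break                        # merged with a known path
            starts.add(insn.address)
            m = insn.mnemonic
            # direct jump/call targets print as a bare immediate like "0x401007"
            if (m == 'call' or m.startswith('j')) and insn.op_str.startswith('0x'):
                worklist.append(int(insn.op_str, 16))
            if m in ('jmp', 'ret'):          # no fall-through past these
                break
    return starts
```

Anything reached only through `jmp rax` or a statically invisible branch target stays uncovered, as the footnote says.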
GCC and clang make x86 executables that are trivial to disassemble, in GCC's case from literally printing asm text mixed with `.p2align` directives and assembling it. I've heard that MSVC at some point was (or is) slightly less trivial, but I forget the details. These days MSVC just uses `int3` (0xCC) padding between functions.
See also Why do Compilers put data inside .text(code) section of the PE and ELF files and how does the CPU distinguish between data and code? - they don't; see my answer there for reasons why there's no benefit on x86. `.rodata` might be contiguous with `.text`, but it's grouped separately. A proposed binary randomizer would need to handle data mixed into code because obfuscated executables might do it, not because of normal compiler output.
Other ISAs
ARM "literal pools" do stuff data in between functions (because there's a PC-relative addressing mode with a limited displacement so it's useful for constants). With Thumb mode variable-length instructions there can potentially be a tiny bit of ambiguity about disassembly, if the data happened to have the bit set that signals it being the first of two 16-bit chunks of a 32-bit instruction, but it came right before the start of a function.
Most modern ISAs with variable-length instructions are better designed than x86 in this respect, being much easier to decode: an initial chunk signals in a consistent way that it's the start of a longer instruction (e.g. 1 bit like Thumb, or I think multiple bits for RV32C and Agner Fog's on-paper ISA ForwardCom, if I'm remembering this correctly).
So it's easy to get back in sync, but the very first decode might still be "weird" if you start at the 2nd or later chunk of a longer instruction. Unlike UTF-8, machine code doesn't spend a bit per chunk to signal that it's a continuation rather than the start of a new instruction. Searching and seeking in UTF-8 text is important enough to justify that cost; machine code is normally just executed from known start points, hence the different design choice.
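To make that contrast concrete, a small Python sketch (the helper name is mine, not a standard API): because every UTF-8 continuation byte is tagged `0b10xxxxxx`, you can find the next character boundary from any byte offset without decoding anything that came before.

```python
def next_utf8_boundary(buf: bytes, pos: int) -> int:
    """Skip UTF-8 continuation bytes (0b10xxxxxx) to reach the next
    character boundary -- the self-synchronization x86 code lacks."""
    while pos < len(buf) and (buf[pos] & 0xC0) == 0x80:
        pos += 1
    return pos

text = "naïve café".encode()
for start in range(len(text)):
    b = next_utf8_boundary(text, start)
    print(f"byte {start:2} -> boundary {b:2}: {text[b:].decode()!r}")
```

Every starting offset lands on a valid decode within at most 3 bytes, by construction of the encoding rather than by statistical luck.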