I'd recommend not using ndisasm unless you really do have flat binaries. It treats the whole file, including metadata, as instructions.
x86 machine code is variable-length, and needs to be decoded from the correct starting address to be "in sync". For example, if the last couple of bytes of metadata decode as the start of a long instruction, that's how ndisasm will decode them. That instruction will consume the first few bytes of what was supposed to be the first instruction(s) of machine code in the object or executable file. After that, the current position may be in the middle of another instruction.
Decoding will often get back into sync fairly quickly and line up with how the instructions will actually execute, but if you're going to run a big batch disassembly you might as well use tools that will do it correctly.
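To make the "out of sync" effect concrete, here is a minimal sketch of a toy decoder. It is not a real disassembler: it only knows three 32-bit-mode opcodes (`B8+r` mov with a 4-byte immediate, `40+r` inc, `C3` ret) and prints anything else as `db`. The names `toy_decode`, `metadata` and `code` are made up for the illustration, and a real decoder would turn the stray bytes into real instructions instead of `db`.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def toy_decode(blob, start=0):
    """Linear-sweep decode of a tiny x86 subset; returns (offset, text) pairs."""
    out = []
    i = start
    while i < len(blob):
        op = blob[i]
        if 0xB8 <= op <= 0xBF and i + 5 <= len(blob):
            # mov r32, imm32: opcode byte plus a 4-byte little-endian immediate
            imm = int.from_bytes(blob[i + 1:i + 5], "little")
            out.append((i, f"mov {REG32[op - 0xB8]}, {imm:#x}"))
            i += 5
        elif 0x40 <= op <= 0x47:
            out.append((i, f"inc {REG32[op - 0x40]}"))
            i += 1
        elif op == 0xC3:
            out.append((i, "ret"))
            i += 1
        else:
            # Byte the toy decoder does not understand
            out.append((i, f"db {op:#04x}"))
            i += 1
    return out

# Hypothetical file: 2 bytes of "metadata" whose last byte happens to be 0xB8,
# followed by the real code: mov eax, 1 / inc ecx / ret
metadata = bytes([0x7F, 0xB8])
code = bytes.fromhex("b801000000" "41" "c3")
blob = metadata + code

for off, text in toy_decode(blob, start=0):              # whole file, ndisasm-style
    print(f"{off:2}: {text}")
print("---")
for off, text in toy_decode(blob, start=len(metadata)):  # correct entry point
    print(f"{off:2}: {text}")
```

Decoding from offset 0, the trailing `0xB8` of the metadata swallows the first four bytes of the real `mov eax, 1` and shows up as `mov eax, 0x1b8`; the decoder only lines up with the real instruction stream again at `inc ecx`. Starting at offset 2 gives the intended three instructions.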
Both of the following disassemblers understand object-file formats and select a mode based on the file type (e.g. x86-64 mode for x86-64 ELF or PE-COFF objects / executables).
objdump -drwC -Mintel (from GNU binutils) makes pretty nice output, but it uses GNU .intel_syntax noprefix, which is MASM-like. (See the intel-syntax tag wiki for more about MASM-style vs. NASM-style.)
Agner Fog's objconv disassembler is quite good, and can disassemble into NASM / YASM syntax, or MASM, or AT&T. Example of using it. The output has all extra info as comments, so you can feed it to an assembler and get a binary similar to what you started with, including different sections.
(But special encodings aren't preserved, e.g. the .plt normally uses push imm32 for padding even with small immediates, but you will get the push imm8 form when NASM assembles push 0x1, because objconv doesn't disassemble it to push strict dword 0x1.) Still, it's very good most of the time, and even puts labels on branch targets so you can easily find the tops of loops.
If some but not all of your binaries are flat, maybe use file to find the ones that aren't and feed them to objconv. For the flat binaries, you'll probably have to try disassembling multiple ways and use human judgement to decide whether the code looks "sane" or not.
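If you'd rather do that sorting step in a script, a rough sketch is to check the first few bytes for well-known magic numbers. This is only an approximation of what `file` does, under some loud assumptions: the function name `container_format` is made up, it only knows ELF, MZ (the DOS stub that PE executables start with) and thin Mach-O, and it can't recognize COFF `.obj` files because those have no magic signature. Anything it doesn't recognize is merely a candidate for being flat.

```python
MACHO_MAGICS = {
    bytes.fromhex("feedface"), bytes.fromhex("cefaedfe"),  # 32-bit, both byte orders
    bytes.fromhex("feedfacf"), bytes.fromhex("cffaedfe"),  # 64-bit, both byte orders
}

def container_format(header: bytes):
    """Return a format name if the header has a known magic, else None."""
    if header[:4] == b"\x7fELF":
        return "ELF"
    if header[:2] == b"MZ":
        return "MZ/PE"
    if header[:4] in MACHO_MAGICS:
        return "Mach-O"
    return None  # no known magic: possibly a flat binary

def classify_file(path):
    # Only the first 4 bytes are needed for the checks above
    with open(path, "rb") as f:
        return container_format(f.read(4))

print(container_format(b"\x7fELF\x02\x01\x01"))      # ELF
print(container_format(bytes.fromhex("b801000000")))  # None
```

Files where this returns a name go to objconv or objdump; files where it returns `None` still need the try-several-modes treatment.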
One major sign of 32-bit code being disassembled as 16-bit is when the end of a 32-bit immediate or addressing-mode displacement gets decoded as the start of a new instruction. Often this is an add instruction (opcode 00), because the high bytes of small constants are zero.
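As a small worked case, take `b8 01 00 00 00`, which is `mov eax, 1` in 32-bit mode. The sketch below (a hypothetical helper, `split_mov_imm`, that only handles the `B8+r` mov-immediate opcode and assumes no operand-size prefix) shows how many immediate bytes each mode consumes and what is left over.

```python
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def split_mov_imm(code: bytes, bits: int):
    """Decode a leading `mov reg, imm` (opcode B8+r) for 16- or 32-bit mode.

    Returns (instruction text, bytes left over after the instruction).
    """
    if bits not in (16, 32):
        raise ValueError("bits must be 16 or 32")
    op = code[0]
    if not 0xB8 <= op <= 0xBF:
        raise ValueError("not a B8+r mov-immediate opcode")
    size = bits // 8  # immediate is 2 bytes in 16-bit mode, 4 in 32-bit mode
    regs = REG16 if bits == 16 else REG32
    imm = int.from_bytes(code[1:1 + size], "little")
    return f"mov {regs[op - 0xB8]}, {imm:#x}", code[1 + size:]

code = bytes.fromhex("b801000000" "c3")  # mov eax, 1 / ret (32-bit code)

print(split_mov_imm(code, 32))  # ('mov eax, 0x1', b'\xc3')
print(split_mov_imm(code, 16))  # ('mov ax, 0x1', b'\x00\x00\xc3')
```

In the 16-bit decode, the leftover `00 00` is the upper half of the 32-bit immediate. A 16-bit disassembler will show it as `add [bx+si], al`, which is exactly the kind of stray add to look for.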
For 64 vs. 32-bit code, one big difference is REX prefixes vs. single-byte dec / inc instructions: bytes 0x40 to 0x4F are inc / dec in 32-bit mode, but REX prefixes in 64-bit mode. If you see weird dec / inc instructions in 32-bit disassembly, it's probably actually 64-bit machine code. If you see weird REX prefixes (especially when the disassembler says rex add eax, ecx or something to show you there's a useless REX prefix), it was probably a separate inc or dec instruction in 32-bit machine code.
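The overlap is easy to tabulate. Here is a short sketch (the function name `meaning_of_4x` is made up) that shows what a byte in the 0x40 to 0x4F range means in each mode: in 32-bit mode, 0x40+r is inc and 0x48+r is dec; in 64-bit mode the low four bits are the REX W, R, X and B flags.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def meaning_of_4x(byte: int):
    """Return (32-bit meaning, 64-bit meaning) for a byte in 0x40..0x4F."""
    if not 0x40 <= byte <= 0x4F:
        raise ValueError("expected a byte in the range 0x40..0x4F")
    # 32-bit mode: 0x40..0x47 is inc r32, 0x48..0x4F is dec r32
    op = "inc" if byte < 0x48 else "dec"
    as32 = f"{op} {REG32[byte & 7]}"
    # 64-bit mode: REX prefix, bit layout 0100WRXB
    flags = "".join(name for bit, name in ((8, "W"), (4, "R"), (2, "X"), (1, "B"))
                    if byte & bit)
    as64 = "rex." + flags if flags else "rex"
    return as32, as64

for b in (0x40, 0x41, 0x48):
    print(hex(b), meaning_of_4x(b))
# 0x40 ('inc eax', 'rex')
# 0x41 ('inc ecx', 'rex.B')
# 0x48 ('dec eax', 'rex.W')
```

So the bytes `48 01 c8` are `add rax, rcx` in 64-bit mode, but decode as `dec eax` followed by `add eax, ecx` in 32-bit mode. A run of unexplained `dec eax` instructions in a 32-bit listing is a strong hint that the bytes are really REX.W prefixes.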