Using ndisasm on files of different architectures

Question

I would like to use ndisasm on a large number of files of different architectures (x86 or x64). I don't know whether -b16 would give me correct output for all the files, or whether I have to specify the correct option for each file, like -b32 or -b64. This is what I am running right now from the command line:

for file in *; do ndisasm -b16 -o7c00h -a -s7c3eh "$file" > "/my-path/$file"; done

Yes, you need to specify the size. No, there is no easy way to auto-detect it for flat binaries. — Jester, Nov 21 '17 at 19:06
Humm, thanks. I was thinking of some "file"-like command to combine with ndisasm. — Eduardo Andrade, Nov 21 '17 at 19:08
`file` does not work with flat binaries (they have no headers). — Jester, Nov 21 '17 at 19:08

score 1 · Answer 1 · edited Jun 20 '20 at 09:12

I'd recommend not using ndisasm unless you really do have flat binaries. It treats the whole file, including metadata, as instructions.

x86 machine code is variable-length, so it needs to be decoded from the correct starting address to stay "in sync". For example, if the last couple of bytes of metadata decode as the start of a long instruction, that's how ndisasm will decode them. This consumes the first few bytes of what was supposed to be the first instruction(s) of machine code in the object or executable file, and after that the current position may be in the middle of another instruction.
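To make the "out of sync" effect concrete, here is a toy length decoder for 32-bit mode that only knows a handful of opcodes (it is not a real disassembler). The assumption is that the byte 0x05 is the last byte of some file metadata, sitting right before the real code `B8 01 00 00 00 C3` (mov eax, 1 / ret). Starting one byte too early makes 0x05 (add eax, imm32) swallow the mov, and the leftover `00 C3` decodes as add bl, al.

```python
# Toy x86 instruction-length decoder for 32-bit mode.
# Only a few opcodes are modelled; everything else raises.

def insn_length(code, i):
    op = code[i]
    if op in (0x90, 0xC3):                    # nop, ret
        return 1
    if op == 0x05 or 0xB8 <= op <= 0xBF:      # add eax, imm32 / mov r32, imm32
        return 5
    if op in (0x00, 0x01):                    # add r/m, reg (ModRM byte follows)
        if i + 1 >= len(code):
            raise ValueError("truncated instruction")
        mod, rm = code[i + 1] >> 6, code[i + 1] & 7
        if mod == 3 or (mod == 0 and rm not in (4, 5)):
            return 2                          # no SIB byte, no displacement
        raise NotImplementedError("SIB/displacement forms not modelled")
    raise NotImplementedError(f"opcode {op:#04x} not modelled")


def boundaries(code, start):
    """Offsets where each decoded instruction begins, starting at `start`."""
    offsets, i = [], start
    while i < len(code):
        offsets.append(i)
        i += insn_length(code, i)
    return offsets


# 0x05 stands in for a stray metadata byte in front of the real code.
data = bytes([0x05, 0xB8, 0x01, 0x00, 0x00, 0x00, 0xC3])
print(boundaries(data, 0))   # wrong start: add eax, 0x1b8 / add bl, al
print(boundaries(data, 1))   # right start: mov eax, 1 / ret
```

Neither decoding is "invalid" as far as the CPU is concerned, which is why a flat disassembler can't tell it went wrong.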

Decoding will often get back into sync fairly quickly and line up with how the instructions will actually execute, but if you're going to run a big batch disassembly you might as well use tools that will do it correctly.

Both of the following disassemblers understand object-file formats and select a mode based on the file type (e.g. x86-64 mode for x86-64 ELF or PE-COFF objects / executables).

objdump -drwC -Mintel (from GNU binutils) makes pretty nice output, but it uses GNU .intel_syntax noprefix which is MASM-like. (See the intel-syntax tag wiki for more about MASM-style vs. NASM-style).
Agner Fog's objconv disassembler is quite good, and can disassemble into NASM / YASM syntax, or MASM, or AT&T. Example of using it. The output has all extra info as comments, so you can feed it to an assembler and get a binary similar to what you started with, including different sections.

(But special encodings aren't preserved, e.g. the .plt normally uses push imm32 for padding even with small immediates, but you will get the push imm8 form when NASM assembles push 0x1, because objconv doesn't disassemble it to push strict dword 0x1.) Still, it's very good most of the time, and even puts labels on branch targets so you can easily find the tops of loops.
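The push example comes down to two different encodings of the same instruction. This sketch models how an assembler picks between them; it is not objconv or NASM code, and the `strict_dword` parameter is a hypothetical stand-in for NASM's `strict dword` keyword.

```python
def encode_push_imm(value, strict_dword=False):
    """Sketch of how an assembler encodes `push imm`."""
    if not -2**31 <= value <= 0xFFFFFFFF:
        raise ValueError("immediate does not fit in 32 bits")
    if not strict_dword and -128 <= value <= 127:
        # 6A ib: push imm8, sign-extended to operand size (2 bytes total)
        return bytes([0x6A, value & 0xFF])
    # 68 id: push imm32 (5 bytes total), the form the .plt entries use
    return b"\x68" + (value & 0xFFFFFFFF).to_bytes(4, "little")


print(encode_push_imm(1).hex(" "))                     # push 0x1
print(encode_push_imm(1, strict_dword=True).hex(" "))  # push strict dword 0x1
```

Re-assembling the disassembly without `strict dword` therefore shrinks each such instruction by 3 bytes and shifts everything after it.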

If some but not all of your binaries are flat, maybe use the file command to find the ones that aren't and feed them to objconv. For the flat binaries, you'll probably have to try disassembling multiple ways and use human judgement to decide whether the code looks "sane" or not.
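A rough sketch of that split, in Python instead of `file`: look for an ELF or PE header (the same kind of information objdump and objconv use to pick a mode), hand headered x86 files to objconv, and queue three ndisasm runs for everything else. `detect` and `plan_commands` are hypothetical helpers, the objconv invocation is an assumption (check its manual), and raw COFF .obj files without an MZ stub aren't recognized here.

```python
import struct


def detect(blob):
    """Return (format, bits) if blob has an ELF/PE header, else None.
    bits is 32 or 64 for x86 targets, None for other machines."""
    if blob[:4] == b"\x7fELF" and len(blob) >= 20:
        endian = "<" if blob[5] == 1 else ">"              # EI_DATA: 1 = little-endian
        (machine,) = struct.unpack_from(endian + "H", blob, 18)  # e_machine
        return ("ELF", {3: 32, 62: 64}.get(machine))       # EM_386, EM_X86_64
    if blob[:2] == b"MZ" and len(blob) >= 0x40:
        (pe_off,) = struct.unpack_from("<I", blob, 0x3C)   # e_lfanew
        if blob[pe_off:pe_off + 4] == b"PE\0\0" and len(blob) >= pe_off + 6:
            (machine,) = struct.unpack_from("<H", blob, pe_off + 4)  # COFF Machine
            return ("PE", {0x14C: 32, 0x8664: 64}.get(machine))  # i386, AMD64
        return ("MZ", None)                                # DOS/NE/LE etc.: not handled
    return None


def plan_commands(name, blob):
    """Hypothetical: objconv for headered x86 files, ndisasm x3 for flat ones."""
    info = detect(blob)
    if info is None:                      # no known header: treat as flat
        return [["ndisasm", f"-b{bits}", name] for bits in (16, 32, 64)]
    if info[1] is None:                   # known container, but not x86/x86-64
        return []
    return [["objconv", "-fnasm", name, name + ".asm"]]
```

The three ndisasm outputs for each flat file still need a human to compare them, using signs like the ones described next.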

One major sign of 32-bit code being disassembled as 16-bit is when the end of a 32-bit immediate or addressing-mode displacement gets decoded as the start of a new instruction. Often this is an add instruction (opcode 00).
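Here's a toy splitter (again only a few opcodes, not a real disassembler) showing that symptom on `B8 01 00 00 00 C3`, which is mov eax, 1 / ret in 32-bit code. In 16-bit mode, B8 only takes a 2-byte immediate, so the upper half of the imm32 is left over and decodes as `00 00`, i.e. add [bx+si], al.

```python
def split_insns(code, bits):
    """Split code into instructions for bits=16 or 32 (toy subset only)."""
    imm = 4 if bits == 32 else 2
    out, i = [], 0
    while i < len(code):
        op = code[i]
        if 0xB8 <= op <= 0xBF:        # mov r16/r32, imm16/imm32
            n = 1 + imm
        elif op == 0xC3:              # ret
            n = 1
        elif op == 0x00:              # add r/m8, r8 (ModRM byte follows)
            if i + 1 >= len(code):
                raise ValueError("truncated instruction")
            mod, rm = code[i + 1] >> 6, code[i + 1] & 7
            no_extra = rm not in (4, 5) if bits == 32 else rm != 6
            if not (mod == 3 or (mod == 0 and no_extra)):
                raise NotImplementedError("SIB/displacement forms not modelled")
            n = 2
        else:
            raise NotImplementedError(f"opcode {op:#04x} not modelled")
        out.append(code[i:i + n].hex(" "))
        i += n
    return out


code = bytes([0xB8, 0x01, 0x00, 0x00, 0x00, 0xC3])
print(split_insns(code, 32))   # mov eax, 1 / ret
print(split_insns(code, 16))   # mov ax, 1 / add [bx+si], al / ret
```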

For 64 vs. 32-bit code, one big difference is REX prefixes vs. single-byte dec / inc instructions (opcodes 40-4F). If you see weird dec / inc instructions in 32-bit disassembly, it's probably actually 64-bit machine code. If you see weird REX prefixes (especially when the disassembler says rex add eax, ecx or something to show you there's a useless REX prefix), it was probably a separate inc or dec instruction in 32-bit machine code.
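A small sketch of that ambiguity: the same byte in the 40-4F range means something completely different depending on the mode. For example, `48 01 C8` is add rax, rcx in 64-bit mode, but in 32-bit mode it is dec eax followed by add eax, ecx. The helper name `interpret_4x` is made up for illustration.

```python
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def interpret_4x(byte, bits):
    """Meaning of a byte in 0x40-0x4F in 32-bit vs. 64-bit mode."""
    if not 0x40 <= byte <= 0x4F:
        raise ValueError("only bytes 0x40-0x4F are ambiguous this way")
    if bits == 64:
        # REX prefix: 0100WRXB, modifies the instruction that follows
        w, r, x, b = (byte >> 3) & 1, (byte >> 2) & 1, (byte >> 1) & 1, byte & 1
        return f"REX W={w} R={r} X={x} B={b}"
    if bits == 32:
        # standalone one-byte instruction: 40-47 inc, 48-4F dec
        op = "inc" if byte < 0x48 else "dec"
        return f"{op} {REGS32[byte & 7]}"
    raise ValueError("bits must be 32 or 64")


print(interpret_4x(0x48, 64))   # REX.W, as in 48 01 c8 = add rax, rcx
print(interpret_4x(0x48, 32))   # dec eax, then 01 c8 = add eax, ecx
```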
