Intel's own PDF manuals document this in detail; see vol.2 of the SDM, specifically the intro chapters before the entries for each instruction.
There are also detailed descriptions on various sites like https://wiki.osdev.org/X86-64_Instruction_Encoding#ModR.2FM_and_SIB_bytes (which covers 16-bit ModRM, so it's not just talking about x86-64 long mode.) Modern x86 uses the same instruction encoding (in 16-bit real mode) as 8086; that backwards compat is the whole point of x86, and why it's so nasty.
And of course you can find PDF copies of the actual 8086 manual itself, in case it's more helpful to omit material that's only relevant to other modes.
The 8086 primer covers instruction encoding of operands from page 23 onwards. It's written as a book, not just a technical manual. It's available for free on the web site of Stephen Morse (https://stevemorse.org/8086/), the guy who designed the 8086 when he was at Intel.
But maybe it would help to describe the basic overview of the purpose of ModRM, so you know what to look for in those docs.
ModR/M purpose and basics
Most (but not all) x86 instructions have one ModRM byte. It can encode 2 operands: either both are registers, or one is a register and the other is memory (at most one memory operand), e.g. `add cx, ax` or `add cx, [bx+si]`.
The opcode itself determines which of the r/m and r operands are the source and/or destination, or whether the /r field acts as extra opcode bits. (e.g. for shifts, that's why they can't copy-and-shift, or use a count register other than CL.) `add [bx+si], cx` has the same ModRM byte as `add cx, [bx+si]` but a different opcode.
The register-only operand is encoded by the 3-bit /r field. 3 bits can encode any of x86's 8 general-purpose registers. This is a "register number": as in any normal ISA with 2^n registers, groups of n bits in each instruction encode register operands.
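As a minimal sketch of the "register number" idea, the mapping can be written as two lookup tables. The table order follows the standard 8086 encoding tables (see the osdev page linked above); the names `REG16`, `REG8` and `reg_name` are made up for this illustration.

```python
# Register numbers as used in the 3-bit /r field (and in r/m when mod=0b11).
# Which table applies depends on the operand size chosen by the opcode.
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]


def reg_name(number, wide=True):
    """Map a 3-bit register number to a name (hypothetical helper)."""
    if not 0 <= number <= 7:
        raise ValueError("register number must fit in 3 bits")
    return (REG16 if wide else REG8)[number]


print(reg_name(0b001))              # cx
print(reg_name(0b110))              # si
print(reg_name(0b000, wide=False))  # al
```

Note that the numbering is not alphabetical: AX, CX, DX, BX are 0 through 3.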
The r/m operand can also be a register, but the 2-bit "mode" field determines whether the 3-bit r/m field is a register number (mod=0b11) or a memory addressing mode. (A memory operand can also have an 8 or 16-bit displacement, so encoding disp0/8/16 uses up the other 3 encodings of the mode field.)
https://wiki.osdev.org/X86-64_Instruction_Encoding#ModR.2FM_and_SIB_bytes shows the fields and interpretation for 16-bit address-size, including register numbers.
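To make the field layout concrete, here is a small sketch that splits a ModRM byte into its 2-bit mod, 3-bit reg and 3-bit r/m fields (top to bottom bits). The function name `split_modrm` is just for illustration.

```python
def split_modrm(byte):
    """Split a ModRM byte into (mod, reg, rm); hypothetical helper name."""
    mod = (byte >> 6) & 0b11   # bits 7..6: addressing mode / register-direct
    reg = (byte >> 3) & 0b111  # bits 5..3: register number or extra opcode bits
    rm = byte & 0b111          # bits 2..0: register number or memory mode
    return mod, reg, rm


for b in (0xC0, 0x08, 0x4F, 0xF2):
    mod, reg, rm = split_modrm(b)
    print(f"{b:02X}: mod={mod:02b} reg={reg:03b} r/m={rm:03b}")
```

Writing the byte in octal makes the 2+3+3 split visible directly, e.g. 0xF2 is 0o362.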
So there are only 3 bits to specify a register or combination of registers for the memory address. 386 added an escape code for a SIB byte, allowing a full selection of addressing modes like `[eax + ecx*4]`, but an addressing mode on 8086 (and with 16-bit address-size on any CPU) must be some subset of `[BX|BP] + [SI|DI] + disp0/8/16`.
See "Differences between general purpose registers in 8086: [bx] works, [cx] doesn't?" and "Why don't x86 16-bit addressing modes have a scale factor, while the 32-bit version has it?"
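The full set of 16-bit memory addressing modes is small enough to enumerate. This sketch prints all of them symbolically; the table order is the one in the standard 16-bit ModRM tables, and `mem_operand` is a made-up name. One detail not mentioned above but documented in those tables: mod=00 with r/m=110 means an absolute `[disp16]` rather than `[bp]`.

```python
# r/m field -> base/index combination for 16-bit address-size.
BASES = ["bx+si", "bx+di", "bp+si", "bp+di", "si", "di", "bp", "bx"]


def mem_operand(mod, rm):
    """Symbolic memory operand for mod in 0..2 (mod=3 means a register)."""
    if mod == 0b11:
        raise ValueError("mod=11 selects a register, not memory")
    if mod == 0b00 and rm == 0b110:
        return "[disp16]"  # special case: no [bp] without a displacement
    suffix = ["", "+disp8", "+disp16"][mod]
    return f"[{BASES[rm]}{suffix}]"


for rm in range(8):
    print(f"r/m={rm:03b}", *(mem_operand(mod, rm) for mod in range(3)))
```

Because of that special case, an assembler has to encode a plain `[bp]` as `[bp+disp8]` with a zero displacement.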
Examples from assembling foo.asm and then running `ndisasm -b16 foo`, or from asking NASM itself to make a listing with `nasm -l/dev/stdout foo.asm`, then editing to simplify the output fields:
00 00     add [bx+si], al   ; opcode=0x00 (add byte, mem dst)  mod=00 r=000 (AL) r/m=000 ([bx+si])
01 C0     add ax, ax        ; add r/m, r   mod=11 (register) r=000 (AX) r/m=000 (AX)
01 08     add [bx+si], cx   ; add r/m, r   mod=00 r=001 (CX) r/m=000 ([bx+si])
03 08     add cx, [bx+si]   ; add r, r/m   mod=00 r=001 (CX) r/m=000 ([bx+si])
03 0F     add cx, [bx]      ; mod=00 r=001 (CX) r/m=111 ([bx])
03 4F 04  add cx, [bx + 4]  ; mod=01 r=001 (CX) r/m=111 ([bx]) disp8=4
01 F2     add dx, si        ; mod=11 r=110 (SI) r/m=010 (DX)
To create more examples, use an assembler to create machine code yourself.
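If you want to cross-check bytes like these without a disassembler, a rough decoder for just the four `add` opcodes 0x00..0x03 is short. This is only a sketch under stated assumptions: it handles no prefixes and no other opcodes, `decode_add` is a made-up name, and it relies on the 8086 convention that opcode bit 0 selects word vs. byte size and bit 1 selects whether the /r register is the destination.

```python
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
BASES = ["bx+si", "bx+di", "bp+si", "bp+di", "si", "di", "bp", "bx"]


def decode_add(code):
    """Decode one ADD (opcodes 0x00-0x03 only) from a bytes object."""
    opcode, modrm = code[0], code[1]
    if opcode > 0x03:
        raise ValueError("only ADD opcodes 0x00-0x03 are handled")
    regs = REG16 if opcode & 1 else REG8      # bit 0: word vs. byte
    reg_is_dst = bool(opcode & 2)             # bit 1: direction
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7

    if mod == 0b11:                           # register-direct r/m
        rm_text = regs[rm]
    elif mod == 0b00 and rm == 0b110:         # absolute [disp16]
        rm_text = f"[0x{int.from_bytes(code[2:4], 'little'):04x}]"
    else:
        rm_text = BASES[rm]
        if mod == 0b01:                       # sign-extended disp8
            disp = int.from_bytes(code[2:3], "little", signed=True)
            rm_text += f"{disp:+d}"
        elif mod == 0b10:                     # disp16, shown as signed
            disp = int.from_bytes(code[2:4], "little", signed=True)
            rm_text += f"{disp:+d}"
        rm_text = f"[{rm_text}]"

    dst, src = (regs[reg], rm_text) if reg_is_dst else (rm_text, regs[reg])
    return f"add {dst},{src}"


for hexbytes in ("0000", "01C0", "0108", "0308", "030F", "034F04", "01F2"):
    print(hexbytes, decode_add(bytes.fromhex(hexbytes)))
```

A real disassembler like ndisasm is still the better reference; this only shows how little logic the ModRM part needs.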
See also