Opcode vs Operand in x86 assembly source code

Question

Recently in an exam, when asked about opcode vs operand, I gave an example

mov [ax],0000h

where I said the mov was the opcode and [ax],0000h was the operand and together they formed an instruction. My instructor gave me a 0 on the question and said that [ax] was the opcode and 0000h was the operand only.

In my textbook it says that in a MOV instruction the mov is the opcode and the source and destination are often called operands.

I wish to go to my instructor with the textbook and ask again, but before I do, can someone clear this up for me so I am not going in with any wrong understanding?

Tried to write correct answer, got 0 lol.

`[ax]` would be an operand and `0000h` another operand. `mov` is the instruction's mnemonic. However, there is no possible way to encode a memory operand with the address as `ax`. Also, your example is lacking a size keyword (eg `word` or `word ptr`) on the destination operand. The opcode is the machine code representation of the instruction. Opcode can include the entire instruction's machine code or only the byte or bytes that select the instruction (possibly excluding prefixes, ModR/M, SIB, offset, immediate). — ecm, Nov 24 '22 at 08:38

Peter Cordes · Accepted Answer · 2022-11-27T03:26:59.943

First of all, mov [ax], 0000h can't be represented in 8086 machine code. There's no binary representation for that destination addressing mode.

TL:DR: mov is the mnemonic, [ax] is the destination operand, 0000h is the source operand. There is no binary "opcode" because the instruction is not encodeable. But if you're misusing "opcode" to talk about parts of the source line, you'd normally say that mov is the opcode.

Opcodes are a feature of machine code, not assembly source code. Perhaps they're bending the terminology to talk about the instruction name, or they intended to talk about how it will assemble into machine code.

In the asm source code mov [ax],0000h:

mov is the mnemonic, which says what instruction it is. This means the machine code will use an opcode that's one of the few listed in the manual for that mnemonic (https://www.felixcloutier.com/x86/mov), with the assembler's choice depending on the operands.

In this case a memory destination and an immediate source, but size not specified or implied by either, so could be C6 /0 ib MOV r/m8, imm8 or C7 /0 iw MOV r/m16, imm16. emu8086 is a bad assembler that doesn't warn you about the ambiguity in some cases, but might here where the value is zero.
[ax] is the destination operand. This is not encodeable in x86 machine code; it's not one of the few valid 16-bit addressing modes.
0000h is the source operand. Most instructions have an opcode that allows an immediate source.

Unlike some earlier 8-bit machines, like the 8080 that influenced some 8086 design decisions, both operands are explicit for most instructions, not just implied by an opcode. (Later extensions to x86 include some instructions with more than 2 operands, but x86 is still mostly a 2-operand ISA.)

For comparison, see an 8080 opcode map https://pastraiser.com/cpu/i8080/i8080_opcodes.html
vs. an 8086 opcode map like this, or a table like this. (Or a modern x86 32-bit mode opcode table, http://ref.x86asm.net/coder32.html which is the most nicely formatted and readable.) Note that in the 8080 map, each entry has at least a destination or both operands implied just by the opcode byte. But in 8086, the opcode byte usually implies just the mnemonic, with the operands encoded separately.

So there's no combination of opcode and ModRM byte that can represent this instruction as a sequence of bytes of machine code.
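To make that concrete, here is a minimal Python sketch of the eight base/index combinations the 3-bit r/m field can select in 16-bit addressing modes. It is only an illustration, not part of any assembler; the names `RM16` and `rm_for` are made up for this example.

```python
# Registers selected by each value of the 3-bit r/m field when the
# ModRM "mod" field is 00, 01 or 10 (a memory operand) in 16-bit mode.
# Caveat: with mod = 00, r/m = 6 means a bare 16-bit displacement rather
# than [bp]; a plain [bp] needs mod = 01 with a zero disp8.
RM16 = [
    ("bx", "si"),  # r/m = 0
    ("bx", "di"),  # r/m = 1
    ("bp", "si"),  # r/m = 2
    ("bp", "di"),  # r/m = 3
    ("si",),       # r/m = 4
    ("di",),       # r/m = 5
    ("bp",),       # r/m = 6
    ("bx",),       # r/m = 7
]


def rm_for(registers):
    """Return the r/m value for a set of address registers, or None."""
    wanted = frozenset(registers)
    for rm, regs in enumerate(RM16):
        if frozenset(regs) == wanted:
            return rm
    return None


print(rm_for(["si"]))  # some r/m value exists for [si]
print(rm_for(["ax"]))  # nothing in the table uses ax
```

Only bx, bp, si and di ever appear in the table, so a lookup for `ax` (or cx, dx, sp) finds nothing.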

See How to tell the length of an x86 instruction? for a diagram summarizing the format of x86 machine code. (8086 didn't allow a SIB byte, hence the more limited addressing modes, but all other optional parts are still applicable. 8086 only has 1-byte opcodes, never 2 or 3, and of course immediates and displacements are at most 2 bytes.)

If it was `mov word ptr [si], 0000h`, the machine code would be

         c7     04       00 00 
         ^      ^        ^
       opcode  ModR/M   imm16 immediate operand

The destination operand, [si] is encoded by the ModRM byte, using the 2 bit "mode" field (0) that specifies a memory addressing mode with no displacement (since it's not [si + 16] or something), and the 3-bit "r/m" field that specifies just si. See the table in https://wiki.osdev.org/X86-64_Instruction_Encoding#16-bit_addressing or in Intel or AMD's manuals.
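As a rough illustration of that field layout, the following Python sketch splits a ModRM byte into its three fields and applies it to the `c7 04 00 00` bytes above. The helper name `split_modrm` is hypothetical, chosen just for this example.

```python
def split_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) bit fields."""
    mod = (byte >> 6) & 0b11   # top 2 bits: addressing mode
    reg = (byte >> 3) & 0b111  # middle 3 bits: register or opcode extension
    rm = byte & 0b111          # low 3 bits: register or memory operand
    return mod, reg, rm


# Machine code for: mov word ptr [si], 0000h
code = bytes.fromhex("c7040000")
opcode_byte = code[0]
mod, reg, rm = split_modrm(code[1])
imm16 = int.from_bytes(code[2:4], "little")  # immediates are little-endian

print(hex(opcode_byte), mod, reg, rm, imm16)
```

Here mod = 0 means a memory operand with no displacement, and r/m = 4 selects [si] in the 16-bit addressing table.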

The opcode is the c7 byte plus the 3-bit reg field of the ModRM byte (the /0 in C7 /0, so value 0). See How to read the Intel Opcode notation for details on how this works, borrowing extra bits from ModRM as extra opcode bits. (That's why we have instructions like add ax, 123, not add cx, [si], 123 with a write-only destination and two separate sources including the immediate implied by the opcode, since ModRM can normally encode two operands as in add cx, [si]. Only the new 186 form of imul cx, [si], 123 allows that. Similarly neg dx instead of neg cx, dx.)

If it was `mov ax, 0000h`

   b8          00 00
    ^          ^
  Opcode       imm16 immediate source

The AX destination is specified by the low 3 bits of the leading byte. You could look at this as 8 different opcode bytes, one for each register, with an implicit destination. That interpretation (of this different instruction, not the impossible one in your assignment) would sort of match up with your instructor's description of "mov-to-AX" as the opcode.

But really you'd say mov ax, imm16 was the opcode, with the actual value to fill in the placeholder being the 0 operand. There are three other opcodes that can mov to AX:

8B /r mov r16, r/m16 (example: mov ax, [si])
89 /r mov r/m16, r16 (example: mov ax, si)
A1 mov ax, moffs (e.g. mov ax, [1234h]). Special case no-ModRM short-form with an absolute offset and an AL or AX destination.
And a 4th that wouldn't normally get used with a register destination because it's longer: C7 /0 iw mov r/m16, imm16 (e.g. a longer encoding of mov ax, 0).
Also 8C /r mov r/m16, Sreg (e.g. mov ax, ds).
Modern x86 has a few more forms, like mov r/m16, cr0..7 (new in 386) and mov r/m16, dr0..7 (386), but control registers didn't exist(?) until 286 smsw (store machine status word).

Or you could look at it the way Intel's manual documents it, as B8+ rw iw being the encoding for MOV r16, imm16. So the opcode is the high 5 bits of the first byte, the destination register number is the low 3 bits of that byte. As with the memory destination form, the opcode itself implied the presence of a 16-bit immediate as the source operand.
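Under that reading, a decoder would split the leading byte into a 5-bit opcode and a 3-bit register number. Here is a minimal Python sketch of that idea; the names `REG16` and `decode_mov_r16_imm16` are invented for illustration.

```python
# 16-bit register names indexed by their 3-bit register number.
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]


def decode_mov_r16_imm16(code):
    """Decode a 3-byte B8+rw iw instruction into (register, immediate)."""
    first = code[0]
    # The high 5 bits must match those of B8 (binary 10111).
    if first >> 3 != 0xB8 >> 3:
        raise ValueError("not a mov r16, imm16 short-form opcode")
    reg = REG16[first & 0b111]                # low 3 bits: destination
    imm = int.from_bytes(code[1:3], "little")  # 16-bit little-endian immediate
    return reg, imm


print(decode_mov_r16_imm16(bytes.fromhex("b80000")))  # mov ax, 0000h
print(decode_mov_r16_imm16(bytes.fromhex("bb3412")))  # mov bx, 1234h
```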

There's no ModR/M byte; the purpose of these short-form encodings was to save space for common instructions in 8086. There are similar no-modrm short forms, like xchg-with-AX which is where 90h nop comes from, xchg ax,ax. And for inc/dec of a full register. There are also no-ModRM short-forms for most ALU operations with the accumulator, e.g. add al, 123 is 2 bytes, vs. add bl, 123 is 3 bytes. (See code golf tips for x86 machine code).

Note that mov ax, 0 is also encodeable with a 4-byte encoding, using the same mov r/m16, imm16 encoding, with a ModRM byte encoding the ax register as the destination. Assemblers normally choose the shortest possible encoding when there's a choice. (In some cases there are two choices the same length, like add cx, dx: see x86 XOR opcode differences)
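The two encodings of mov ax, 0 can be compared side by side with a short Python sketch. This is a toy encoder under the assumptions stated in the comments, not how any real assembler is implemented, and the function names are hypothetical.

```python
# 16-bit register names indexed by their 3-bit register number.
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]


def mov_r16_imm16_short(reg, imm):
    """Short form B8+rw iw: register number in the low 3 opcode bits."""
    opcode = 0xB8 + REG16.index(reg)
    return bytes([opcode]) + (imm & 0xFFFF).to_bytes(2, "little")


def mov_r16_imm16_long(reg, imm):
    """Long form C7 /0 iw with a register destination (mod = 11)."""
    modrm = (0b11 << 6) | (0 << 3) | REG16.index(reg)  # reg field is the /0
    return bytes([0xC7, modrm]) + (imm & 0xFFFF).to_bytes(2, "little")


short = mov_r16_imm16_short("ax", 0)
long_form = mov_r16_imm16_long("ax", 0)
print(short.hex(" "), len(short))
print(long_form.hex(" "), len(long_form))
```

The short form comes out one byte smaller because it has no ModRM byte, which is why assemblers prefer it.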

BTW, feel free to link this answer to your instructor. I hope that clears things up for both of you. — Peter Cordes, Nov 24 '22 at 11:58
Thank you very much for your detailed answer. I also went a little further into the textbook and I think the professor and you are correct as mov ax can be interpreted as the opcode... I will read up more details about this topic and the links you gave me. It's a good thing I didn't go embarrass myself in front of the professor and asked here first :p — RandyRathore, Nov 25 '22 at 09:53
@RandyRathore: Oh, was the actual instruction `mov ax, 0`, not `mov [ax], 0`? I wondered. Even then, I *don't* agree that it makes sense to say that `mov ax` is an opcode. That would be a big stretch and confusing use of terminology, mixing up asm source with binary machine code. There is a binary opcode byte for `mov ax, imm16`, so *that* can be an opcode, but `mov ax, cx` or `mov ax, [si]` are different binary opcodes that also start with `mov ax`. (And those use the same `mov r16, r/m16` opcode as instructions like `mov cx, [di]` or `mov bx, bp`.) — Peter Cordes, Nov 25 '22 at 11:04
@RandyRathore: So there isn't a good justification for saying `mov ax` is an opcode. **When I want to use asm source to describe the `B8` opcode, I write `mov ax, imm16`.** (Like sites such as https://uops.info/table.html use to show which form of an instruction is being talked about.) In the machine code, the AX destination operand is implicit, implied by the opcode (if you treat the whole leading byte as the opcode, not as having a 3-bit register-number field and 5-bit opcode). The imm16 source operand is explicit, needing a value to fill in the placeholder. — Peter Cordes, Nov 25 '22 at 11:07
no, the actual code was the one mentioned in the question. I know I cannot use ax as the offset address in indirect addressing mode. But I was just trying to give an example of what part is considered opcode and what part is operand in an instruction, so I did not think about the actual validity much. But still if my professor counted that as the mistake I would not have been surprised. But when he said that mov [ax] part is opcode, and only the source counts as operand, that really created some serious confusions in me. — RandyRathore, Nov 25 '22 at 21:09
@RandyRathore: Well that doesn't sound good. I can't agree at all with saying that `mov [ax]` is an opcode. Even for a valid instruction like `mov word [si], 0`, it doesn't make sense to say `mov word [si]` is an opcode, especially if you're omitting `word [si]` from being one of the operands. You could say `mov word [si], imm16` is the opcode+modrm. Again, feel free to link this answer to your professor. This specific claim is a mistake in your course material, besides being an invalid instruction. — Peter Cordes, Nov 25 '22 at 22:15
I will show him the answer, Thank you very much. Even without the confusion, your answer has helped me know many more interesting things about the language. — RandyRathore, Nov 25 '22 at 22:32
@RandyRathore: Updated my answer with some of that stuff from comments - there are 5 different opcodes on 8086 that can be used for instructions whose asm source starts with `mov ax, ...`. I listed them with examples. — Peter Cordes, Nov 27 '22 at 03:29
Thank you @Peter Cordes. Update: I went to my professor again. Now he says anything that has to go through screening by the processor is an opcode, and anything that does not is an operand. When asked what he means by screening, he says "In easy words, anything that has to be translated into machine language is opcode". His logic is that the MOV command and AX, or any other register name that has to be transformed into binary code, is opcode, and 0000h isn't because it is a number itself and doesn't have to be transformed. But I really did not find any example like his anywhere else. At least, not yet. — RandyRathore, Nov 28 '22 at 21:52
@RandyRathore: That's not normal terminology. Register numbers are "addresses" in standard computer-science terminology, just in a different address-space than memory. So `mov word [si], 1234h` is not fundamentally different from `mov [1234h], 1234h`. I wonder what he'd say about ARM or AArch64 machine code where immediates aren't simple binary but also encoded, e.g. as 8 bits and a rotate count, or as a bit-range (low and high bit-number) and a repeat width. For example, `orr x1, x2, #0xaaaaaaaaaaaaaaaa` has machine code `41 f0 01 b2`, with the 64-bit immediate encoded as a repeating 2-bit — Peter Cordes, Nov 28 '22 at 22:00
@RandyRathore: In x86, immediate values are just used directly, but they do need to get sign-extended if they're narrower than the operand-size. e.g. `83 C1 FF` is the machine code for `add cx, 0xffff` (aka `add cx, -1`). 2's complement sign-extension is a pretty trivial thing to "decode", but regardless I don't think this is a useful distinction. So according to your professor, `add ax, cx` (`01 C8`) has zero operands, while `add word ptr [si+16], 1` has two (`83 44 10 01`)? — Peter Cordes, Nov 28 '22 at 22:06
@RandyRathore: What they're describing are immediate values (including displacements in addressing modes), not "operands" in the usual meaning of the term. There's some overlap, but neither meaning is a subset of the other. For example, in `add [si+16], ax`, the destination operand is memory at `ds:si+16`, but the machine code (`01 44 10`) contains a literal `10h` byte separate from the `44` ModRM that encodes SI as the base register for the addressing mode. The part of the instruction machine code that isn't an immediate value is not all opcode, some of it encodes non-immediate operands. — Peter Cordes, Nov 28 '22 at 22:09
