Why is x86 MOV two bytes, not one? How does the opcode and machine code work?

Question

I'm having trouble understanding a very basic x86 instruction. The instruction is

0x080491d7 <+1>:     mov    %esp,%ebp

I know that it moves the value of esp into ebp. But I'm trying to understand the opcodes. The instruction is 2 bytes long, not 1 which I'm confused about. I would've thought it would only be 1 byte.

The memory for this instruction is:

0x80491d7 <main+1>:     0x89    0xe5

I know that 0x89 is one of the opcodes for MOV. I've been reading the Intel manuals. I don't know what 0xe5 is for. Is it like a suffix or another opcode value or something else? The Intel manual is a little confusing.

The C program is compiled for 32-bit x86, and the Linux server it runs on is x86_64.

It's the ModRM byte, which encodes the characteristics of the source and target arguments of the instruction. It's in the documentation that you say you've been looking at. — 500 - Internal Server Error, Nov 22 '22 at 19:21
A little counting might help in thinking about this. x86-32 has eight general-purpose registers (eax, ebx, ecx, edx, esi, edi, ebp, esp), so you would need 3 bits each to specify a source and a destination register. If instructions were just one byte, that would only leave two bits for the opcode, and the machine could only have four different instructions. Not even allowing for memory operands. So necessarily, the vast majority of instructions must be more than one byte. — Nate Eldredge, Nov 22 '22 at 20:31
The *general* principle of x86 instruction encoding is that the opcode is the entire first byte, followed by a ModR/M byte to specify the operands. There can be additional bytes depending on what the operands are (SIB, address displacement, immediate data, etc). Though there are so many exceptions to this rule that they almost swallow it. — Nate Eldredge, Nov 22 '22 at 20:31
Duplicate of [How to read x86 instruction tables from this site](https://stackoverflow.com/q/59622640) which covers MOV opcodes specifically. Also [How to tell the length of an x86 instruction?](https://stackoverflow.com/q/4567903) covers the general format of opcode + optional ModRM + more optional bytes. [How to determine if ModR/M is needed through Opcodes?](https://stackoverflow.com/q/55312459) / [x86\_64 Opcode encoding formats in the intel manual](https://stackoverflow.com/q/57440527) / [How to read the Intel Opcode notation](https://stackoverflow.com/q/15017659) — Peter Cordes, Nov 23 '22 at 05:12

jcmvbkbc · Answer 1 · 2022-11-22T19:59:43.537

The instruction is 2 bytes long, not 1 which I'm confused about.

Yes. The description of the MOV instruction in the Intel Developer Manual, volume 2, lists the encoding of this form (MOV r/m32, r32) as 89 /r. According to section 3.1.1.1, "Opcode Column in the Instruction Summary Table", /r indicates that the ModR/M byte of the instruction contains a register operand and an r/m operand. So the second byte is the ModR/M byte. Its meaning can be found in Table 2-2, "32-Bit Addressing Forms with the ModR/M Byte".
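As a rough illustration of what "/r" implies, here is a minimal Python sketch that packs the three ModR/M fields (mod, reg, r/m) into one byte and emits the two bytes seen in the question. The helper names `modrm` and `encode_mov_rm32_r32` are made up for this example, and it only covers the register-to-register case of opcode 89 in 32-bit code.

```python
# Register numbers as used in the reg and r/m fields (32-bit operand size).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def modrm(mod, reg, rm):
    """Pack the ModR/M fields into one byte: mod (2 bits), reg (3), r/m (3)."""
    return (mod << 6) | (reg << 3) | rm

def encode_mov_rm32_r32(dst, src):
    """Encode `mov dst, src` (Intel operand order) using opcode 89 /r.

    Hypothetical helper: register-direct operands only (mod = 11b).
    With opcode 89, the reg field holds the source and r/m the destination.
    """
    return bytes([0x89, modrm(0b11, REG32.index(src), REG32.index(dst))])

# mov ebp, esp  (AT&T: mov %esp,%ebp)
print(encode_mov_rm32_r32("ebp", "esp").hex(" "))  # 89 e5
```

The second byte is therefore not a suffix or a second opcode: it is where the operands live.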

score 2 · Answer 2 · answered Nov 22 '22 at 22:52

I know that 0x89 is one of the opcodes for MOV. I've been reading the Intel manuals. I don't know what 0xe5 is for. Is it like a suffix or another opcode value or something else? The Intel manual is a little confusing.

You found that the mov %esp, %ebp instruction got encoded with 2 bytes: 0x89, 0xE5.

Consulting the Intel manuals is the right thing to do, but I would advise looking at your instruction in Intel syntax, where it reads mov ebp, esp (destination first). That might save you from inadvertently misreading the opcode tables.

Looking up 89h in the one-byte opcode table, you find the operand notation "Ev, Gv".

The "Using opcode tables" chapter explains what these character combinations mean.

E --- A ModR/M byte follows the opcode and specifies the operand.
v --- Word or doubleword, depending on operand-size attribute.
, --- Literally a separating comma.
G --- The reg-field within the ModR/M byte selects a general purpose register.

So that second byte is a ModR/M byte.

Your ModR/M byte is E5h or 11'100'101b in binary notation following the grouping 'mod-reg-r/m'.

Because of the mention "Gv", the reg-field (100b) refers to a (d)word-sized general purpose register. It could be referring to SP, or ESP.
Because the 2 most significant bits (11b) are set in the ModR/M byte, the 3 least significant bits (101b) refer to a register. And because of the mention "Ev", it could be referring to BP, or EBP.
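The field split described above can be checked with a few shifts and masks. This is a small sketch, not a disassembler; `split_modrm` and the two register tables are names chosen for this example.

```python
# Register names indexed by the 3-bit register number.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

def split_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, r/m) fields."""
    mod = (byte >> 6) & 0b11    # top 2 bits
    reg = (byte >> 3) & 0b111   # middle 3 bits
    rm = byte & 0b111           # low 3 bits
    return mod, reg, rm

mod, reg, rm = split_modrm(0xE5)
print(f"mod={mod:02b} reg={reg:03b} r/m={rm:03b}")  # mod=11 reg=100 r/m=101

# mod == 11b means the r/m field names a register rather than a memory operand.
# The operand size is not known yet, so both candidates are listed.
print("reg field:", REG16[reg], "or", REG32[reg])  # reg field: sp or esp
print("r/m field:", REG16[rm], "or", REG32[rm])    # r/m field: bp or ebp
```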

Which registers? For that we look at the opcode 89h or 100010'0'1b in binary notation following the grouping 'TTTTTT-d-w'.

Bit 0 (w) tells us this is a (d)word-sized operation (which accords with the mention "v" above). Since this is 32-bit code and no operand size prefix (0x66) was used, what remains is ESP/EBP.

Bit 1 (d) tells us which of these operands is the source and which is the destination (which accords with the operand order "E, G" above). Since this bit is 0, the reg field (ESP) indicates the source and the r/m field (EBP) indicates the destination. With the d-bit set it would be the other way round, meaning the bytes 0x8B, 0xEC would also be a perfectly valid encoding for your instruction mov %esp, %ebp.
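To see the d-bit at work, the following sketch decodes both byte pairs and shows that they describe the same instruction. It assumes 32-bit code with no 0x66 operand-size prefix and handles only register-direct operands (mod = 11b) with w = 1; `decode_mov_reg_reg` is a hypothetical helper, not part of any library.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_mov_reg_reg(code):
    """Decode a 2-byte register-to-register MOV into Intel syntax.

    Sketch only: accepts opcodes 89 and 8B with mod = 11b, and assumes
    32-bit operand size (32-bit code, no 0x66 prefix).
    """
    opcode, modrm = code
    if opcode & 0b11111100 != 0b10001000:
        raise ValueError("not one of the MOV opcodes 88..8B")
    d = (opcode >> 1) & 1   # direction: 0 = reg is source, 1 = reg is destination
    w = opcode & 1          # width: 0 = byte, 1 = word/doubleword
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if w != 1 or mod != 0b11:
        raise ValueError("this sketch handles only (d)word register-direct forms")
    dst, src = (reg, rm) if d else (rm, reg)
    return f"mov {REG32[dst]}, {REG32[src]}"

print(decode_mov_reg_reg(bytes([0x89, 0xE5])))  # mov ebp, esp
print(decode_mov_reg_reg(bytes([0x8B, 0xEC])))  # mov ebp, esp
```

Which of the two encodings appears in a binary is the assembler's choice; the CPU treats them identically.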
