How does x86 handle byte vs word addressing when executing instructions and reading/writing data?

Question

So I am learning how x86 works and have come across people saying that it is byte-addressable, yet can read words, double words, etc. How does the processor decide which method to use and when? E.g. for accessing the next instruction and when a user wants to read/write to memory, which addressing mode is used?

Peter Cordes · Accepted Answer · 2019-10-30T22:10:02.903

Every memory access has an operand-size specified by the machine-code instruction. (Addressing mode isn't the right term: different addressing modes are different ways of specifying the lowest address of the chunk of memory to be accessed, like [rdi] vs. [rdi + rdx*8] vs. [RIP + rel32])

Encoding different operand-sizes is done with prefixes (for 16 vs. 32 vs. 64-bit for integer instructions) or a different opcode for the same mnemonic (8-bit integer). Or with bits in the VEX or EVEX prefix for AVX / AVX512 instructions that can use xmm, ymm, or zmm registers.

Decoding also depends on the current mode, which implies the default operand-size: 32 for 32 and 64-bit mode, or 16 for 16-bit mode. A 66 operand-size prefix implies the opposite size.

In 64-bit mode, the .W (width) bit in the REX prefix sets the operand-size to 64-bit. (And some instructions like push/pop default to 64-bit operand-size with no prefix needed, but most instructions like add/sub/mov still default to 32-bit)

There's also a 0x67 address-size prefix which swaps addressing modes to the other size. (16 vs. 32 or in 64-bit mode 64 -> 32.)
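As a rough illustration of the prefix rules above, here is a minimal Python sketch that computes the effective operand-size and address-size for a typical integer instruction. The function name `effective_sizes` and its parameters are made up for this example, and it deliberately ignores special cases such as byte-size opcodes and push/pop defaulting to 64-bit.

```python
def effective_sizes(mode_bits, has_66=False, has_67=False, rex_w=False):
    """Return (operand_size, address_size) in bits for a typical integer instruction.

    mode_bits is 16, 32, or 64. Hypothetical helper: it models only the common
    case, not byte-size opcodes or instructions like push/pop.
    """
    if mode_bits not in (16, 32, 64):
        raise ValueError("mode_bits must be 16, 32, or 64")
    if rex_w and mode_bits != 64:
        raise ValueError("REX prefixes exist only in 64-bit mode")

    # Default operand-size: 16 in 16-bit mode, 32 in both 32- and 64-bit mode.
    default_operand = 16 if mode_bits == 16 else 32
    if rex_w:
        operand = 64  # REX.W takes precedence over a 66 prefix
    elif has_66:
        operand = 32 if default_operand == 16 else 16
    else:
        operand = default_operand

    # Default address-size matches the mode; a 67 prefix selects the other size.
    if has_67:
        address = {16: 32, 32: 16, 64: 32}[mode_bits]
    else:
        address = mode_bits
    return operand, address


print(effective_sizes(64))               # plain instruction in 64-bit mode
print(effective_sizes(64, rex_w=True))   # same instruction with REX.W
print(effective_sizes(16, has_66=True))  # 66 prefix in 16-bit mode
```

Note that the two results are independent: the operand-size says how many bytes are transferred, while the address-size only says how wide the registers and displacements used to compute the address are.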

For example, mov [rdi], eax is a dword store, and the machine-code encoding will specify that by using no special prefixes on the opcode for 16/32/64-bit operand-size. (see https://www.felixcloutier.com/x86/mov for the available encodings. But note that Intel's manual doesn't mention 66 operand-size prefixes in each entry: it has 2 identical encodings with different sizes. You have to know which one needs a 66 prefix based on the current mode's default.)

16-bit operand-size like mov [rdi], ax will have the same machine code but with a 66 operand-size prefix.

8-bit operand-size (mov [rdi], al) has its own opcode, no prefixes needed.
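To make the mov examples concrete, the sketch below looks at the 64-bit-mode machine code for the stores mentioned above (plus the REX.W qword form) and reports the memory-access size. The helper name `store_operand_size` is an assumption for this example, and it only understands the two opcodes 88 and 89 with an optional 66 and REX prefix; a real decoder handles far more.

```python
def store_operand_size(code: bytes) -> int:
    """Return the store size in bytes for a 64-bit-mode `mov r/m, reg` encoding.

    Hypothetical helper covering only opcodes 0x88 and 0x89.
    """
    i = 0
    has_66 = False
    rex_w = False
    # Legacy prefixes come first; only the operand-size prefix is handled here.
    if code[i] == 0x66:
        has_66 = True
        i += 1
    # An optional REX prefix (0x40..0x4F) sits immediately before the opcode.
    if 0x40 <= code[i] <= 0x4F:
        rex_w = bool(code[i] & 0x08)  # bit 3 of REX is the W bit
        i += 1
    opcode = code[i]
    if opcode == 0x88:  # MOV r/m8, r8: byte size is baked into the opcode
        return 1
    if opcode == 0x89:  # MOV r/m16/32/64, r16/32/64: size set by prefixes
        if rex_w:
            return 8
        return 2 if has_66 else 4
    raise ValueError("only opcodes 0x88 and 0x89 are handled in this sketch")


examples = {
    "mov [rdi], eax": bytes.fromhex("8907"),
    "mov [rdi], ax":  bytes.fromhex("668907"),
    "mov [rdi], al":  bytes.fromhex("8807"),
    "mov [rdi], rax": bytes.fromhex("488907"),
}
for asm, code in examples.items():
    print(f"{asm:16} {code.hex(' '):10} -> {store_operand_size(code)} byte(s)")
```

In all four encodings the ModRM byte 07 (selecting [rdi] and register number 0) is identical; only the prefix or the opcode changes with the operand-size.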

movzx / movsx are interesting cases: the memory access size is different from the destination register. The memory-access size (byte or word) is specified by the opcode. Operand-size prefixes only affect the destination size. Except x86-64 63 /r movsxd (dword->qword sign-extension) where a 66 operand-size prefix does shrink the memory-access size down to m16 to match the destination.
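The movzx case can be sketched the same way. The hypothetical helper `movzx_sizes` below handles only the 0F B6 (byte source) and 0F B7 (word source) opcodes in 64-bit mode, and shows that the prefixes change the destination width while the opcode alone fixes the memory-access width.

```python
def movzx_sizes(code: bytes):
    """Return (memory_bits, destination_bits) for a 64-bit-mode movzx encoding.

    Hypothetical helper covering only 0F B6 /r and 0F B7 /r.
    """
    i = 0
    has_66 = False
    rex_w = False
    if code[i] == 0x66:           # operand-size prefix
        has_66 = True
        i += 1
    if 0x40 <= code[i] <= 0x4F:   # optional REX prefix
        rex_w = bool(code[i] & 0x08)
        i += 1
    if code[i] != 0x0F or code[i + 1] not in (0xB6, 0xB7):
        raise ValueError("not a movzx encoding")
    # The opcode picks the source (memory) size ...
    memory_bits = 8 if code[i + 1] == 0xB6 else 16
    # ... and the prefixes pick the destination register size.
    dest_bits = 64 if rex_w else (16 if has_66 else 32)
    return memory_bits, dest_bits


print(movzx_sizes(bytes.fromhex("0fb607")))    # movzx eax, byte [rdi]
print(movzx_sizes(bytes.fromhex("0fb707")))    # movzx eax, word [rdi]
print(movzx_sizes(bytes.fromhex("660fb607")))  # movzx ax,  byte [rdi]
print(movzx_sizes(bytes.fromhex("480fb607")))  # movzx rax, byte [rdi]
```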

Similarly for SIMD instructions; the instruction encoding uniquely determines the memory-access size, along with the registers read or written.

Can we demonstrate by means of a simple example? If I say MOV AL, [0110], can I safely say it is a "word instruction" because the operand, 0110, is two bytes large? — Dean P, Oct 27 '20 at 16:44
@DeanP: For that instruction, the *address-size* is "word", assuming you're in 16-bit mode so a bare `0110` is interpreted as a 16-bit (hex?) number. The *operand-size* is "byte", because `mov` is transferring 1 byte from memory to a byte register, AL. — Peter Cordes, Oct 27 '20 at 21:31
