Why does it take less bytes to use `xor` than to use `mov`?

Question

For setting x to zero (x = 0), my CS:APP book shows two ways.

First:

xorq %rcx, %rcx

Second:

movq $0, %rcx

It also says that the first one takes only 3 bytes, but the second one takes 7 bytes.

How do the two ways work? Why does the first one take fewer bytes than the second one?

Different instructions in x86 are different sizes. xor takes fewer bytes to encode. — Shawn, Sep 25 '19 at 02:37
Check the instruction set reference for the encodings of these instructions. That said, `xor %eax, %eax` is even shorter at only two bytes. — fuz, Sep 25 '19 at 09:30

Peter Cordes · Accepted Answer · 2021-10-18T01:45:17.023

Because mov needs more space to encode its 32-bit immediate source operand.
xor only needs the ModRM byte to encode its operands.

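With 32-bit operand-size, neither one needs a REX prefix, so you should be comparing 2-byte xor %ecx,%ecx against 5-byte mov $0, %ecx. Writing a 32-bit register zero-extends into the full 64-bit register; see "Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?"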
GAS doesn't do this optimization for you, and movq gives you the mov $sign_extended_imm32, %r/m64 encoding instead of the special case 5-byte mov $imm32, %r32 encoding that omits the ModRM byte.
(Unless you use as -O2 in which case it will optimize the operand-size like NASM. Note that gcc -O2 -c foo.s does not pass on optimization options to as.)
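
The byte counts above can be reproduced by building the four encodings by hand. This is only a rough Python sketch, assuming the opcodes listed in Intel's manual (31 /r for xor, C7 /0 for mov to r/m with a sign-extended imm32, B8+rd for mov to r32 with imm32). The helper name modrm and the encodings dictionary are made up for illustration and are not part of any assembler.

```python
import struct

RCX = 1          # register number shared by %rcx and %ecx
REX_W = b"\x48"  # REX prefix with W=1, selecting 64-bit operand-size

def modrm(mod, reg, rm):
    """Pack the three ModRM fields (2 + 3 + 3 bits) into one byte."""
    return bytes([(mod << 6) | (reg << 3) | rm])

imm32_zero = struct.pack("<i", 0)  # four zero bytes, little-endian

encodings = {
    # REX.W + opcode 31 + ModRM(register-direct, rcx, rcx)
    "xorq %rcx, %rcx": REX_W + b"\x31" + modrm(3, RCX, RCX),
    # REX.W + opcode C7 /0 + ModRM + sign-extended imm32
    "movq $0, %rcx": REX_W + b"\xc7" + modrm(3, 0, RCX) + imm32_zero,
    # same xor without the REX prefix: 32-bit operand-size
    "xorl %ecx, %ecx": b"\x31" + modrm(3, RCX, RCX),
    # short form B8+rd: register number lives in the opcode, no ModRM
    "movl $0, %ecx": bytes([0xB8 + RCX]) + imm32_zero,
}

for asm, code in encodings.items():
    print(f"{asm:16} {len(code)} bytes: {code.hex(' ')}")
```

Comparing the output with a real disassembly (for example from objdump -d) is a good way to check the sketch against what the assembler actually emits.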

(As noted in "CS:APP example uses idivq with two operands?", CS:APP seems to be full of asm mistakes. This one isn't an invalid-syntax mistake, just a missed optimization.)

There is unfortunately no encoding of mov with a sign-extended 8-bit immediate (https://www.felixcloutier.com/x86/mov); otherwise we could have a 3-byte mov reg, imm8. (I'm surprised no iteration of x86-64 has repurposed one of the opcode bytes it freed up for a nice mov encoding like that, maybe lumped in with BMI1 or something.)

For more details on x86 instruction encoding, read Intel's vol. 2 manual and look at disassembly. https://wiki.osdev.org/X86-64_Instruction_Encoding is a nice overview that's less verbose than Intel's manual.

See also "What is the best way to set a register to zero in x86 assembly: xor, mov or and?" for more details about why xor-zeroing is optimal: on some CPUs, notably P6-family and Sandybridge-family, it has microarchitectural advantages over mov besides simply code-size.

Martin Rosenau · Answer 2 · 2019-09-25T12:57:08.307

Why does the first one take fewer bytes than second?

While Peter Cordes' answer is already about the technical details, I'd like to focus on the mathematical background:

For a mov immediate, an x86 CPU does not distinguish between large numbers (like 12345789) and the value zero: storing either one requires 4 bytes.

However, the value zero is a very special value:

It can be written as (a-a) or as (a XOR a), where "a" can be any integer value!

This means that you can perform a trick:

You perform the operation subq %rcx, %rcx to calculate the value (rcx - rcx). It does not matter which value rcx holds: if you subtract that value from itself, the result will be zero (because (a-a)=0).

This means that rcx will be 0 after that operation.

The operation xorq %rcx, %rcx has the same effect, because (a XOR a) is also always 0.
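
The identity can be checked with a small simulation. This is only an illustrative Python model, assuming a 64-bit register represented as an integer masked to 64 bits; the function names xorq and subq are borrowed from the mnemonics and model just the result value, not the flags.

```python
import random

MASK64 = (1 << 64) - 1  # keep results within 64 bits, like a real register

def xorq(src, dst):
    """Model of xorq src, dst: returns the new value of dst."""
    return (dst ^ src) & MASK64

def subq(src, dst):
    """Model of subq src, dst: returns dst - src, wrapped to 64 bits."""
    return (dst - src) & MASK64

# Whatever rcx holds beforehand, both operations leave it at zero.
for _ in range(1000):
    rcx = random.getrandbits(64)
    assert xorq(rcx, rcx) == 0
    assert subq(rcx, rcx) == 0

print("a ^ a == 0 and a - a == 0 held for every sampled value")
```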

1-byte immediates are supported for most instructions *other* than `mov`. e.g. `add $4, %ecx` is 3 bytes (opcode + modrm + sign_extended_imm8) while `add $4096, %ecx` is 6 bytes (opcode + modrm + imm32). So your 2nd paragraph about x86 not treating small values specially should maybe limit itself to `mov`. But yes, +1 for explaining `a^a = a-a = 0`. Forgot to put anything about that in my answer. — Peter Cordes, Sep 25 '19 at 12:16
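
The sizes quoted in the comment above can be worked out by hand. This is a hedged Python sketch, assuming the opcodes from Intel's manual (83 /0 ib for add with a sign-extended imm8, 81 /0 id for add with imm32, B8+rd for mov with imm32). The function names add_imm_to_ecx and mov_imm_to_ecx are hypothetical helpers, and the imm8-when-it-fits choice mirrors what assemblers typically do rather than quoting any particular one.

```python
import struct

ECX = 1  # register number of %ecx

def modrm(mod, reg, rm):
    """Pack the three ModRM fields into one byte."""
    return bytes([(mod << 6) | (reg << 3) | rm])

def add_imm_to_ecx(value):
    """Encode add $value, %ecx, using the imm8 form when the value fits."""
    if -128 <= value <= 127:
        # opcode 83 /0 + ModRM + sign-extended 8-bit immediate
        return b"\x83" + modrm(3, 0, ECX) + struct.pack("<b", value)
    # opcode 81 /0 + ModRM + 32-bit immediate
    return b"\x81" + modrm(3, 0, ECX) + struct.pack("<i", value)

def mov_imm_to_ecx(value):
    """Encode mov $value, %ecx: always a full imm32, however small the value."""
    return bytes([0xB8 + ECX]) + struct.pack("<I", value & 0xFFFFFFFF)

for v in (4, 4096):
    add_code = add_imm_to_ecx(v)
    mov_code = mov_imm_to_ecx(v)
    print(f"add ${v}, %ecx: {len(add_code)} bytes ({add_code.hex(' ')})")
    print(f"mov ${v}, %ecx: {len(mov_code)} bytes ({mov_code.hex(' ')})")
```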
