2

For setting x to zero (x = 0), my CS:APP book shows two ways.

First:

xorq %rcx, %rcx

Second:

movq $0, %rcx

It also says that the first one takes only 3 bytes, but the second one takes 7 bytes.

How do the two ways work? Why does the first one take fewer bytes than the second one?

Sep Roland
Jinwoo Park
  • 2
    Different instructions in x86 are different sizes. xor takes fewer bytes to encode. – Shawn Sep 25 '19 at 02:37
  • Check the instruction set reference for the encodings of these instructions. That said, `xor %eax, %eax` is even shorter at only two bytes. – fuz Sep 25 '19 at 09:30

2 Answers

8

Because mov needs more space to encode its 32-bit immediate source operand.
xor only needs the ModRM byte to encode its operands.
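As a rough illustration of why one ModRM byte is enough for xor, here is a small Python sketch that splits a ModRM byte into its three fields. The helper name `split_modrm` and the `REG32` table are made up for this example; `0xC9` is the ModRM byte of `xor %ecx, %ecx` (encoded as `31 C9`).

```python
# Register numbers 0..7 as used in the ModRM reg and rm fields (32-bit names).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def split_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) fields."""
    mod = byte >> 6            # top 2 bits: 0b11 means register-direct
    reg = (byte >> 3) & 0b111  # middle 3 bits: one register operand
    rm = byte & 0b111          # low 3 bits: the other register operand
    return mod, reg, rm

mod, reg, rm = split_modrm(0xC9)
print(f"mod={mod} reg={REG32[reg]} rm={REG32[rm]}")  # prints: mod=3 reg=ecx rm=ecx
```

Both operands fit in that single byte, so the whole instruction is just opcode + ModRM.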

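Neither one needs a REX prefix, because writing a 32-bit register implicitly zeroes the upper half of the full 64-bit register, so you should really be comparing 2-byte xor %ecx,%ecx against 5-byte mov $0, %ecx. (See "Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?")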
GAS doesn't do this optimization for you, and movq gives you the mov $sign_extended_imm32, %r/m64 encoding instead of the special case 5-byte mov $imm32, %r32 encoding that omits the ModRM byte.
(Unless you use as -O2 in which case it will optimize the operand-size like NASM. Note that gcc -O2 -c foo.s does not pass on optimization options to as.)
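To make the byte counts concrete, here is a small Python sketch that lists one valid machine-code encoding for each of the four instructions and prints its length. The dictionary and the helper `size_of` are just names chosen for this example, and an assembler may pick an equivalent alternative opcode for the xor forms.

```python
# One valid encoding for each way of zeroing RCX discussed above.
ENCODINGS = {
    # REX.W prefix (48), opcode (31), ModRM (C9)
    "xorq %rcx, %rcx": bytes([0x48, 0x31, 0xC9]),
    # REX.W prefix (48), opcode (C7), ModRM (C1), sign-extended imm32
    "movq $0, %rcx": bytes([0x48, 0xC7, 0xC1, 0x00, 0x00, 0x00, 0x00]),
    # opcode (31), ModRM (C9): no prefix needed
    "xorl %ecx, %ecx": bytes([0x31, 0xC9]),
    # opcode B8+1 selects ECX directly (no ModRM), followed by imm32
    "movl $0, %ecx": bytes([0xB9, 0x00, 0x00, 0x00, 0x00]),
}

def size_of(insn):
    """Return the encoded length in bytes of one of the instructions above."""
    return len(ENCODINGS[insn])

for insn, code in ENCODINGS.items():
    hex_bytes = " ".join(f"{b:02x}" for b in code)
    print(f"{insn:<16} {size_of(insn)} bytes: {hex_bytes}")
```

The four zero bytes of the immediate account for most of the size difference in both mov forms.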

(As noted in "CS:APP example uses idivq with two operands?", CS:APP seems to be full of asm mistakes. This one isn't an invalid-syntax mistake, just a missed optimization.)


There is unfortunately no encoding of mov with a sign-extended 8-bit immediate; otherwise we could have a 3-byte mov reg, imm8 (see https://www.felixcloutier.com/x86/mov). (I'm surprised no iteration of x86-64 has repurposed one of the opcode bytes it freed up for a nice mov encoding like that, maybe lumped in with BMI1 or something.)
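For contrast, add does have a sign-extended imm8 form. The sketch below is a minimal Python illustration (the function `encode_add_ecx` is hypothetical, written only for this example) that picks the shorter encoding of `add $imm, %ecx` when the immediate fits in a signed byte. It assumes `imm` is within the signed 32-bit range.

```python
import struct

def encode_add_ecx(imm):
    """Encode `add $imm, %ecx`, preferring the short imm8 form when possible."""
    modrm = 0xC1  # mod=11 (register-direct), reg=/0 (ADD), rm=001 (ECX)
    if -128 <= imm <= 127:
        # opcode 83 /0 ib: sign-extended 8-bit immediate, 3 bytes total
        return bytes([0x83, modrm]) + struct.pack("<b", imm)
    # opcode 81 /0 id: full 32-bit little-endian immediate, 6 bytes total
    return bytes([0x81, modrm]) + struct.pack("<i", imm)

for imm in (4, 4096):
    code = encode_add_ecx(imm)
    print(f"add ${imm}, %ecx -> {len(code)} bytes: {code.hex()}")
```

A mov with the same kind of imm8 form would be 3 bytes for small constants, but no such opcode exists.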

For more details on x86 instruction encoding, read Intel's vol. 2 manual and look at disassembly; https://wiki.osdev.org/X86-64_Instruction_Encoding is a nice overview that's less verbose than Intel's manual.

See also "What is the best way to set a register to zero in x86 assembly: xor, mov or and?" for more details about why xor-zeroing is optimal: on some CPUs, notably P6-family and Sandybridge-family, it has microarchitectural advantages over mov beyond just code size.

Peter Cordes
4

Why does the first one take fewer bytes than second?

While Peter Cordes' answer already covers the technical details, I'd like to focus on the mathematical background:

x86 CPUs obviously do not distinguish between large numbers (like 12345789) and the value zero: storing such a value requires 4 bytes.

However, the value zero is a very special value:

It can be written as (a-a) or as (a XOR a), where "a" can be any integer value!

This means that you can perform a trick:

You perform the operation subq %rcx, %rcx to calculate the value (rcx - rcx). It does not matter which value rcx holds: if you subtract that value from itself, the result will be zero (because (a-a)=0).

This means that rcx will be 0 after that operation.

The operation xorq %rcx, %rcx has the same effect, because (a XOR a) is also always 0.
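A quick way to convince yourself of both identities is to try them on a few 64-bit values. This is only a Python sketch: `MASK64`, `sub_self` and `xor_self` are names invented here, and the mask models the wrap-around of a 64-bit register.

```python
MASK64 = (1 << 64) - 1  # keep results within 64 bits, like a register

def sub_self(a):
    """Model `subq %rcx, %rcx`: a - a, truncated to 64 bits."""
    return (a - a) & MASK64

def xor_self(a):
    """Model `xorq %rcx, %rcx`: every bit XORed with itself gives 0."""
    return (a ^ a) & MASK64

# Arbitrary sample values, including the largest 64-bit pattern.
for a in (0, 1, 12345789, 0x8000000000000000, MASK64):
    print(f"a={a:#x}: a-a={sub_self(a)}, a^a={xor_self(a)}")
```

Whatever the register held before, both operations leave it at zero.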

Martin Rosenau
  • 1
    1-byte immediates are supported for most instructions *other* than `mov`. e.g. `add $4, %ecx` is 3 bytes (opcode + modrm + sign_extended_imm8) while `add $4096, %ecx` is 6 bytes (opcode + modrm + imm32). So your 2nd paragraph about x86 not treating small values specially should maybe limit itself to `mov`. But yes, +1 for explaining `a^a = a-a = 0`. Forgot to put anything about that in my answer. – Peter Cordes Sep 25 '19 at 12:16