x86_64: When is `movzbq` preferable over `movzbl`

Question

On my x86_64 machine, I used objdump -d to check the encoding of the following two instructions:

movzbl (%rdi),%eax: encoded in 3 bytes (0f b6 07)
movzbq (%rdi),%rax: encoded in 4 bytes (48 0f b6 07)

Because writing a 32-bit register implicitly zeroes the upper 32 bits of the full 64-bit register, movzbl achieves the same data movement as movzbq with one fewer byte of encoding.

When would the compiler prefer movzbq over movzbl, given that movzbq takes up an extra byte?

I don't think there's ever a reason to use it. The reason it exists is that it is simpler for the decoder to not have a special case to prevent it. — prl, Apr 23 '23 at 05:41
When the compiler wants to align the following instruction, it can insert an unnecessary prefix instead of a separate nop instruction. — prl, Apr 23 '23 at 05:43
Basically just [What methods can be used to efficiently extend instruction length on modern x86?](https://stackoverflow.com/q/48046814) - never better than `movzbl` with an unnecessary REX prefix with no bits set, e.g. a `0x40` instead of `0x48` REX.W. But if you (or a compiler) is generating asm source instead of machine-code directly, `movzbq` will get GAS to use an extra prefix byte as @prl says, even with `as -Os` surprisingly. There's no downside to a 4-byte `movzbq` vs. a 4-byte `movzbl` on any CPU I'm aware of, but no upside either. — Peter Cordes, Apr 23 '23 at 06:42

score 5 · Accepted Answer · answered Apr 23 '23 at 06:00

When would the compiler prefer movzbq over movzbl, given that movzbq takes up an extra byte?

Whether movzbq takes up an extra byte depends on the registers used. For example, movzbl (%rdi),%r8d is encoded as 44 0f b6 07 (a REX prefix is needed anyway to select r8), and movzbq (%rdi),%r8 is encoded as 4c 0f b6 07, so both are 4 bytes.
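As a rough illustration of why the length depends on the registers, here is a minimal C sketch that builds these encodings by hand. The function name `encode_movzb_load` and its register numbering are assumptions made for this example, not part of any assembler's API, and it only covers the plain `(base)` addressing form used above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the answer): encodes a zero-extending byte
 * load "movzb{l,q} (base), dst" with a plain register-indirect address.
 * dst and base are register numbers 0..15 (rax = 0, rdi = 7, r8 = 8).
 * wide != 0 selects the 64-bit form (movzbq), which sets REX.W.
 * Returns the number of bytes written to out, or 0 for cases this sketch
 * does not handle (rsp/r12 need a SIB byte, rbp/r13 need a displacement). */
size_t encode_movzb_load(unsigned dst, unsigned base, int wide, uint8_t out[4])
{
    if (dst > 15 || base > 15) return 0;
    if ((base & 7) == 4 || (base & 7) == 5) return 0;

    uint8_t rex = 0x40;            /* REX prefix with no bits set */
    if (wide)     rex |= 0x08;     /* REX.W: 64-bit operand size */
    if (dst & 8)  rex |= 0x04;     /* REX.R: destination is r8..r15 */
    if (base & 8) rex |= 0x01;     /* REX.B: base register is r8..r15 */

    size_t n = 0;
    if (rex != 0x40) out[n++] = rex;   /* emit REX only when a bit is needed */
    out[n++] = 0x0f;                   /* two-byte opcode: 0f b6 = movzx r, r/m8 */
    out[n++] = 0xb6;
    out[n++] = (uint8_t)(((dst & 7) << 3) | (base & 7));  /* ModRM, mod = 00 */
    return n;
}
```

With `dst = 0, base = 7` the narrow form comes out as 3 bytes and the wide form as 4, while with `dst = 8` both forms are 4 bytes and differ only in the REX.W bit (44 versus 4c).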

This gives 2 slightly different cases:

a) movzbl can be 1 byte shorter (no REX prefix is otherwise needed). In this case there's no valid reason to choose the longer movzbq, and compilers that do this (when optimization is enabled) are simply bad at instruction selection.

b) movzbl can't be 1 byte shorter (a REX prefix is needed either way). In this case there's no reason to choose one over the other; it makes no difference at all.

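In both cases, a compiler's choice is likely to lean towards symmetry with movsbl and movsbq, for the compiler developers' convenience. For those sign-extending forms the operand size makes an actual difference: movsbl leaves the upper 32 bits of the 64-bit destination zeroed, while movsbq fills them with copies of the sign bit.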
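To make the difference concrete, the following C sketch models the full 64-bit destination register after each of the four instructions. The `model_*` names are made up for this illustration, and this is only a model of the architectural result, not what any particular compiler emits.

```c
#include <stdint.h>

/* Sign-extend a byte to 32 bits using only well-defined arithmetic. */
static int32_t sign_extend8(uint8_t b)
{
    int32_t s = b;
    if (b & 0x80) s -= 256;
    return s;
}

/* Each function returns the value of the whole 64-bit register afterwards. */

/* movzbl: zero-extend to 32 bits; the 32-bit write zeroes bits 63..32. */
uint64_t model_movzbl(uint8_t b) { return (uint64_t)(uint32_t)b; }

/* movzbq: zero-extend directly to 64 bits. */
uint64_t model_movzbq(uint8_t b) { return (uint64_t)b; }

/* movsbl: sign-extend to 32 bits; the 32-bit write zeroes bits 63..32. */
uint64_t model_movsbl(uint8_t b) { return (uint64_t)(uint32_t)sign_extend8(b); }

/* movsbq: sign-extend to all 64 bits. */
uint64_t model_movsbq(uint8_t b) { return (uint64_t)(int64_t)sign_extend8(b); }
```

For every input byte the two zero-extending models agree, whereas for a byte with the top bit set (for example 0x80) `model_movsbl` yields 0x00000000ffffff80 and `model_movsbq` yields 0xffffffffffffff80.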
