Why does MOVZX r64, r/m8 behave like MOVZX r32, r/m8?

Question

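Here is the code snippet:

int main()
{
    // On x86-64 Linux (LP64), unsigned long is 64 bits wide.
    unsigned long int ui64{};
    unsigned char ui8{ 0xAA };
    unsigned short ui16{ 0xBBBB };
    ui64 = ui8;   // zero-extend 8 -> 64 bits
    ui64 = ui16;  // zero-extend 16 -> 64 bits
}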

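Here are the opcodes involved (N.E. = not encodable, because REX prefixes exist only in 64-bit mode):

MOVZX: Move With Zero-Extend

Opcode            Instruction        Op/En  64-Bit Mode  Compat/Leg Mode
        0F B6 /r  MOVZX r32, r/m8    RM     Valid        Valid
REX.W + 0F B6 /r  MOVZX r64, r/m8    RM     Valid        N.E.

        0F B7 /r  MOVZX r32, r/m16   RM     Valid        Valid
REX.W + 0F B7 /r  MOVZX r64, r/m16   RM     Valid        N.E.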

When I compile that code and check the assembly on Godbolt (MOVZX), the compiler appears to use

0F B6 /r MOVZX r32, r/m8
0F B7 /r MOVZX r32, r/m16

instead of

REX.W + 0F B6 /r MOVZX r64, r/m8
REX.W + 0F B7 /r MOVZX r64, r/m16

I then checked MOVSX to find out whether the same thing happens with that instruction too.

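Here is the code snippet:

int main()
{
    // On x86-64 Linux, long is 64 bits and plain char is signed.
    long int i64{};
    char i8{ 1 };
    short i16{ 2 };

    i64 = i8;   // sign-extend 8 -> 64 bits
    i64 = i16;  // sign-extend 16 -> 64 bits
}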

Here is the assembly on Godbolt (MOVSX). The relevant opcodes are:

MOVSX/MOVSXD: Move With Sign-Extension

Opcode            Instruction        Op/En  64-Bit Mode  Compat/Leg Mode
        0F BE /r  MOVSX r32, r/m8    RM     Valid        Valid
REX.W + 0F BE /r  MOVSX r64, r/m8    RM     Valid        N.E.

        0F BF /r  MOVSX r32, r/m16   RM     Valid        Valid
REX.W + 0F BF /r  MOVSX r64, r/m16   RM     Valid        N.E.

This time the compiler does use the REX.W forms

REX.W + 0F BE /r MOVSX r64, r/m8 RM Valid N.E.
REX.W + 0F BF /r MOVSX r64, r/m16 RM Valid N.E.

as I expected.

What is the reason the compiler uses the 32-bit form for MOVZX but the 64-bit (REX.W) form for MOVSX?

Implicit zero-extension from 32 to 64-bit saves a REX prefix. Same reason there is no `movzx r64, r/m32` - [MOVZX missing 32 bit register to 64 bit register](https://stackoverflow.com/q/51387571). There's no reason to ever use REX.W with `movzx` (only `movsx`), although the ISA does allow it, perhaps because there was no reason not to, and it might take more transistors to disallow it and fault. — Peter Cordes, Feb 25 '23 at 11:21
@PeterCordes I need some time to read those answers. What does *saving a REX prefix* mean? I see that the next line, `mov QWORD PTR [rbp-0x8],rax`, has the opcode bytes `48 89 45 f8`, and I assume the `48` comes from the *saved REX prefix*, but what is the reason? — UPinar, Feb 25 '23 at 11:31
Saving as in avoiding spending that extra code size for an instruction that does the same thing. When all else is equal for performance, smaller code is usually best, so that's what compilers like GCC do (or compiler developers do) when there are multiple choices, like `movzx r64, r/m8` vs. `movzx r32, r/m8`. — Peter Cordes, Feb 25 '23 at 11:39
@PeterCordes Oh, so you mean `48 0f b6 45 f7` followed by `89 45 f8` is slower than `0f b6 45 f7` followed by `48 89 45 f8`, because the first instruction of the first pair is 5 bytes long? — UPinar, Feb 25 '23 at 11:47
No, the total machine-code size is what matters. `48 89 45 f8` and `89 45 f8` are different instructions; which one you choose depends on the width you want to store. Unlike for the zero-extending load, where it never makes sense to do `movzx rax, byte ptr [mem]`. Always just load into EAX, whether you're going to store 32 bits (EAX) or 64 bits (RAX). The choice of which store instruction is totally separate. — Peter Cordes, Feb 25 '23 at 11:55
@PeterCordes I thought the `48` had moved to the next instruction, but the next instruction already has its own REX.W prefix (`REX.W + 89 /r MOV r/m64, r64`). Now it's clear. Thank you for the explanations; I will read those answers. — UPinar, Feb 25 '23 at 12:02