x86 assembly - mov and movzx from dword to qword?

Question

I'm testing the following code:

.intel_syntax noprefix

.data
.Ltest_data:
    .byte 0xEF, 0xBE, 0xAD, 0xDE, 0xBE, 0xBA, 0xED, 0xFE

.text
...
    movzx rax, BYTE PTR[.Ltest_data]  # 0xEF
    movzx rax, WORD PTR[.Ltest_data]  # 0xBEEF
    movsx rax, DWORD PTR[.Ltest_data] # 0xFFFFFFFFDEADBEEF (sign-extension)
    mov   rax, QWORD PTR[.Ltest_data] # 0xFEEDBABEDEADBEEF

    # These don't work:
    # mov rax, DWORD PTR[.Ltest_data]
    # movzx rax, DWORD PTR[.Ltest_data]
...

Looking at the Intel manual, it seems there's no way to use mov/movzx to load a dword into a qword register; the only way I see is to use movsx and then mask out the top 32 bits. This is surprising, given the huge number of instructions available on x86_64.

Is this correct, or am I missing something?

When a 32-bit value is written to a 64-bit register, the upper 32 bits are *always* cleared, regardless of the instruction. So there is no need to use a movzx instruction. Just use a 32-bit mov with a 32-bit register destination. I.e., `mov eax, [mem]`. — prl, Aug 30 '20 at 06:32
For the same reason, don’t use `movzx rax, byte ptr [mem]` or `movzx rax, word ptr [mem]`. Use eax as the destination, because the instruction encoding is shorter. — prl, Aug 30 '20 at 06:35
Interesting. So `mov eax, DWORD PTR [mem]` clears the top 32 bits of `rax`? Where in the Intel manual says so? — Martin, Aug 30 '20 at 06:38
Yes. So does `add eax, ecx`, `lea edi, [rbx+100]`, or any other instruction with a 32-bit register destination. — prl, Aug 30 '20 at 06:40
3.4.1.1: “When in 64-bit mode, operand size determines the number of valid bits in the destination general-purpose register: • 64-bit operands generate a 64-bit result in the destination general-purpose register. • 32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in the destination general-purpose register.” — prl, Aug 30 '20 at 07:01
Very interesting, thanks. I also noticed that e.g. `movsx rax, ebx` works in GNU AS (as indicated in the Intel manual), but LLVM throws an error instead (see https://godbolt.org/z/a1fWra). I wonder whether this is an LLVM bug. — Martin, Aug 30 '20 at 07:05
It's not a bug, just LLVM being strict about enforcing the documented different mnemonic for 32->64: `movsxd rax, ebx`, a new opcode for AMD64 rather than a repurposed existing one. See Intel's manual: https://www.felixcloutier.com/x86/movsx:movsxd, and comments on [X86: What does \`movsxd rdx,edx\` instruction mean?](https://stackoverflow.com/posts/comments/99719192) — Peter Cordes, Aug 30 '20 at 09:00
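
Putting the comments above together, here is a minimal sketch of the zero-extending and sign-extending loads, reusing the test data from the question. It assumes GNU as, Intel syntax, and a Linux x86-64 target. The `_start` entry point, the RIP-relative addressing, and the `exit` system call are additions for the sake of a complete program and are not part of the original code.

```asm
.intel_syntax noprefix

.data
.Ltest_data:
    .byte 0xEF, 0xBE, 0xAD, 0xDE, 0xBE, 0xBA, 0xED, 0xFE

.text
.globl _start
_start:
    mov    rax, -1                              # rax = 0xFFFFFFFFFFFFFFFF, so the clearing is visible

    # A 32-bit destination zero-extends into the full 64-bit register.
    mov    eax, DWORD PTR [rip + .Ltest_data]   # rax = 0x00000000DEADBEEF

    # Same rule for movzx: eax as destination is enough, and the encoding is shorter.
    movzx  eax, BYTE PTR [rip + .Ltest_data]    # rax = 0x00000000000000EF
    movzx  eax, WORD PTR [rip + .Ltest_data]    # rax = 0x000000000000BEEF

    # Sign extension from 32 to 64 bits uses the documented movsxd mnemonic.
    movsxd rax, DWORD PTR [rip + .Ltest_data]   # rax = 0xFFFFFFFFDEADBEEF

    # exit(0) via the Linux x86-64 syscall ABI (assumption: Linux target)
    mov    eax, 60
    xor    edi, edi
    syscall
```

Assuming the file is saved under a hypothetical name such as `demo.s`, it should build with `as demo.s -o demo.o && ld demo.o -o demo`; single-stepping it in a debugger shows `rax` after each load.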
