YASM: Instruction movsx refuses dword for operand size?

Question

An assembly program I'm writing will not assemble with YASM, which reports:

    error: invalid size for operand 2

On the following line:

    movsx rbx, dword [rsi+4*rcx]    ; Copy double-word and sign extend.

However, I can't find any reason why dword should not work. I want to move a double-word (4 bytes) at address rsi+4*rcx into the 8-byte register rbx, so the upper 32 bits have to be filled with copies of the sign bit after the value lands in what is effectively ebx. If I change the size to byte, I get no error, but that is not what I want.

There is a question with a similar title here. However, the poster had forgotten to include any size operands whatsoever, and the answer to the question did not resolve my problem.

Edit: I've added the full program below in case the particular syntax I've copied here is not the culprit.

    segment .data

a:
    dd  1
    dd  3
    dd  0
    dd  1
    dd  7
    dd  9
    dd  5
    dd  2
b:
    dd  8
    dd  3
    dd  3
    dd  9
    dd  6
    dd  4
    dd  1
    dd  1

p   dq  0

    segment .text
    global main

main:
    xor rax, rax                    ; Set sum to 0.
    xor rcx, rcx                    ; Set counter to 0.
    lea rsi, [a]                    ; Set source 1.
    lea rdi, [b]                    ; Set source 2.

dot:
    movsx rbx, dword [rsi+4*rcx]    ; Copy in double-word.
    movsx rdx, dword [rdi+4*rcx]    ; Copy in other double-word.
    imul rbx, rdx                   ; Multiply the two double-words.
    add rax, rbx                    ; Sum product so far.
    inc rcx
    cmp rcx, 8
    jz done
    jmp dot

done:
    mov [p], rax

    xor rax, rax
    ret

Your program uses `movzx`, not `movsx`. Typo? There is no `movzx r64, r/m32` instruction because you just write `mov r32, r/m32` and let the built-in zero-extension do the work. — Raymond Chen, Jan 05 '19 at 15:34
@RaymondChen Yes I corrected it. I was experimenting when I copied it in. The error still holds for condition stated in the title. — Micrified, Jan 05 '19 at 15:36
I have closed your question as a duplicate as the duplicate has a better answer. — fuz, Jan 31 '19 at 23:26
@fuz The question you marked it as a duplicate of was one I visited and mentioned inside my post. But the answer did not solve my problem. — Micrified, Jan 31 '19 at 23:27
@Micrified The answer does specifically say: “YASM requires you to write dword, even though it doesn't accept byte, word, or qword there, and it doesn't accept movsx rcx, dword [c] either (i.e. it requires the movsxd mnemonic for 32-bit source operands).” This is the exact same advice I gave in my answer, too. — fuz, Jan 31 '19 at 23:47
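
For illustration, the difference between the two kinds of extension discussed in the comments above can be sketched as follows. This fragment is not part of the original program: the label `val` and its value are hypothetical, chosen only to make the effect visible.

```nasm
    segment .data
val dd  -1                      ; Hypothetical test value: 0xFFFFFFFF.

    segment .text
    mov    eax, dword [val]     ; Writing eax zeroes the upper half of rax:
                                ; rax = 0x00000000FFFFFFFF.
    movsxd rbx, dword [val]     ; Bit 31 is copied into the upper half:
                                ; rbx = 0xFFFFFFFFFFFFFFFF.
```

The plain 32-bit `mov` is why no `movzx r64, r/m32` form is needed, while sign extension from 32 to 64 bits has its own mnemonic.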

fuz · Accepted Answer · 2019-01-05T16:07:46.900

The assembler calls the desired instruction movsxd for some reason:

    movsxd rbx, dword [rsi+4*rcx]

This should work.
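
Applied to the loop from the question, the change might look like the sketch below. It assumes, as in the question's program, that `rsi` and `rdi` already point at the two 8-element dword arrays and that `rax` and `rcx` start at zero. Folding the `jz done` / `jmp dot` pair into a single `jne` is an optional tidy-up and not part of the fix.

```nasm
dot:
    movsxd rbx, dword [rsi+4*rcx]   ; Load 32 bits and sign-extend to 64.
    movsxd rdx, dword [rdi+4*rcx]   ; Same for the second array.
    imul   rbx, rdx                 ; 64-bit product of the two elements.
    add    rax, rbx                 ; Add the product to the running sum.
    inc    rcx
    cmp    rcx, 8
    jne    dot                      ; Repeat until all 8 elements are done.
```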

edited Jan 05 '19 at 16:07

answered Jan 05 '19 at 15:31


Sadly I am still given: `invalid size for operand 2` as feedback when this suggestion was implemented. – Micrified Jan 05 '19 at 15:38
this is weird. Have you also tried `movsxd rbx, [dword rsi+4*rcx]` for good measure? – fuz Jan 05 '19 at 15:50
The second suggestion did not work either. Here is a quick [demonstration](https://youtu.be/LflSc6wY0nw) of what happens when I compile, so that it is clear I am not making a mistake elsewhere. – Micrified Jan 05 '19 at 15:53
@Micrified Oh, the `dword` keyword goes before the brackets, sorry. Can you try `movsxd rbx, dword [rsi+4*rcx]` please? And I always appreciate ACME users. – fuz Jan 05 '19 at 16:01
The program compiles now, and works correctly as well. If you update your answer I will gladly accept it! I'm glad you like Acme. I find it does a lot of things right and works well with my small programs. :> – Micrified Jan 05 '19 at 16:05
"for some reason" is because that's what AMD named it. That's a bit of a mystery, instead of just adding a 3rd opcode for `movsx`, but you make it sound like it was YASM's weird decision to name it this way. YASM simply omitted NASM's `movsx` alias for [`movsxd`](https://www.felixcloutier.com/x86/movsx:movsxd). Fun fact: the opcode for `movsxd` is only useful with a REX.W prefix. – Peter Cordes Jan 05 '19 at 19:33
@Micrified: the code in your youtube(!) link (integer dot-product widening from 32 to 64-bit) looks like a use-case for SSE2 `pmuldq`/`paddq` so you can do a pair of integers in parallel. But to feed it you'll need to shuffle your inputs, e.g. load with SSE4.1 `pmovzxdq xmm0, qword [rsi + 4*rcx]`, or do 128-bit loads and `punpckldq` / `punpckhdq` against a zeroed register. – Peter Cordes Jan 05 '19 at 19:39
@PeterCordes I think those instructions are a bit beyond my capabilities at the moment. I am generally just completing suggested exercises in a textbook to get familiar and have only cursory assembly knowledge. However, I'm always interested in knowing of faster ways to do things! If you post this as an answer with a bit more context, I'll give it a vote (I already accepted an answer by now). Thank you for your input nonetheless! – Micrified Jan 05 '19 at 21:12
@Micrified: Once you learn anything about SIMD, you should be able to understand what I'm talking about. See also https://agner.org/optimize/, his optimization guide has a chapter on SIMD. Hopefully compilers can auto-vectorize the way I suggested from a C loop, too, so you could get an example that way. – Peter Cordes Jan 05 '19 at 21:17
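
As a rough illustration of the SIMD idea from the comments, the loop could be sketched as below. This is only one possible reading of the suggestion, not code from the thread. It assumes a CPU with SSE4.1 (which `pmovzxdq` requires), an even element count, and the same `rsi`/`rdi` setup as the question's program. The label `dot2` is hypothetical.

```nasm
    pxor     xmm2, xmm2                 ; Two 64-bit partial sums, both 0.
    xor      ecx, ecx                   ; Element index = 0.
dot2:
    pmovzxdq xmm0, qword [rsi+4*rcx]    ; a[i], a[i+1] widened to two qwords.
    pmovzxdq xmm1, qword [rdi+4*rcx]    ; b[i], b[i+1] widened to two qwords.
    pmuldq   xmm0, xmm1                 ; Signed 32x32 -> 64-bit product per qword
                                        ; (only the low dword of each is read).
    paddq    xmm2, xmm0                 ; Accumulate both products.
    add      rcx, 2                     ; Two elements per iteration.
    cmp      rcx, 8
    jne      dot2
    pshufd   xmm0, xmm2, 0xEE           ; Copy the high qword into the low qword.
    paddq    xmm2, xmm0                 ; Low qword now holds the total.
    movq     rax, xmm2                  ; Move the 64-bit sum into rax.
```

Zero-extending the loads is enough here because `pmuldq` ignores the upper dword of each qword and treats the lower dword as signed.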
