You don't need to specify an operand size for the memory operand; just use `movdqu xmm0, [rsi]` and let `xmm0` imply 128-bit operand-size.
NASM supports SSE/AVX/AVX-512 instructions.
If you did want to specify an operand-size, the name for 128-bit is `oword`, according to `ndisasm` if you assemble that instruction and then disassemble the resulting machine code. `oword` = oct-word = 8x 2-byte words = 16 bytes.
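As a minimal sketch of the implied vs. explicit size, the two loads below should assemble to the same machine code. The file name and the `copy16` label are made up for illustration, and the commands in the comments assume `nasm` and `ndisasm` are on your path.

```nasm
; size_demo.asm (hypothetical file name)
; Assemble to a flat binary, then disassemble it as 64-bit code:
;   nasm -f bin size_demo.asm -o size_demo.bin
;   ndisasm -b 64 size_demo.bin
bits 64

copy16:                           ; hypothetical label
    movdqu  xmm0, [rsi]           ; 128-bit load, size implied by xmm0
    movdqu  xmm0, oword [rsi]     ; same instruction with the size spelled out
    movdqu  [rdi], xmm0           ; 128-bit store, size again implied by xmm0
    ret
```

Comparing the bytes `ndisasm` prints for the first two lines is a quick way to confirm that the `oword` keyword only documents the size and does not change the encoding.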
Note that GNU `.intel_syntax noprefix` (as used by `objdump -drwC -Mintel`) will use `xmmword ptr`, unlike NASM.
If you really want to use `xmmword`, put `%define xmmword oword` at the top of your file.
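A short sketch of that macro in use, assuming a flat 64-bit source file. The `load_xmmword` label is hypothetical.

```nasm
bits 64

%define xmmword oword             ; single-line macro: xmmword expands to oword

load_xmmword:                     ; hypothetical label
    movdqu  xmm0, xmmword [rsi]   ; assembles as: movdqu xmm0, oword [rsi]
    ret
```

This only gives you the keyword, not GNU/MASM-style `ptr`: NASM syntax still has no `ptr`, so `xmmword ptr [rsi]` would not mean what it does in GAS. Also, `%define` is case-sensitive, so `XMMWORD` would need its own definition (or `%idefine`).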
The operand-size is implied by the mnemonic and/or other register operands for nearly all SSE/AVX/AVX-512 instructions, so you rarely need to specify `qword` vs. `oword` vs. `yword` or anything, the way you do with `movsx eax, byte [rdi]` vs. `word [rdi]`. (The exceptions I know of are narrowing conversions such as `vcvtpd2ps xmm0, [rdi]`, where the same xmm destination goes with either a 128-bit or a 256-bit source.) Often the memory operand is the same size as the register, but there are exceptions with some shuffle / insert / extract instructions. For example:
- SSE2 `pinsrw xmm0, [rdi], 3` loads a word and merges it into bytes 6 and 7 of xmm0.
- SSE2 `movq [rdi], xmm0` stores the qword low half.
- SSE1 `movhps [rdi], xmm0` stores the high qword.
- AVX1 `vextractf128 [rdi], ymm0, 1` does a 128-bit store of the high half.
- AVX2 `vpmovzxbw ymm0, [rdi]` does packed byte->word zero extension from a 128-bit memory source operand.
- AVX-512F `vpmovdb [rdi]{k1}, zmm2` narrows dword to byte elements (with truncation; other versions do saturation) and does a 128-bit store, with masking at byte granularity. (One of the only ways to do byte-granularity masking without AVX-512BW, other than legacy-SSE `maskmovdqu`, which has cache-evicting NT semantics. So I guess that makes it especially interesting for Xeon Phi KNL.)
You could specify the size (`word`, `qword`, or `oword`, as appropriate) on any of those to make sure the size of the memory access is what you think it is, i.e. to have NASM check it for you.
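Here is a sketch of the instructions from the list above with the size written out, each keyword matching the access width described in its bullet. It assumes a NASM version recent enough to know AVX-512 syntax. The `sized_accesses` label is hypothetical, and the code is only meant to be assembled and inspected, not called as-is.

```nasm
bits 64

sized_accesses:                            ; hypothetical label
    pinsrw        xmm0, word [rdi], 3      ; 2-byte load into bytes 6 and 7
    movq          qword [rdi], xmm0        ; 8-byte store of the low half
    movhps        qword [rdi], xmm0        ; 8-byte store of the high half
    vextractf128  oword [rdi], ymm0, 1     ; 16-byte store of the high lane
    vpmovzxbw     ymm0, oword [rdi]        ; 16-byte load, widened to 32 bytes
    vpmovdb       oword [rdi]{k1}, zmm2    ; 16-byte masked store of 16 narrowed dwords
    ret

; A size that does not match any form of the instruction should be
; rejected at assembly time, which is the point of writing it out, e.g.:
;   movq  oword [rdi], xmm0                ; movq has no 128-bit memory form
```

Writing the keyword costs nothing in the encoding, so it works as a cheap assertion that the memory access is as wide as you expect.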