How to write an operand that is a 512-bit vector loaded from an N-bit memory location in x86 Assembly

Question

The source Intel manual is here: https://cdrdv2.intel.com/v1/dl/getContent/671110

The operands are specified as m32bcst or m64bcst.

Example of an instruction that has a variant that uses this operand

I am interested in writing the variant of the instruction that uses this operand in actual Assembly.

If instead of operand m32bcst we had a variant with operand m32, using MASM Assembly for instance one could write: VMINPS YMM1{k1}{z}, YMM2, DWORD PTR[EAX]

I am not sure what to do in case of an m32bcst operand however.

I'm not sure about MASM syntax for *broadcasting*, but using [€ASM](https://euroassembler.eu/eadoc/#InstructionModifiers) for instance one could write `VMINPS YMM1,YMM2,[EAX],Bcst=on`. See also [more examples](https://euroassembler.eu/eatests/t5216.htm). — vitsoft, Mar 08 '23 at 12:24
Thank you, using the disassembler I found out that `YMM1,YMM2,[EAX],Bcst=on` is equivalent to `VMINPS XMM1,XMM2,DWORD PTR [EAX]{1to4}` in MASM (not sure where the syntax comes from). If you write an answer, I will accept it. — Domen Hočevar, Mar 08 '23 at 13:00
[MASM](https://learn.microsoft.com/en-us/cpp/assembler/masm/instruction-format?view=msvc-170) seems to prefer the curly-bracket decoration syntax proposed in INTEL/AMD manuals, but for *broadcasting* it uses the keyword **BCST**: `VMINPS ZMM1, ZMM2, DWORD bcst [EAX]`. However, I didn't test it. — vitsoft, Mar 08 '23 at 13:39
You are right, I even tried "VMINPS ZMM1, ZMM2, DWORD PTR bcst [EAX]" by accident instead of this right one (disassembler also returned it when using the right settings). Thank you, I will add the solution in the post. — Domen Hočevar, Mar 08 '23 at 13:53
Please don't put solutions in a question. If you have a solution to your own question you can self-answer your question. — Michael Petch, Mar 08 '23 at 13:59
[Don't put images of texts](https://meta.stackoverflow.com/q/285551/995714) — phuclv, Mar 08 '23 at 14:07
@phuclv I did link the whole pdf as well, not sure if I can link parts of the pdf? — Domen Hočevar, Mar 08 '23 at 14:46
@vitsoft: Syntax varies wildly, it seems. NASM uses `vminps zmm1, zmm2, dword [rax] {1to16}` or `vminps zmm1, zmm2, [rax] {1to16}` with the `dword` size implicit; any broadcast instruction only has one choice of scalar size, since the encoding only has 1 bit for broadcast or not. And the `{1to16}` plus the `zmm` destination also implies a size; NASM checks that for consistency, too. That syntax is the same shown in Intel's 2014 slides from a talk introducing AVX512 https://gcc.gnu.org/wiki/cauldron2014?action=AttachFile&do=get&target=Cauldron14_AVX-512_Vector_ISA_Kirill_Yukhin_20140711.pdf — Peter Cordes, Mar 08 '23 at 18:10

Peter Cordes · Accepted Answer · 2023-03-08T18:58:54.790

It varies by assembler. Some support the {1to16} / {1to8} / {1to4} syntax in slides from a 2014 talk introducing AVX-512 at a GCC conference, by Kirill Yukhin of Intel. (Despite it being a GCC talk, the slides use Intel syntax.) Others support that and/or something else.

MASM: vminps zmm1, zmm2, DWORD bcst [rax]
NASM: vminps zmm1, zmm2, [rax] {1to16} (with an optional dword or qword specifier in the usual place, like dword [rax]{1to16}). NASM does not support the bcst keyword.
€ASM aka Euro Assembler: vminps ymm1,ymm2,[rax],Bcst=on
GAS/clang .intel_syntax: MASM-like in general, and supports dword bcst [rax], but also [rax]{1to16}. (objdump -drwC -Mintel uses dword bcst [rax].)
AT&T syntax: vminps (%rax){1to16},%zmm2,%zmm1

The machine code only has 1 bit to encode broadcast vs. regular, so there's no way to broadcast 64-bit pairs of floats for vminps; the broadcast element size has to match the SIMD element size. So €ASM's minimal syntax is sufficient; the others merely provide a way for the assembler to check for a mismatch in what the human thinks the instruction will do.
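To make the semantics concrete, here is a minimal scalar sketch in C of what vminps zmm1, zmm2, dword [rax]{1to16} computes. It is only a model, not how the hardware does it: the function name vminps_bcst_model and the constant ZMM_FLOATS are made up for illustration, and masking ({k1}{z}) is ignored.

```c
#include <stddef.h>

/* Hypothetical scalar model of: vminps zmm1, zmm2, dword [rax]{1to16}
 * dst plays the role of zmm1, src1 of zmm2, and mem of the 32-bit
 * memory operand. Masking is not modeled. */
enum { ZMM_FLOATS = 16 };  /* 512 bits / 32 bits per float */

void vminps_bcst_model(float dst[ZMM_FLOATS],
                       const float src1[ZMM_FLOATS],
                       const float *mem)
{
    /* A single 4-byte load; the value is reused for every lane. */
    float bcst = *mem;

    for (size_t i = 0; i < ZMM_FLOATS; i++) {
        /* MINPS keeps the first source only when src1 < src2;
         * otherwise (including NaN inputs) it returns the second source,
         * which here is the broadcast memory element. */
        dst[i] = (src1[i] < bcst) ? src1[i] : bcst;
    }
}
```

Only 4 bytes of memory are read, which is why the operand is written with a dword size even though the destination is 512 bits wide.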

Unlike embedded rounding + suppress-all-exceptions, which only work with scalar (like vmulss) or 512-bit instructions¹, broadcast memory operands also work with 256-bit and 128-bit vectors (AVX512VL).

Broadcast element sizes of 32 and 64-bit are supported; not coincidentally, those are the element sizes that load ports on Intel CPUs can do for free as part of a load uop. (Note that vpbroadcastb/w vec, [mem] need an ALU uop, vpbroadcastd/q only need the load uop.)
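The N in a {1toN} decoration follows from the vector width and the broadcast element size, so it carries no extra information beyond what the register name and the instruction already imply. A small sketch in C, where the helper name bcst_count is hypothetical and the accepted widths are just the ones discussed above:

```c
/* Hypothetical helper: the N in a {1toN} decoration for a given
 * vector width and broadcast element size, both in bits.
 * Returns 0 for combinations not covered by the text above. */
int bcst_count(int vector_bits, int elem_bits)
{
    if (elem_bits != 32 && elem_bits != 64)
        return 0;
    if (vector_bits != 128 && vector_bits != 256 && vector_bits != 512)
        return 0;
    return vector_bits / elem_bits;
}
```

For example, bcst_count(512, 32) gives 16, matching {1to16} for vminps with a zmm destination, while bcst_count(256, 32) and bcst_count(128, 32) give the 8 and 4 of the {1to8} and {1to4} forms.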

Footnote 1: e.g. vmulps zmm0,zmm1,zmm2{rz-sae} (GAS .intel_syntax / MASM)
or vmulps zmm0, zmm1, zmm2, {rz-sae} (NASM, with an extra comma before the {})

score 1 · Answer 2 · edited Mar 08 '23 at 16:54


The specified instruction can be written as VMINPS ZMM1, ZMM2, DWORD bcst [EAX]

An example can be seen here

edited Mar 08 '23 at 16:54 by vitsoft · answered Mar 08 '23 at 14:45 by Domen Hočevar

It seems GAS `.intel_syntax` / `objdump -drwC -Mintel` also uses this `bcst` keyword, although in AT&T syntax it disassembles to `vminps (%rax){1to16},%zmm2,%zmm1`. vs. NASM's `vminps zmm1, zmm2, [rax] {1to16}` which MASM might also support. NASM does *not* support the `bcst` keyword. Euro Assembler's `VMINPS YMM1,YMM2,[EAX],Bcst=on` is yet another variety. – Peter Cordes Mar 08 '23 at 18:15
