AVX512 vector length and SAE control

Question

My question concerns EVEX-encoded packed reg-reg instructions without rounding semantics that allow SAE (Suppress All Exceptions) control, such as VMIN*, VCVTT*, VGETEXP*, VREDUCE*, VRANGE*, etc. Intel declares SAE awareness only for the full 512-bit vector length, e.g.

VMINPD xmm1 {k1}{z}, xmm2, xmm3
VMINPD ymm1 {k1}{z}, ymm2, ymm3
VMINPD zmm1 {k1}{z}, zmm2, zmm3{sae}

but I don't see a reason why SAE couldn't be applied to instructions where xmm or ymm registers are used.

In chapter 4.6.4 of the Intel Instruction Set Extensions Programming Reference, Table 4-7 says that in instructions without rounding semantics, bit EVEX.b specifies that SAE is applied, and bits EVEX.L'L specify the explicit vector length:

00b: 128bit (XMM)
01b: 256bit (YMM)
10b: 512bit (ZMM)
11b: reserved

so their combination should be legal.

However NASM assembles vminpd zmm1,zmm2,zmm3,{sae} as 62F1ED185DCB, i.e. EVEX.L'L=00b, EVEX.b=1, which is disassembled back by NDISASM 2.12 as vminpd xmm1,xmm2,xmm3

NASM refuses to assemble vminpd ymm1,ymm2,ymm3,{sae} and NDISASM disassembles 62F1ED385DCB (EVEX.L'L=01b, EVEX.b=1) as vminpd xmm1,xmm2,xmm3
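
The field values quoted for those two byte sequences can be checked with a short sketch. It assumes the usual EVEX layout (`62 P0 P1 P2 opcode ModRM`, with z in `P2[7]`, L'L in `P2[6:5]`, b in `P2[4]` and aaa in `P2[2:0]`); the helper name `decode_evex` is made up for illustration and is not part of NASM or NDISASM.

```python
def decode_evex(hex_bytes: str) -> dict:
    """Split the EVEX P2 byte and ModRM.mod out of a hex-encoded instruction."""
    raw = bytes.fromhex(hex_bytes)
    if len(raw) < 6 or raw[0] != 0x62:
        raise ValueError("expected 62 P0 P1 P2 opcode ModRM")
    p2, modrm = raw[3], raw[5]
    return {
        "z": (p2 >> 7) & 1,               # zeroing-masking bit
        "LL": (p2 >> 5) & 0b11,           # EVEX.L'L (or EVEX.RC, depending on context)
        "b": (p2 >> 4) & 1,               # EVEX.b: broadcast / rounding / SAE
        "aaa": p2 & 0b111,                # opmask register number
        "reg_reg": (modrm >> 6) == 0b11,  # ModRM.mod == 11b means register operand
    }


for code in ("62F1ED185DCB", "62F1ED385DCB"):
    f = decode_evex(code)
    print(code, f"L'L={f['LL']:02b}b", f"b={f['b']}", f"reg_reg={f['reg_reg']}")
```

Both encodings are register-to-register with EVEX.b set; they differ only in `P2[6:5]`.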

I wonder how a Knights Landing CPU executes VMINPD ymm1, ymm2, ymm3{sae} (assembled as 62F1ED385DCB, EVEX.L'L=01b, EVEX.b=1):

1. The CPU throws an exception. Table 4-7 in the Intel doc is misleading.
2. SAE is in effect and the CPU operates on xmm only, as in scalar operations. NASM and NDISASM do it right; the Intel documentation is wrong.
3. SAE is ignored and the CPU operates on 256 bits, following the VMINPD specification in the Intel doc. NASM and NDISASM are wrong.
4. SAE is in effect and the CPU operates on 256 bits, as specified in the instruction code. NASM and NDISASM are wrong, and the Intel doc needs to additionally decorate the xmm/ymm instructions with {sae}.
5. SAE is in effect and the CPU operates on the implied full vector size of 512 bits regardless of EVEX.L'L, as if static rounding {er} were allowed. NDISASM and Table 4-7 in the Intel doc are wrong.

Note that KNL and other Xeon Phi CPUs don't support AVX-512VL, so they can only use EVEX with scalar or ZMM instructions, not XMM or YMM. For example, `VMINPD ymm21, ymm22, ymm23` is encodeable (requiring EVEX for the high register numbers), but KNL won't run it. — Peter Cordes, Aug 09 '22 at 08:15

Ross Ridge · Answer 1 · 2016-08-15T21:13:27.153

4

Your VMINPD ymm1, ymm2, ymm3{sae} instruction is invalid. According to the instruction set reference for MINPD in the Intel Architecture Instruction Set Extensions Programming Reference (February 2016), only the following encodings are allowed:

66 0F 5D /r                  MINPD xmm1, xmm2/m128 
VEX.NDS.128.66.0F.WIG 5D /r  VMINPD xmm1, xmm2, xmm3/m128
VEX.NDS.256.66.0F.WIG 5D /r  VMINPD ymm1, ymm2, ymm3/m256
EVEX.NDS.128.66.0F.W1 5D /r  VMINPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst
EVEX.NDS.256.66.0F.W1 5D /r  VMINPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst
EVEX.NDS.512.66.0F.W1 5D /r  VMINPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{sae}

Notice that only the last version is shown with a {sae} suffix, meaning it's the only form of the instruction you're allowed to use it with. Just because the bits exist to encode a particular instruction doesn't mean it's valid.

Also note that section 4.6.3, SAE Support in EVEX, makes it clear that SAE doesn't apply to 128-bit or 256-bit vectors:

The EVEX encoding system allows arithmetic floating-point instructions without rounding semantic to be encoded with the SAE attribute. This capability applies to scalar and 512-bit vector lengths, register-to-register only, by setting EVEX.b. When EVEX.b is set, “suppress all exceptions” is implied. [...]

I'm not sure, however, whether your hand-crafted instruction would generate an Invalid Opcode exception, whether the EVEX.b bit would simply be ignored, or whether the EVEX.L'L bits would be ignored. EVEX-encoded VMINPD instructions belong to the Type E2 exception class, and according to Table 4-17, Type E2 Class Exception Conditions, the instruction can generate a #UD exception in any of the following cases:

State requirement, Table 4-8 not met.

Opcode independent #UD condition in Table 4-9.

Operand encoding #UD conditions in Table 4-10.

Opmask encoding #UD condition of Table 4-11.

If EVEX.L’L != 10b (VL=512).

Only that last reason seems to apply here, but it would mean that your instruction would generate a #UD exception with or without the {sae} modifier. Since this seems to directly contradict the allowed encodings in the instruction summary, I'm not sure what would happen.

edited Aug 15 '16 at 21:13

answered Aug 15 '16 at 18:20

Ross Ridge


Good point that the docs say you can't do it, regardless of encoding details. However, Mysticial's answer points out that EVEX.L'L overlaps EVEX.RC, and EVEX.b selects which one they're interpreted as. – Peter Cordes Aug 15 '16 at 19:20
@PeterCordes Except, as explained in the question, Table 4-7 contradicts that interpretation. It says that for "FP Instructions w/o rounding semantic, can cause #XF" that EVEX.b selects "SAE Control" while EVEX.L'L determines vector length and EVEX.RC isn't applicable. According to the table it's the instruction type that determines the interpretation of `P2[6:5]`. So for example `VMINPD ymm1, ymm2, [rax]{1to4}` has EVEX.b set while EVEX.L'L is 01b and EVEX.RC is N/A. The OP's problem is that this doesn't work for `{sae}`. The encoding he wants exists, but it's simply not allowed. – Ross Ridge Aug 15 '16 at 19:50
Initially, I strongly disagreed with your answer. But after going through table 4-7 in detail, I've determined that the PDF is either incomplete or contradicts itself. FP instructions have the concept of "rounding semantics". But there's no list in the doc that states which instructions lack that. Table 4-7 states that `P2[6:5]` is always interpreted as `EVEX.L'L` for FP instructions that lack "rounding semantics". – Mysticial Aug 15 '16 at 20:16
If `vminpd` lacks "rounding semantics" (as you'd intuitively think), then the doc for `vminpd` contradicts Table 4-7. If `vminpd` has "rounding semantics", then my answer is correct and this answer is incorrect. Based on the availability of intrinsics on the intrinsics reference site, either Table 4-7 is wrong, or `vminpd` is in the same instruction class as `vaddpd` (i.e. has "rounding semantics" even when it shouldn't). /cc @PeterCordes – Mysticial Aug 15 '16 at 20:16
@Mysticial I don't see the contradiction. VMINPD isn't in the "rounding semantics" class because the instruction summary doesn't have `{er}` on any of the instruction versions. Table 4-7 says that this means the P2[6:5] bits encode EVEX.L'L, while the VMINPD summary says that if `{sae}` is used the length must be 512, ie. EVEX.L'L must be 10b. – Ross Ridge Aug 15 '16 at 20:57
Oh... I didn't notice the `{er}` vs. `{sae}` distinction. You're right. – Mysticial Aug 15 '16 at 21:19
Thinking about this from the design POV: Table 4-7 seems to hint that Intel may have initially *intended* to allow SAE with vector length for instructions without rounding semantics. But when it came time to implement the hardware, it might've gotten in the way. As it is right now, the hardware doesn't need a separate instruction class. The `P2[6:5]` bits won't matter since the rounding is a no-op. So it becomes a question of whether there is logic to #UD on the invalid case. I'm curious what the Intel emulator will do and whether it matches the actual hardware. – Mysticial Aug 15 '16 at 21:20
Shuttling some info from twitter: https://twitter.com/iximeow/status/1406434037988618240?s=19 – Pepijn Jun 20 '21 at 06:12

score 2 · Answer 2 · answered Jun 22 '21 at 09:36

On Twitter, iximeow gives some addenda to Ross Ridge's answer above:

ross ridge is right that the text is invalid, but the important detail is that L'L selects the specific SAE mode, so if you set L'L to indicate ymm, you just get {rd-sae}

this is to say, if you set b for sae at all, the vector width is immediately fixed to 512 bits

vector widths are fixed to 512 bits*

*except for some cvt instructions where one operand is 512 bits and one operand is smaller
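
The reading described in those tweets can be written down as a small model. This is only a sketch of the claim as quoted, not verified hardware behavior: the function name `describe_p2_65` is hypothetical, it covers packed instructions only, and it leaves out the cvt exceptions mentioned above. The rounding-control names follow the standard EVEX.RC encoding (00b rn, 01b rd, 10b ru, 11b rz).

```python
RC_NAMES = {0b00: "rn-sae", 0b01: "rd-sae", 0b10: "ru-sae", 0b11: "rz-sae"}
VL_BITS = {0b00: 128, 0b01: 256, 0b10: 512}


def describe_p2_65(ll: int, b: int, reg_reg: bool) -> str:
    """Model how P2[6:5] is read under the interpretation quoted above."""
    if b and reg_reg:
        # Assumption taken from the tweets: with b set on a reg-reg form the
        # width is fixed at 512 bits and P2[6:5] is read as rounding control.
        return f"512-bit vector, P2[6:5] read as {RC_NAMES[ll]}"
    if ll not in VL_BITS:
        return "reserved vector length"
    suffix = ", broadcast from memory" if b else ""
    return f"{VL_BITS[ll]}-bit vector{suffix}"


# The hand-crafted encoding from the question: L'L=01b, b=1, reg-reg.
print(describe_p2_65(0b01, 1, True))
```

For an instruction such as VMINPD, which has no rounding step, the selected rounding mode would have no visible effect under this model; only the exception suppression and the 512-bit width would matter.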

(@Pepijn's comment on Ross's answer already linked to those tweets, but I figured it's worth making this a separate answer, if only for visibility.)

So option #5 in my question seems to apply. Thanks. – vitsoft Jun 22 '21 at 18:08
