My question concerns EVEX-encoded packed reg-reg instructions without rounding semantics that allow SAE control (Suppress All Exceptions), such as VMIN*, VCVTT*, VGETEXP*, VREDUCE*, VRANGE* etc. Intel declares SAE-awareness only for the full 512-bit vector length, e.g.
VMINPD xmm1 {k1}{z}, xmm2, xmm3
VMINPD ymm1 {k1}{z}, ymm2, ymm3
VMINPD zmm1 {k1}{z}, zmm2, zmm3{sae}
but I don't see a reason why SAE couldn't be applied to instructions where xmm or ymm registers are used.
In chapter 4.6.4 of the Intel Instruction Set Extensions Programming Reference, Table 4-7 says that in instructions without rounding semantics, bit EVEX.b specifies that SAE is applied, and bits EVEX.L'L specify the explicit vector length:
00b: 128bit (XMM)
01b: 256bit (YMM)
10b: 512bit (ZMM)
11b: reserved
so their combination should be legal.
However, NASM assembles vminpd zmm1,zmm2,zmm3,{sae}
as 62F1ED185DCB, i.e. EVEX.L'L=00b, EVEX.b=1, which NDISASM 2.12 disassembles back as vminpd xmm1,xmm2,xmm3
NASM refuses to assemble vminpd ymm1,ymm2,ymm3,{sae}
and NDISASM disassembles 62F1ED385DCB (EVEX.L'L=01b, EVEX.b=1) as vminpd xmm1,xmm2,xmm3 as well.
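
To double-check the bit fields quoted above, here is a minimal Python sketch that pulls z, L'L, b, V' and aaa out of the last EVEX payload byte (the fourth byte of the encoding, counting the 62h escape). The names decode_evex_p2 and VECTOR_LENGTH are my own and purely illustrative; the sketch assumes there are no legacy prefixes in front of 62h and it only interprets L'L the way Table 4-7 does, without claiming anything about what the hardware does.

```python
# Meaning of EVEX.L'L as an explicit vector length, per Table 4-7.
VECTOR_LENGTH = {
    0b00: "128-bit (XMM)",
    0b01: "256-bit (YMM)",
    0b10: "512-bit (ZMM)",
    0b11: "reserved",
}

def decode_evex_p2(encoding_hex):
    """Return the fields of the last EVEX payload byte of a hex-encoded instruction.

    Assumes the encoding starts directly with the 62h EVEX escape byte.
    """
    raw = bytes.fromhex(encoding_hex)
    if len(raw) < 4 or raw[0] != 0x62:
        raise ValueError("not an EVEX-encoded instruction")
    p2 = raw[3]                  # layout: z | L'L | b | V' | aaa
    return {
        "z": (p2 >> 7) & 1,      # zeroing-masking
        "LL": (p2 >> 5) & 0b11,  # EVEX.L'L
        "b": (p2 >> 4) & 1,      # broadcast / rounding control / SAE
        "V'": (p2 >> 3) & 1,     # high bit of vvvv, stored inverted
        "aaa": p2 & 0b111,       # opmask register k0..k7
    }

if __name__ == "__main__":
    for code in ("62F1ED185DCB", "62F1ED385DCB"):
        f = decode_evex_p2(code)
        print(code, "b =", f["b"], "L'L =", format(f["LL"], "02b"),
              "->", VECTOR_LENGTH[f["LL"]])
```

Both encodings decode with EVEX.b=1; the first has EVEX.L'L=00b and the second EVEX.L'L=01b, which agrees with the values given above.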
I wonder how a Knights Landing CPU executes VMINPD ymm1, ymm2, ymm3{sae}
(assembled as 62F1ED385DCB, EVEX.L'L=01b, EVEX.b=1). I can think of these possibilities:
- The CPU throws an exception. Table 4-7 in the Intel doc is misleading.
- SAE is in effect and the CPU operates on xmm only, the same as in scalar operations. NASM and NDISASM do it right, the Intel documentation is wrong.
- SAE is ignored and the CPU operates on 256 bits, according to the VMINPD specification in the Intel doc. NASM and NDISASM are wrong.
- SAE is in effect and the CPU operates on 256 bits, as specified in the instruction code. NASM and NDISASM are wrong, and the Intel doc should additionally decorate the xmm/ymm forms with {sae}.
- SAE is in effect and the CPU operates on the implied full vector size of 512 bits regardless of EVEX.L'L, the same as if static rounding {er} were allowed. NDISASM and Table 4-7 in the Intel doc are wrong.