There shouldn't be a _mm512_permutevar_epi32 intrinsic, so never use it. Use the _mm512_permutexvar_epi32 intrinsic for the vpermd instruction.
Intel documents it in their intrinsics guide, but the name is bad and misleading. We don't need two different intrinsics for the same form of the same instruction, especially one that doesn't follow the established naming convention. In some ways it's a good thing GCC doesn't provide that misnamed intrinsic for vpermd. Intel's asm manual entry (https://www.felixcloutier.com/x86/vpermd:vpermw) only lists the permutexvar intrinsics, so that's good.
The intrinsics guide documentation even says:
This intrinsic is identical to _mm512_permutexvar_epi32, and it is recommended that you use that intrinsic name.
The naming convention has previously been that lane-crossing shuffles like vpermd (this one) and vpermps get an x in their name, but in-lane shuffles like vpermilps (_mm512_permutevar_ps with a vector control and _mm512_permute_ps with an immediate) don't.
There is no __m512i integer equivalent of vpermilps vec,vec,vec, only the immediate-control vpshufd vec,vec,imm8 (_mm512_shuffle_epi32) and the lane-crossing vpermd, so it's misleading to name an intrinsic following the in-lane-shuffle naming pattern, especially when the _ps versions of the same two names really are different operations. (vpermilps has existed since AVX1 and vpermps since AVX2, so both are available from AVX2 onward.)
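To make the difference behind the two naming patterns concrete, here is a rough scalar sketch of what each kind of shuffle does to a 512-bit vector of 32-bit elements. The function names model_permutexvar32 and model_permutevar32_inlane are made up for illustration and are not real intrinsics; the argument order just mirrors _mm512_permutexvar_epi32(idx, a) and _mm512_permutevar_ps(a, ctrl).

```c
#include <stddef.h>
#include <stdint.h>

/* 512-bit vector of 32-bit elements, split into 128-bit lanes. */
enum { ELEMS = 16, ELEMS_PER_LANE = 4 };

/* Lane-crossing model (vpermd / vpermps, the "permutex" names):
   every output element can come from any of the 16 input elements,
   selected by the low 4 bits of the corresponding index. */
void model_permutexvar32(uint32_t dst[ELEMS],
                         const uint32_t idx[ELEMS],
                         const uint32_t src[ELEMS])
{
    for (size_t i = 0; i < ELEMS; i++)
        dst[i] = src[idx[i] & (ELEMS - 1)];
}

/* In-lane model (vpermilps with a vector control, the non-x name):
   every output element comes from its own 128-bit lane,
   selected by the low 2 bits of the corresponding control element. */
void model_permutevar32_inlane(uint32_t dst[ELEMS],
                               const uint32_t src[ELEMS],
                               const uint32_t ctrl[ELEMS])
{
    for (size_t i = 0; i < ELEMS; i++) {
        size_t lane_base = i & ~(size_t)(ELEMS_PER_LANE - 1);
        dst[i] = src[lane_base + (ctrl[i] & (ELEMS_PER_LANE - 1))];
    }
}
```

Feeding both models the same indices 15, 14, ..., 0 reverses the whole vector in the lane-crossing model but only reverses each group of 4 elements in the in-lane model, which is exactly the distinction the x in the name is supposed to signal.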
And just in general, different names that mean the same thing add confusion and make things harder to mentally keep track of, especially when there's nothing wrong or unclear about one of them.
I am in favour of Intel's new _mm_bslli_si128 name for pslldq byte-shifts, with the "b" in the name emphasizing that it's a byte shift, not a bit shift, and so not a 128-bit version of _mm_slli_epi64. In that case I think the new name adds clarity. And the 256 and 512-bit versions reflect the in-lane nature with _mm256_bslli_epi128 instead of si128, which is unusual but maybe a good reminder.
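A small SSE2 sketch of why the "b" matters. The wrapper names byte_shift_left_1 and bit_shift_left_8_per_qword are just illustrative helpers, and this assumes a compiler whose emmintrin.h provides _mm_bslli_si128 (current GCC and clang do).

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* pslldq: shift the whole 128-bit register left by 1 byte.
   Bytes move across the boundary between the two 64-bit halves. */
void byte_shift_left_1(const uint8_t in[16], uint8_t out[16])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_bslli_si128(v, 1));
}

/* psllq: shift each 64-bit element left by 8 bits independently.
   The top byte of each half is discarded instead of carried over. */
void bit_shift_left_8_per_qword(const uint8_t in[16], uint8_t out[16])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_slli_epi64(v, 8));
}
```

With input bytes 1..16, the two results agree except at byte 8: the byte shift carries the 8 over from the low half, while the per-qword bit shift leaves a zero there.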
Here, by contrast, the non-x name removes clarity. Perhaps someone at Intel made the mistake of adding the non-x name first, and somehow nobody caught that before release? I assume both names were added to the guide at the same time in this case (it would be weird if GCC only supported the newer one), unlike the bslli case, where the new name came much later.
Or maybe the permutevar name for vpermd did make it into an early publication before hardware was released and before GCC added support for either, or GCC devs caught the inconsistency themselves and brought it to Intel's attention, since Intel's docs now do recommend against using it.
This naming weirdness is also somewhat similar to the case of "error: '_mm512_loadu_epi64' was not declared in this scope", where Intel introduced redundant intrinsics for non-masking loads. The 128 and 256-bit versions do let you pass a void* instead of having to write _mm_loadu_si128((const __m128i*)&arr[i]), but with the downside of being very easy to confuse with the intrinsics for movd and movq narrow loads (_mm_loadu_si32).