GCC throwing an error while clang works fine while using _mm512_permutevar_epi32

Question

I am getting this error from the GCC compiler -

error: there are no arguments to ‘_mm512_permutevar_epi32’ that depend on a template parameter, so a declaration of ‘_mm512_permutevar_epi32’ must be available [-fpermissive]

rev = _mm512_permutevar_epi32(_mm512_setr_epi32(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0), elem);

The code compiles fine with clang. I have included both immintrin.h and x86intrin.h.

Peter Cordes · Answer 1 · 2023-08-19T19:52:59.120

There shouldn't be a _mm512_permutevar_epi32 intrinsic; never use it.

Use the _mm512_permutexvar_epi32 intrinsic for the vpermd instruction.
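As a minimal sketch of that fix (assuming GCC or clang on x86, and that elem in the question is a __m512i of 32-bit integers), the reversal can be written with the x-named intrinsic, which takes its arguments in the same order (indices first, data second). The names reverse16 and reverse16_avx512 are hypothetical. The target attribute and the __builtin_cpu_supports check are compiler extensions used here only so the example builds without -mavx512f and falls back to scalar code on CPUs without AVX-512F:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

// GCC/clang extension: compile just this function with AVX-512F enabled.
__attribute__((target("avx512f")))
static void reverse16_avx512(const std::int32_t* in, std::int32_t* out) {
    // Output element i takes input element 15 - i.
    const __m512i idx = _mm512_setr_epi32(15, 14, 13, 12, 11, 10, 9, 8,
                                          7, 6, 5, 4, 3, 2, 1, 0);
    const __m512i elem = _mm512_loadu_si512(in);
    // vpermd: lane-crossing shuffle, hence the "x" in the name.
    const __m512i rev = _mm512_permutexvar_epi32(idx, elem);
    _mm512_storeu_si512(out, rev);
}
#endif

// Hypothetical wrapper: reverses 16 ints, using vpermd when the CPU supports it.
void reverse16(const std::int32_t* in, std::int32_t* out) {
    assert(in != out);  // the scalar fallback below is not an in-place reversal
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx512f")) {
        reverse16_avx512(in, out);
        return;
    }
#endif
    for (std::size_t i = 0; i < 16; ++i) {
        out[i] = in[15 - i];  // scalar fallback
    }
}
```

If the whole translation unit is already built with -mavx512f, the attribute and the runtime check are unnecessary and the single _mm512_permutexvar_epi32 call is all that changes from the question's code.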

Intel documents it in their intrinsics guide, but that's bad and misleading naming. We don't need two different intrinsics for the same form of the same instruction, especially one that doesn't follow the previous naming convention. In some ways it's a good thing GCC doesn't provide that misnamed intrinsic for vpermd. Intel's asm manual entry (https://www.felixcloutier.com/x86/vpermd:vpermw) only lists the permutexvar intrinsics, so that's good.

The intrinsics guide documentation even says:

This intrinsic is identical to _mm512_permutexvar_epi32, and it is recommended that you use that intrinsic name.

The naming convention has previously been that lane-crossing shuffles like vpermd (this one) and vpermps get an x in their name, but in-lane shuffles like vpermilps (_mm512_permutevar_ps with a vector control and _mm512_permute_ps with an immediate) don't.

There is no __m512i integer equivalent of vpermilps vec,vec,vec, only the immediate-control vpshufd vec,vec,imm8 (_mm512_shuffle_epi32) and the lane-crossing vpermd, so it's misleading to name an intrinsic following the in-lane-shuffle naming pattern, especially when there is a difference for the _ps version of the same names. (vpermilps has existed since AVX1 and vpermps since AVX2.)
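To make the in-lane versus lane-crossing distinction concrete, here is a scalar model of the two _ps shuffles. This is only a sketch of the index semantics, not the real intrinsics: Vec16, Idx16, and the model_ function names are hypothetical stand-ins, and plain arrays replace __m512 and __m512i so it runs on any machine:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec16 = std::array<float, 16>;          // stands in for __m512
using Idx16 = std::array<std::uint32_t, 16>;  // stands in for __m512i

// Models vpermps (_mm512_permutexvar_ps): each index picks from all 16 elements.
Vec16 model_permutexvar_ps(const Idx16& idx, const Vec16& a) {
    Vec16 r{};
    for (std::size_t i = 0; i < 16; ++i) {
        r[i] = a[idx[i] & 15];  // low 4 bits of the index, lane-crossing
    }
    return r;
}

// Models vpermilps (_mm512_permutevar_ps): each index picks within its own 128-bit lane.
Vec16 model_permutevar_ps(const Vec16& a, const Idx16& ctrl) {
    Vec16 r{};
    for (std::size_t i = 0; i < 16; ++i) {
        const std::size_t lane_base = i & ~std::size_t{3};  // first float of this lane
        r[i] = a[lane_base + (ctrl[i] & 3)];                // low 2 bits only, in-lane
    }
    return r;
}

// The same "reverse everything" control vector gives different results.
void check_reverse_example() {
    Vec16 a{};
    Idx16 rev{};
    for (std::size_t i = 0; i < 16; ++i) {
        a[i] = static_cast<float>(i);
        rev[i] = static_cast<std::uint32_t>(15 - i);
    }
    const Vec16 crossing = model_permutexvar_ps(rev, a);
    const Vec16 in_lane = model_permutevar_ps(a, rev);
    assert(crossing[0] == 15.0f);  // whole vector reversed
    assert(in_lane[0] == 3.0f);    // (15 & 3) == 3: reversed only inside lane 0
}
```

A name without the x therefore suggests the second behaviour, which is why using it for vpermd is misleading.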

And just in general, different names that mean the same thing add confusion and make things harder to mentally keep track of, especially when there's nothing wrong or unclear about one of them.

I am in favour of Intel's new _mm_bslli_si128 name for pslldq byte-shifts, with the "b" in the name emphasizing it's not a bit-shift, not a 128-bit version of _mm_slli_epi64. In that case I think the new name adds clarity. And the 256 and 512-bit versions reflect the in-lane nature with _mm256_bslli_epi128 instead of si128, which is unusual but maybe a good reminder.

Here, by contrast, the non-x name removes clarity. Perhaps someone at Intel made the mistake of adding the non-x name first? And somehow they didn't catch that before release, since I assume both names were added to the guide at the same time in this case (since it would be weird if GCC only supported the newer one), unlike the bslli case where that new name came much later.

Or maybe the permutevar name for vpermd did make it into an early publication before hardware was released and before GCC added support for either, or GCC devs caught the inconsistency themselves and brought it to Intel's attention, since Intel's docs now do recommend against using it.

This naming weirdness is also somewhat similar to the case in the question "error: '_mm512_loadu_epi64' was not declared in this scope", where Intel introduced redundant intrinsics for non-masking loads. The 128 and 256-bit versions do let you use void* instead of having to _mm_loadu_si128((const __m128i*)&arr[i]), but with the downside of being very easy to confuse with intrinsics for movd and movq narrow loads (_mm_loadu_si32).

The misnaming might be connected with the `x` removal in [`*_set_epi64`](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=set_epi64). `_mm512_set_epi64` doesn't have the annoying `x` anymore. The error is then acknowledged: _This intrinsic is identical to `_mm512_permutexvar_epi32`, and it is recommended that you use that intrinsic name_, says the Guide. - Alex Guteniev, Aug 19 '23 at 19:49
@AlexGuteniev: Thanks for pointing out that the intrinsics guide entry itself recommends the other name, I hadn't noticed that. But this `x` is in a different place from the `epi64x`, which has/had a different meaning (to disambiguate MMX vs. x86-64 SSE2: *[Meaning of suffix "x" in intrinsics like "\_mm256\_set1\_epi64x"](https://stackoverflow.com/q/44989391)* - `_mm_set_epi64` takes two `__m64` args, not `int64_t`. They followed the naming pattern for `_mm256_set_epi64x` even though there was no `_mm256_set_epi64`. So yes it's "annoying", but it has a totally different meaning than `permutex`.) - Peter Cordes, Aug 19 '23 at 19:57
