I am using the AVX intrinsic _mm256_extract_epi32().
I am not sure I am using it correctly, though: gcc rejects my code, whereas clang compiles and runs it without issue.
I am extracting the lane based on the value of an integer variable, as opposed to using a constant.
When I compile the following snippet with clang 3.8 (or clang 4) targeting AVX2, it generates code that uses the vpermd instruction.
#include <stdlib.h>
#include <immintrin.h>
#include <stdint.h>
uint32_t foo( int a, __m256i vec )
{
    uint32_t e = _mm256_extract_epi32( vec, a );
    return e*e;
}
If I use gcc instead, say gcc 7.2, the compiler fails to generate code and reports these errors:
In file included from /opt/compiler-explorer/gcc-7.2.0/lib/gcc/x86_64-linux-gnu/7.2.0/include/immintrin.h:41:0,
from <source>:2:
/opt/compiler-explorer/gcc-7.2.0/lib/gcc/x86_64-linux-gnu/7.2.0/include/avxintrin.h: In function 'foo':
/opt/compiler-explorer/gcc-7.2.0/lib/gcc/x86_64-linux-gnu/7.2.0/include/avxintrin.h:524:20: error: the last argument must be a 1-bit immediate
return (__m128i) __builtin_ia32_vextractf128_si256 ((__v8si)__X, __N);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /opt/compiler-explorer/gcc-7.2.0/lib/gcc/x86_64-linux-gnu/7.2.0/include/immintrin.h:37:0,
from <source>:2:
/opt/compiler-explorer/gcc-7.2.0/lib/gcc/x86_64-linux-gnu/7.2.0/include/smmintrin.h:449:11: error: selector must be an integer constant in the range 0..3
return __builtin_ia32_vec_ext_v4si ((__v4si)__X, __N);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I have two issues with this:
- Why is clang fine with a variable, while gcc wants a constant?
- Why can't gcc make up its mind? First it demands a 1-bit immediate, and later it wants an integer constant in the range 0..3. Those are different constraints.
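My guess on the second point, going only by the two header lines quoted in the errors, is that gcc builds the 256-bit extract in two steps: first it picks one of the two 128-bit halves (the 1-bit immediate), then one of the four elements in that half (the 0..3 selector). Below is a minimal sketch of that reading with a constant lane. The macro EXTRACT_EPI32_TWO_STEP and the function lane5_two_step are hypothetical names of mine, and the target attribute is there only so the snippet builds without -mavx2.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical macro, not part of any header: take lane N (a constant 0..7)
   in two steps, mirroring the two builtins named in the error messages. */
#define EXTRACT_EPI32_TWO_STEP(vec, N) \
    ((uint32_t)_mm_extract_epi32(_mm256_extractf128_si256((vec), (N) >> 2), (N) & 3))

/* Hypothetical wrapper that loads from a plain array so it can be called
   from code built without AVX flags. */
__attribute__((target("avx2")))
uint32_t lane5_two_step(const uint32_t src[8])
{
    __m256i vec = _mm256_loadu_si256((const __m256i *)src);
    /* Lane 5: upper half (5 >> 2 == 1), element 5 & 3 == 1 within that half. */
    return EXTRACT_EPI32_TWO_STEP(vec, 5);
}
```

If this reading is right, each step needs its own immediate, which would account for one complaint per step.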
Intel's Intrinsics Guide doesn't specify any constraints on the index value for _mm256_extract_epi32(), by the way. So who is right here, gcc or clang?
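For comparison, here is a sketch of a run-time-index extract written directly with the vpermd intrinsic, which I would expect both compilers to accept. I am assuming it corresponds roughly to what clang generates; I have not checked that. The names extract_epi32_var and foo_var are hypothetical, foo_var takes an array instead of a __m256i so it can be called from code built without AVX flags, and the target attribute can be dropped when compiling with -mavx2.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: return lane idx (only the low 3 bits are used) of vec,
   with idx known only at run time. */
__attribute__((target("avx2")))
static inline uint32_t extract_epi32_var(__m256i vec, int idx)
{
    /* Put idx in element 0 of the control vector; the other elements are unused. */
    __m256i ctrl = _mm256_castsi128_si256(_mm_cvtsi32_si128(idx));
    /* vpermd: element 0 of the result is vec[ctrl[0] & 7]. */
    __m256i shuffled = _mm256_permutevar8x32_epi32(vec, ctrl);
    /* Read element 0 back into a scalar register. */
    return (uint32_t)_mm_cvtsi128_si32(_mm256_castsi256_si128(shuffled));
}

/* Hypothetical array-based variant of foo() from above. */
__attribute__((target("avx2")))
uint32_t foo_var(int a, const uint32_t src[8])
{
    __m256i vec = _mm256_loadu_si256((const __m256i *)src);
    uint32_t e = extract_epi32_var(vec, a);
    return e * e;
}
```

Storing the vector to a uint32_t[8] and indexing the array would be another option; I have not compared the two.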