I have a generically built binary that needs to include a lookup routine which is compiled either to AVX/AVX2 instructions or to plain SSE instructions, with the version to run chosen according to what the CPU supports.
The lookup routine is the same as the one explained here: Check all bytes of a __m128i for a match of a single byte using SSE/AVX/AVX2
Here the (_mm_set1_epi8, _mm_cmpeq_epi8, _mm_movemask_epi8) intrinsic set compiles into VEX-encoded AVX instructions when the file is built with -mavx/-mavx2, or into plain SSE instructions otherwise.
In an oversimplified main.c, compiled without -mavx/-mavx2 and with -msse3 -msse4 -O3:
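#include <stdint.h>
#include <cpuid.h>      /* __cpuid, bit_AVX */
#include <emmintrin.h>  /* SSE2 intrinsics */

#ifdef __SSE2__
#define SSE_Lookup() /* pseudocode */ \
    _mm_set1_epi8; \
    _mm_cmpeq_epi8; \
    match_bitmap = _mm_movemask_epi8
#endif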
static inline __attribute__((always_inline))
uint64_t foo()
{
    unsigned int a = 1, b, c, d;
    uint64_t match_bitmap = 0;
    __cpuid(1, a, b, c, d);
    if (c & bit_AVX)
    {
        match_bitmap = avx_lookup();
    }
    else
    {
#ifdef __SSE2__
        SSE_Lookup();
#endif
    }
    return match_bitmap;
}
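A note on the check in foo(): the AVX bit in CPUID only says the CPU implements AVX, not that the operating system saves the YMM registers across context switches. A fuller test, as a minimal sketch assuming GCC or Clang on x86 and their <cpuid.h>, could look like the following. The names os_saves_ymm and cpu_can_run_avx are hypothetical, not part of any library.

```c
#include <cpuid.h>   /* __get_cpuid, bit_AVX, bit_OSXSAVE */
#include <stdint.h>

/* Read XCR0 with XGETBV. Only call this after CPUID reports OSXSAVE,
 * otherwise the instruction faults. */
static int os_saves_ymm(void)
{
    uint32_t eax, edx;
    __asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
    /* Bit 1 = SSE (XMM) state, bit 2 = AVX (YMM) state. */
    return (eax & 0x6) == 0x6;
}

/* Hypothetical helper: returns 1 if AVX code may be executed, else 0. */
int cpu_can_run_avx(void)
{
    unsigned int a, b, c, d;

    if (!__get_cpuid(1, &a, &b, &c, &d))
        return 0;                      /* CPUID leaf 1 not available */
    if (!(c & bit_OSXSAVE) || !(c & bit_AVX))
        return 0;                      /* no XGETBV, or no AVX in hardware */
    return os_saves_ymm();
}
```

The result could be cached in a static variable so that CPUID is executed once rather than on every lookup.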
In foo_avx.c:
#include <emmintrin.h>
// mimicking an intrinsic wrapper
// don't want to create any new stack frames, so keeping it inline
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
__avx_lookup (char kk, __m128i h)
{
    __m128i k = _mm_set1_epi8(kk);
    __m128i r = _mm_cmpeq_epi8(k, h);
    return _mm_movemask_epi8(r);
}
Compiling foo_avx.c with x86_64_gcc-7.5.0_glibc/bin/x86_64-openwrt-linux-gnu-gcc fails with:
enwrt-linux-gnu/lib/Scrt1.o: in function `_start':
(.text+0x20): undefined reference to `main'
collect2: error: ld returned 1 exit status
Makefile:72: recipe for target '/build/x86_64/common/foo_avx.o' failed
make[3]: *** [/build/x86_64/common/foo_avx.o] Error 1
So my questions are:
- Is defining an intrinsic wrapper that is compiled with platform-specific gcc options the correct approach?
- Is there a better way of doing this? The goal is to have a single executable with SSE, AVX, AVX2 and AVX-512 code paths embedded, with the right one invoked at run time based on what the CPU supports.
Thanks in advance.
-J
Update: I also tried adding the __avx_lookup signature to a header file so that other source files can see it, but that doesn't seem to work.
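For context on the header attempt: GCC documents that a function declared extern with the gnu_inline attribute is used only for inlining and is never emitted as a standalone function, so a prototype in a header gives other source files nothing to link against. One possible shape for the runtime dispatch, as a sketch only, assuming GCC or Clang on x86 with the target function attribute and __builtin_cpu_supports, keeps ordinary (non-inline) functions and enables AVX per function. The names lookup_sse2, lookup_avx and lookup16 are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Baseline version: built with the file's default flags (legacy SSE encoding). */
static int lookup_sse2(char key, const uint8_t *bytes16)
{
    __m128i h = _mm_loadu_si128((const __m128i *)bytes16);
    __m128i k = _mm_set1_epi8(key);
    return _mm_movemask_epi8(_mm_cmpeq_epi8(k, h));
}

/* Same source, but only this function is compiled as if -mavx were given. */
__attribute__((target("avx")))
static int lookup_avx(char key, const uint8_t *bytes16)
{
    __m128i h = _mm_loadu_si128((const __m128i *)bytes16);
    __m128i k = _mm_set1_epi8(key);
    return _mm_movemask_epi8(_mm_cmpeq_epi8(k, h));
}

/* Hypothetical dispatcher: returns a 16-bit mask with bit i set
 * when bytes16[i] == key. */
int lookup16(char key, const uint8_t *bytes16)
{
    __builtin_cpu_init();              /* harmless if already initialised */
    if (__builtin_cpu_supports("avx"))
        return lookup_avx(key, bytes16);
    return lookup_sse2(key, bytes16);
}
```

With this layout the whole file can be built without -mavx, and only lookup16 needs a prototype in a header. The per-call feature test could be replaced by a function pointer set once at startup if the branch turns out to matter.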