I have some hand-vectorized C++ code from which I'm trying to build a distributable binary using function multiversioning. Since the code uses SIMD intrinsics for different instruction sets (SSE2, AVX2, AVX-512), it uses template specializations to decide which intrinsics to use.
The overall structure is roughly as follows:
```cpp
#include <immintrin.h>  // __m128i, _mm_add_epi8

template <unsigned W, unsigned N> struct SIMD {}; // SIMD abstraction

template <> struct SIMD<128, 8> { // specialization for specific dimensions
    using Vec = __m128i;
    static always_inline Vec add(Vec a, Vec b) { return _mm_add_epi8(a, b); }
    ... // many other SIMD methods
};

... // many other dimension specializations for different instruction sets

template <unsigned W, unsigned N> class Worker {
public:
    void doComputation(int x) {
        using S = SIMD<W, N>;
        ... // do computations using S:: methods
    }
};
```
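For reference, here is a compilable reduction of that structure, limited to the SSE2 case (part of the x86-64 baseline, so no `target` attribute is needed there). It is a sketch under assumptions: the question's `always_inline` is taken to be a macro along the lines of the hypothetical `ALWAYS_INLINE` below, and the `load`/`store` members and the `doComputation` signature (byte-wise `a[i] += b[i]`) are made up for illustration.

```cpp
#include <immintrin.h>  // x86 SIMD intrinsics
#include <cstddef>
#include <cstdint>

// Assumption: the question's `always_inline` is a macro along these lines.
#define ALWAYS_INLINE inline __attribute__((always_inline))

template <unsigned W, unsigned N> struct SIMD {};  // SIMD abstraction

// 128-bit vectors of 8-bit lanes: SSE2, part of the x86-64 baseline.
template <> struct SIMD<128, 8> {
    using Vec = __m128i;
    static ALWAYS_INLINE Vec load(const std::uint8_t* p) {
        return _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
    }
    static ALWAYS_INLINE void store(std::uint8_t* p, Vec v) {
        _mm_storeu_si128(reinterpret_cast<__m128i*>(p), v);
    }
    static ALWAYS_INLINE Vec add(Vec a, Vec b) { return _mm_add_epi8(a, b); }
};

template <unsigned W, unsigned N> class Worker {
public:
    // Hypothetical signature: a[i] += b[i] (wrapping mod 256) for i < n.
    void doComputation(std::uint8_t* a, const std::uint8_t* b, std::size_t n) {
        using S = SIMD<W, N>;
        constexpr std::size_t step = W / 8;  // bytes per vector
        std::size_t i = 0;
        for (; i + step <= n; i += step)
            S::store(a + i, S::add(S::load(a + i), S::load(b + i)));
        for (; i < n; ++i)  // scalar tail for the last n % step bytes
            a[i] = static_cast<std::uint8_t>(a[i] + b[i]);
    }
};
```

Usage would look like `Worker<128, 8>{}.doComputation(a, b, n);`.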
Now the issue is that I need different instantiations of `Worker` to have different attributes, since each one targets a different instruction set. Something like this:
```cpp
template __attribute__((target("avx2"))) void Worker<256, 8>::doComputation(int x);
template __attribute__((target("avx512bw"))) void Worker<512, 8>::doComputation(int x);
...
```
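One workaround, sketched under the assumption of x86-64 with GCC or Clang: unlike an explicit *instantiation*, an explicit *specialization* is its own function definition, so it can carry its own `target` attribute, and the `always_inline` AVX2 helpers can then be inlined into a caller that enables AVX2. The price is a separate body per specialization; the hypothetical `WORKER_BODY` macro below pastes one shared body into each. `ALWAYS_INLINE`, `load`/`store`, and the `doComputation` signature are illustrative assumptions, not the question's code.

```cpp
#include <immintrin.h>  // x86 SIMD intrinsics
#include <cstddef>
#include <cstdint>

// Assumption: stands in for the question's `always_inline`.
#define ALWAYS_INLINE inline __attribute__((always_inline))

template <unsigned W, unsigned N> struct SIMD {};

// SSE2 (x86-64 baseline): no target attribute needed.
template <> struct SIMD<128, 8> {
    using Vec = __m128i;
    static ALWAYS_INLINE Vec load(const std::uint8_t* p) { return _mm_loadu_si128(reinterpret_cast<const __m128i*>(p)); }
    static ALWAYS_INLINE void store(std::uint8_t* p, Vec v) { _mm_storeu_si128(reinterpret_cast<__m128i*>(p), v); }
    static ALWAYS_INLINE Vec add(Vec a, Vec b) { return _mm_add_epi8(a, b); }
};

// AVX2: each helper is always_inline *and* requires the avx2 feature.
template <> struct SIMD<256, 8> {
    using Vec = __m256i;
    static ALWAYS_INLINE __attribute__((target("avx2"))) Vec load(const std::uint8_t* p) {
        return _mm256_loadu_si256(reinterpret_cast<const __m256i*>(p));
    }
    static ALWAYS_INLINE __attribute__((target("avx2"))) void store(std::uint8_t* p, Vec v) {
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(p), v);
    }
    static ALWAYS_INLINE __attribute__((target("avx2"))) Vec add(Vec a, Vec b) { return _mm256_add_epi8(a, b); }
};

template <unsigned W, unsigned N> class Worker {
public:
    // Hypothetical signature: a[i] += b[i] (wrapping mod 256) for i < n.
    // Declared only; each supported (W, N) gets an explicit specialization.
    void doComputation(std::uint8_t* a, const std::uint8_t* b, std::size_t n);
};

// Hypothetical macro: one shared body pasted into every specialization,
// so that each specialization can carry its own target attribute.
#define WORKER_BODY(W, N)                                             \
    {                                                                 \
        using S = SIMD<W, N>;                                         \
        constexpr std::size_t step = (W) / 8; /* bytes per vector */  \
        std::size_t i = 0;                                            \
        for (; i + step <= n; i += step)                              \
            S::store(a + i, S::add(S::load(a + i), S::load(b + i)));  \
        for (; i < n; ++i) /* scalar tail */                          \
            a[i] = static_cast<std::uint8_t>(a[i] + b[i]);            \
    }

// Baseline specialization: safe on any x86-64 CPU.
template <>
inline void Worker<128, 8>::doComputation(std::uint8_t* a, const std::uint8_t* b, std::size_t n)
    WORKER_BODY(128, 8)

// AVX2 specialization: the attribute sits on a real definition, so the
// always_inline AVX2 helpers above can be inlined into it.
template <>
inline __attribute__((target("avx2")))
void Worker<256, 8>::doComputation(std::uint8_t* a, const std::uint8_t* b, std::size_t n)
    WORKER_BODY(256, 8)
```

Only `Worker<256, 8>::doComputation` is compiled for AVX2, so it should still be called only after a runtime check such as `__builtin_cpu_supports("avx2")`. A `Worker<W, N>` without a matching specialization fails at link time, since the primary member is only declared.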
so that these different instantiations get compiled for those different targets. However, this still produces an error on Clang:
```
error: always_inline function 'add' requires target feature 'avx2', but would be inlined into function 'doComputation' that is compiled without support for 'avx2'
```
If I annotate the original method with `__attribute__((target("avx2,avx512")))`, it compiles, but the program dies with an illegal hardware instruction at runtime on machines without AVX-512 support. So I guess my idea of annotating the explicit instantiations, as above, doesn't work.
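The crash is consistent with what the attribute permits: on the primary template it lets the compiler use AVX-512 anywhere in every instantiation (the SSE2 one included), and nothing prevents that code from running on a CPU without it. Multiversioning therefore also needs a runtime check that selects the variant matching the CPU. Below is a minimal sketch of that dispatch half, assuming x86-64 and GCC or Clang (both provide `__builtin_cpu_init` and `__builtin_cpu_supports`); the `add_bytes*` kernels are hypothetical stand-ins for the per-ISA `Worker` instantiations.

```cpp
#include <immintrin.h>  // x86 SIMD intrinsics
#include <cstddef>
#include <cstdint>

// Hypothetical per-ISA kernels: a[i] += b[i] (wrapping mod 256) for i < n.
// Each body is compiled with the instruction set named in its attribute.

static void add_bytes_sse2(std::uint8_t* a, const std::uint8_t* b, std::size_t n) {
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(a + i), _mm_add_epi8(va, vb));
    }
    for (; i < n; ++i) a[i] = static_cast<std::uint8_t>(a[i] + b[i]);
}

__attribute__((target("avx2")))
static void add_bytes_avx2(std::uint8_t* a, const std::uint8_t* b, std::size_t n) {
    std::size_t i = 0;
    for (; i + 32 <= n; i += 32) {
        __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a + i));
        __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b + i));
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(a + i), _mm256_add_epi8(va, vb));
    }
    for (; i < n; ++i) a[i] = static_cast<std::uint8_t>(a[i] + b[i]);
}

__attribute__((target("avx512bw")))
static void add_bytes_avx512bw(std::uint8_t* a, const std::uint8_t* b, std::size_t n) {
    std::size_t i = 0;
    for (; i + 64 <= n; i += 64) {
        __m512i va = _mm512_loadu_si512(a + i);
        __m512i vb = _mm512_loadu_si512(b + i);
        _mm512_storeu_si512(a + i, _mm512_add_epi8(va, vb));
    }
    for (; i < n; ++i) a[i] = static_cast<std::uint8_t>(a[i] + b[i]);
}

using AddBytesFn = void (*)(std::uint8_t*, const std::uint8_t*, std::size_t);

// Queries the CPU the program is running on, not the one it was built on.
static AddBytesFn select_add_bytes() {
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512bw")) return add_bytes_avx512bw;
    if (__builtin_cpu_supports("avx2")) return add_bytes_avx2;
    return add_bytes_sse2;  // SSE2 is part of the x86-64 baseline
}

// Public entry point; the choice is made once, on the first call.
void add_bytes(std::uint8_t* a, const std::uint8_t* b, std::size_t n) {
    static const AddBytesFn impl = select_add_bytes();
    impl(a, b, n);
}
```

The function pointer lives in a function-local static, so it is resolved once (with thread-safe initialization since C++11), and only a variant the CPU reports as supported is ever called.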
Is there a way to express this with Clang or GCC using function attributes?