Mixing SSE2 and AVX intrinsics with different compilers

Question

Is it possible to mix VEX and non-VEX encoded SIMD intrinsics in the same compilation unit? I want to do this so I can release my code as single-file modules that build with different compilers.

Are you separating the code with `#ifdef`s? If so, probably yes. However, having all the code in one source would trigger a recompile for all versions, even if you just modify one alternative. — Bo Persson, Mar 12 '18 at 01:27
I for one would find it more confusing and less simple to have one file relying on obscure preprocessor magic than simply having a bunch of files next to each other and letting the build system worry about picking the right one for the *build* tools it is using. — spectras, Mar 12 '18 at 01:39
"VEX encoded SIMD intrinsics": there are no such things. The intrinsics are high-level constructs, which may get translated to asm using VEX or not (the same intrinsic can be in either case depending on circumstances). — Marc Glisse, Mar 12 '18 at 05:04
@spectras I don't use a build system for my projects, nor do I want to impose one on everybody who uses my code. — user3368561, Mar 12 '18 at 11:08

Peter Cordes · Accepted Answer · 2018-03-12T04:42:05.873

You don't need to do this, and it's often better to just build whole files with different options (e.g. -march=haswell for one and -march=core2 for another), so you can set tuning options as well as a target instruction set.

But separate compilation units make it harder to let small functions inline, so maybe there is a use-case here, as long as you're careful about two things: don't cause SSE-AVX transition penalties by mixing VEX and non-VEX code without vzeroupper, and don't put VEX-coded instructions into code paths that run on CPUs without AVX support.

IDK how well compilers respect target attributes when inlining, but link-time optimization can inline code from compilation units compiled with different options, too, and AFAIK that doesn't cause problems.

To answer the actual question: yes, with GNU C function attributes. This works with gcc and clang, but apparently not with ICC, even though ICC doesn't reject the attribute syntax.

Obviously it doesn't work with MSVC, which has different command line options anyway. With MSVC, you can compile a file that uses AVX intrinsics without /arch:AVX, but DON'T do that; it will use VEX encoding only for the instructions that aren't encodable at all with legacy SSE, like _mm_permutevar_ps (vpermilps), leading to transition penalties.
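One way to avoid that MSVC trap by accident is a compile-time guard at the top of any file that uses AVX intrinsics. This is only a sketch: it relies on MSVC defining `__AVX__` when /arch:AVX (or higher) is in effect, and the helper name `file_built_with_avx` is made up for illustration.

```c
#include <immintrin.h>

/* Refuse to build with plain MSVC unless /arch:AVX (or higher) is on.
 * clang-cl also defines _MSC_VER, so it is excluded: it is clang and
 * can use per-function target attributes instead. */
#if defined(_MSC_VER) && !defined(__clang__) && !defined(__AVX__)
#  error "This file uses AVX intrinsics: compile it with /arch:AVX or higher"
#endif

/* Hypothetical helper: returns 1 if AVX was enabled for the whole file
 * on the command line, 0 otherwise. */
int file_built_with_avx(void) {
#ifdef __AVX__
    return 1;
#else
    return 0;
#endif
}
```

With gcc or clang the guard is inactive, so the same file still builds without -mavx.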

The GNU C way:

#include <immintrin.h>

__m128 addps_sse(__m128 x, __m128 y) {
    return x+y;       // GNU C alternative to _mm_add_ps.
}

__attribute__((target("avx")))    // <<<<<<<<<<< This line
__m128 addps_avx(__m128 x, __m128 y) {
    return x+y;
}

Compiled on the Godbolt compiler explorer with gcc and clang, using -O3 -march=nehalem (which makes SSE4.2 available and tunes for Nehalem, but doesn't enable AVX), this gives:

addps_sse:
        addps   xmm0, xmm1
        ret
addps_avx:
        vaddps  xmm0, xmm0, xmm1
        ret

Both gcc and clang emit identical asm, of course. ICC uses addps (non-VEX) for both versions. I didn't check if ICC allowed _mm256 intrinsics inside the function with AVX enabled, but gcc should.
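For the case where the AVX version uses 256-bit intrinsics and you need to keep it off CPUs without AVX, here is a minimal sketch of runtime dispatch. It is gcc/clang only and assumes an x86-64 target. The names `add_f32`, `add_f32_sse` and `add_f32_avx` are made up for illustration; `__builtin_cpu_supports` is the GNU C builtin for runtime CPU feature checks.

```c
#include <immintrin.h>
#include <stddef.h>

/* Baseline path: uses only what the command line enables
 * (SSE/SSE2 is always available on x86-64). */
void add_f32_sse(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(dst + i,
                      _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    for (; i < n; i++)          /* scalar tail */
        dst[i] = a[i] + b[i];
}

/* AVX path: the target attribute enables AVX for this function only,
 * so 256-bit intrinsics are usable even without -mavx. */
__attribute__((target("avx")))
void add_f32_avx(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps(dst + i,
                         _mm256_add_ps(_mm256_loadu_ps(a + i),
                                       _mm256_loadu_ps(b + i)));
    for (; i < n; i++)          /* scalar tail */
        dst[i] = a[i] + b[i];
    /* Explicit here for clarity: clear the upper halves of the ymm
     * registers before returning to non-VEX code. */
    _mm256_zeroupper();
}

/* Dispatcher: only take the AVX path when the CPU reports AVX support. */
void add_f32(float *dst, const float *a, const float *b, size_t n) {
    if (__builtin_cpu_supports("avx"))
        add_f32_avx(dst, a, b, n);
    else
        add_f32_sse(dst, a, b, n);
}
```

The check runs on every call to `add_f32`, so this only makes sense when each call does a reasonable amount of work. For tiny functions like the addps example above, you'd want to dispatch at a higher level.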

So the answer is "it isn't possible on all compilers". Dammit! — user3368561, Mar 12 '18 at 11:10
@user3368561: This answer only works for gcc/clang. IDK how to do it on ICC or MSVC, but I don't use those compilers (except casually on Godbolt to check out their code-gen), so maybe they have a way. I'm definitely *not* saying that it's impossible on other compilers. — Peter Cordes, Mar 12 '18 at 11:20
I did my own research based on your answer, and the only way to change the emitted instructions with MSVC is via command line flags, so it is impossible to do it in a single compilation unit as my question requires. — user3368561, Mar 13 '18 at 17:38
