Including the correct intrinsic header

Question

I keep reading opinions on which header file is better to include to access Intel's intrinsics: x86intrin.h or immintrin.h.

Both seem to achieve an identical outcome, but I'm sure there must be some subtle differences with regard to code portability. Maybe one is more common, or more complete, than the other?

I couldn't find an explanation of either. If anyone knows why there are two files, and how they differ, that would make a welcome SO answer.

Speaking of portability, things of course become more complex for older compilers (like gcc < v4.4.0), where neither header is available. There, one has to include another intrinsic header instead (likely emmintrin.h for SSE2 support).

I just include the specific ones for the intrinsics I'm using. Intel's reference guide includes (hah!) them in the notes. — Shawn, May 08 '19 at 21:14

Peter Cordes · Accepted Answer · 2022-11-18T08:16:30.547

(Posting an answer here because "Header files for x86 SIMD intrinsics" has out-of-date answers that suggest including individual header files.)

immintrin.h is portable across all compilers, and includes all Intel SIMD intrinsics, and some scalar extensions like _pdep_u32 that are available with -mbmi2 or a -march= that includes it. (For AMD SSE4a and XOP (Bulldozer-family only, dropped for Zen), you need to include a different header as well.)

The only reason I can think of for including <emmintrin.h> specifically would be if you're using MSVC and want to leave intrinsics undefined for ISA extensions you don't want to depend on.

GCC's model of requiring you to enable extensions before you can use intrinsics for them means the compiler does this checking for you, so you can just #include <immintrin.h> but still get an error if you try to use _mm_shuffle_epi8 (pshufb) without -mssse3.
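As a minimal sketch of that model, the snippet below includes only <immintrin.h> and sticks to SSE2 intrinsics, which are baseline on x86-64, so it should build with no extra -m flags. The helper name add4_i32 is made up for illustration, and an x86-64 target is assumed.

```c
#include <immintrin.h>  /* one header for all Intel SIMD intrinsics */
#include <stdint.h>

/* Hypothetical helper: adds four 32-bit lanes using SSE2 only.
 * SSE2 is baseline on x86-64, so no -m option is needed there. */
static void add4_i32(const int32_t *a, const int32_t *b, int32_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);  /* unaligned load */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}

/* Calling _mm_shuffle_epi8 (SSSE3) in this file would still be rejected
 * by GCC unless SSSE3 is enabled, e.g. with -mssse3 or a -march= that
 * includes it, even though <immintrin.h> declares it. */
```

Calling add4_i32 on {1, 2, 3, 4} and {10, 20, 30, 40} stores {11, 22, 33, 44} in the output array.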

Don't use compilers older than gcc4.4. They're obsolete and will typically generate slower code, especially for modern CPUs that didn't exist when their tuning settings were being decided.

gcc/clang's x86intrin.h and MSVC's intrin.h are only useful if you need some extra non-SIMD intrinsics, like MSVC's _BitScanReverse(), that aren't always portable across compilers. These are things like integer rotate and bit-scan intrinsics: instructions that are baseline (unlike BMI1 lzcnt/tzcnt or BMI2 rorx), but hard or impossible to express in C in a way that compilers will recognize, so that a loop gets turned back into a single instruction.

Intel documents some of those as being available in immintrin.h in their intrinsics guide, but gcc/clang and MSVC actually have them in their x86intrin.h or intrin.h headers, respectively.

See "How to get the CPU cycle count in x86_64 from C++?" for an example of using #ifdef _MSC_VER to choose the right header to define uint64_t __rdtsc(void) and __rdtscp().
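A rough sketch of that header selection, assuming an x86 target and one of MSVC, GCC, or clang. The wrapper name read_tsc is hypothetical and is not taken from the linked answer.

```c
#include <stdint.h>

#ifdef _MSC_VER
#  include <intrin.h>     /* MSVC: declares __rdtsc and __rdtscp */
#else
#  include <x86intrin.h>  /* GCC/clang: declares __rdtsc and __rdtscp */
#endif

/* Hypothetical wrapper: returns the current time-stamp counter value. */
static inline uint64_t read_tsc(void)
{
    return __rdtsc();
}
```

The rest of the code can then call read_tsc() without caring which compiler supplied the intrinsic.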

Thanks @Peter, that's excellent stuff, accurate and complete. — Cyan, May 10 '19 at 00:26
