Which header files provide the intrinsics for the different x86 SIMD instruction set extensions (MMX, SSE, AVX, ...)? It seems impossible to find such a list online. Correct me if I'm wrong.
5 Answers
These days you should normally just include <immintrin.h>. It includes everything.
GCC and clang will stop you from using intrinsics for instructions you haven't enabled at compile time (e.g. with -march=native, or -mavx2 -mbmi2 -mpopcnt -mfma -mcx16 -mtune=znver1, or whatever).
MSVC and ICC will let you use intrinsics without enabling anything at compile time, but you still should enable AVX before using AVX intrinsics.
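As a minimal sketch of what that looks like in practice: the function below includes <immintrin.h> on x86 and only uses AVX2 intrinsics when the compiler reports that AVX2 is enabled (the `__AVX2__` macro, which GCC, clang, and MSVC all define under the corresponding option). The helper name `add_i32` is made up for illustration, and the scalar loop is only there so the same file still builds without AVX2.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#endif

/* Hypothetical helper: element-wise out[i] = a[i] + b[i] for n 32-bit integers. */
void add_i32(const int32_t *a, const int32_t *b, int32_t *out, size_t n) {
    size_t i = 0;
#if defined(__AVX2__)
    /* Compiled only when AVX2 is enabled (e.g. -mavx2, -march=native, /arch:AVX2). */
    for (; i + 8 <= n; i += 8) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(out + i), _mm256_add_epi32(va, vb));
    }
#endif
    /* Scalar tail; this is the whole loop when AVX2 is not enabled. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Built with plain `gcc -O2` this takes the scalar path; with `-mavx2` (or `-march=native` on a suitable machine) the 256-bit loop is used as well.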
Historically (before immintrin.h pulled in everything) you had to manually include a header for the highest level of intrinsics you wanted.
This may still be useful with MSVC and ICC to stop yourself from using instruction-sets you don't want to require.
<mmintrin.h> MMX
<xmmintrin.h> SSE
<emmintrin.h> SSE2
<pmmintrin.h> SSE3
<tmmintrin.h> SSSE3
<smmintrin.h> SSE4.1
<nmmintrin.h> SSE4.2
<ammintrin.h> SSE4A
<wmmintrin.h> AES
<immintrin.h> AVX, AVX2, FMA
Including one of these pulls in all previous ones (except AMD-only SSE4A: immintrin.h doesn't pull that in).
Some compilers also have <zmmintrin.h> for AVX512.
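To illustrate the "include only the highest level you want to require" approach, here is a small sketch that pulls in just <emmintrin.h>, so only SSE2 and older intrinsics are declared. The helper name `sum4_i32` and the `USE_SSE2` macro are invented for this example, and the scalar branch is an assumption added so the file still compiles where SSE2 is not available.

```c
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  /* declares SSE2 and older (SSE, MMX) intrinsics only */
#define USE_SSE2 1
#else
#define USE_SSE2 0
#endif

/* Hypothetical helper: returns v[0] + v[1] + v[2] + v[3]. */
int32_t sum4_i32(const int32_t v[4]) {
#if USE_SSE2
    __m128i x = _mm_loadu_si128((const __m128i *)v);
    /* Swap the 64-bit halves and add: lane 0 = v0+v2, lane 1 = v1+v3. */
    x = _mm_add_epi32(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
    /* Swap adjacent lanes and add: lane 0 now holds the full sum. */
    x = _mm_add_epi32(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(2, 3, 0, 1)));
    return _mm_cvtsi128_si32(x);
#else
    return v[0] + v[1] + v[2] + v[3];
#endif
}
```

With this include, reaching for an SSE4.1 intrinsic such as `_mm_extract_epi32` by accident should fail to compile for lack of a declaration, which is the point of choosing the narrower header.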

-
1I think `ammintrin.h` also has the XOP instructions. – Mysticial Jun 27 '12 at 14:48
-
72Or you can just `#include ` which pulls in everything you need. – Paul R Jun 27 '12 at 15:19
-
2zmmintrin.h has the AVX-512 intrinsics. – onitake Oct 29 '14 at 16:01
-
6Why are p, t, s and n for SSE3/SSSE3/SSE4.1 and 4.2? What do those characters represent? – phuclv Jan 16 '15 at 17:48
-
@LưuVĩnhPhúc I don't have the slightest clue, sorry. – fredoverflow Jan 16 '15 at 18:19
-
8@LưuVĩnhPhúc SSE3 = Prescott new instructions, SSSE3 = Tejas new instructions. I think SSE4.2 and AES refer to the processor family they were introduced on (Nehalem and Westmere) – Drew McGowen Jun 07 '15 at 23:11
-
18Don't include ` ` directly; gcc doesn't even provide it. **Just use ` `** or the even-more-complete ` `. This answer is basically obsolete, unless you're intentionally avoiding including intrinsics for newer versions of SSE because your compiler doesn't complain when you use an SSE4.1 instruction while compiling for SSE2. (gcc/clang *do* complain, so you should just use immintrin.h for them. IDK about others.) – Peter Cordes Jun 28 '16 at 22:31
-
Does MSVC have something equivalent of ` `? – Royi Sep 20 '18 at 20:38
-
C++Builder v10.x has only up to ` ` (SSE2), although in v10.3 the intrinsics headers are old and [unusable due to making use of retired Clang builtins](https://quality.embarcadero.com/browse/RSP-22883). – Tanz87 Dec 07 '18 at 00:15
-
Wow, prefixes of m, x, e, p, t, s, n, a, w, i. Are these random, or is there a method... – SO_fix_the_vote_sorting_bug May 18 '21 at 15:21
-
"MSVC and ICC will let you use intrinsics without enabling anything at compile time, but you still should enable AVX before using AVX intrinsics" - I want to try to support pre-haswell CPUs, so detect AVX2 at runtime, and pick the implementation based on that check. What you said isn't compatible with that, I think. Am I thinking about this wrong, or is the "should" here making assumptions that don't match my use case (e.g. assuming that my code will only running the code on AVX2+ processors)? – Merlyn Morgan-Graham Dec 02 '21 at 00:02
-
1@MerlynMorgan-Graham: Indeed, code that needs to run on non-AVX CPUs should not be built with AVX enabled. Modern MSVC might make ok asm with AVX intrinsics in some functions in a file built without `/arch:AVX`. Especially if you use only 256-bit intrinsics, not mixing in `_mm_add_epi32` sometimes; if you do, check the asm and/or profile to check that you avoid SSE/AVX transition stalls. (There should be a HW event counter for that.) – Peter Cordes May 27 '22 at 01:53
On GCC/clang, if you use just
#include <x86intrin.h>
it will include all SSE/AVX headers which are enabled according to compiler switches like -march=haswell or just -march=native. Additionally, some x86-specific instructions like bswap or ror become available as intrinsics.
The MSVC equivalent of this header is <intrin.h>.
If you just want portable SIMD, use #include <immintrin.h>.
MSVC, ICC, and gcc/clang (and other compilers like Sun, I think) all support this header for the SIMD intrinsics documented by Intel's online intrinsics finder / search tool: https://software.intel.com/sites/landingpage/IntrinsicsGuide/
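For the non-SIMD side, here is a rough sketch of using the bswap and ror intrinsics through <x86intrin.h>. It assumes the names `__bswapd` and `__rord` as found in recent GCC and clang headers; older versions of either compiler may not provide them, so treat those names as an assumption to check against your toolchain. The wrapper names `swap_bytes32` and `rotate_right32` are made up, and the portable fallbacks are only there for other targets.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <x86intrin.h>  /* GCC/clang: SIMD headers plus scalar x86 intrinsics */
#define HAVE_X86INTRIN 1
#else
#define HAVE_X86INTRIN 0
#endif

/* Hypothetical wrapper: reverses the byte order of a 32-bit value. */
uint32_t swap_bytes32(uint32_t x) {
#if HAVE_X86INTRIN
    return (uint32_t)__bswapd((int)x);
#else
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
#endif
}

/* Hypothetical wrapper: rotates a 32-bit value right by count bits. */
uint32_t rotate_right32(uint32_t x, int count) {
#if HAVE_X86INTRIN
    return __rord(x, count);
#else
    count &= 31;
    return count ? (x >> count) | (x << (32 - count)) : x;
#endif
}
```

If all you need is SIMD, none of this is required and <immintrin.h> alone is enough.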

-
I wasn't sure, if the newer versions might... Anyway as long as gcc, icc and clang have it, it ok to use I think :-) – Gunther Piez Jun 27 '12 at 16:04
-
6MSVC doesn't have ` `, but ` ` achieves a similar effect. You still need conditional compilation, of course. :-( – Cody Gray - on strike Jun 01 '14 at 13:21
-
1All the major x86 compilers have **`#include `**. Use that for SIMD intrinsics. You only need the even-larger (and slightly slower to compile) `x86intrin.h` or `intrin.h` if you need stuff like integer rotate / bit-scan intrinsics (although Intel documents some of those as being available in `immintrin.h` [in their intrinsics guide](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=bsf)). – Peter Cordes Apr 15 '18 at 22:13
-
IIRC, there are some non-SIMD intrinsics which Intel documents as being in immintrin.h, but which gcc, clang, and/or MSVC only have in `x86intrin.h` / `intrin.h` but *not* in `immintrin.h`. – Peter Cordes Apr 15 '18 at 22:15
The header name depends on your compiler and target architecture.
- For Microsoft C++ (targeting x86, x86-64 or ARM) and Intel C/C++ Compiler for Windows use intrin.h
- For gcc/clang/icc targeting x86/x86-64 use x86intrin.h
- For gcc/clang/armcc targeting ARM with NEON use arm_neon.h
- For gcc/clang/armcc targeting ARM with WMMX use mmintrin.h
- For gcc/clang/xlcc targeting PowerPC with VMX (aka Altivec) and/or VSX use altivec.h
- For gcc/clang targeting PowerPC with SPE use spe.h
You can handle all these cases with conditional preprocessing directives:
#if defined(_MSC_VER)
/* Microsoft C/C++-compatible compiler */
#include <intrin.h>
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* GCC-compatible compiler, targeting x86/x86-64 */
#include <x86intrin.h>
#elif defined(__GNUC__) && defined(__ARM_NEON__)
/* GCC-compatible compiler, targeting ARM with NEON */
#include <arm_neon.h>
#elif defined(__GNUC__) && defined(__IWMMXT__)
/* GCC-compatible compiler, targeting ARM with WMMX */
#include <mmintrin.h>
#elif (defined(__GNUC__) || defined(__xlC__)) && (defined(__VEC__) || defined(__ALTIVEC__))
/* XLC or GCC-compatible compiler, targeting PowerPC with VMX/VSX */
#include <altivec.h>
#elif defined(__GNUC__) && defined(__SPE__)
/* GCC-compatible compiler, targeting PowerPC with SPE */
#include <spe.h>
#endif

-
Here's some more to add to your list: On UltraSPARC+VIS with gcc, use visintrin.h; if you have Sun's VSDK, vis.h offers a different set of intrinsics. Documentation can be found here: [GCC VIS builtins](https://gcc.gnu.org/onlinedocs/gcc-4.9.1/gcc/SPARC-VIS-Built-in-Functions.html#SPARC-VIS-Built-in-Functions), [Sun VIS user's guide](http://web.archive.org/web/20040221220512/http://www.sun.com/processors/vis/download/vsdk/visuserg.pdf). – onitake Oct 29 '14 at 15:37
From this page
+----------------+------------------------------------------------------------------------------------------+
| Header | Purpose |
+----------------+------------------------------------------------------------------------------------------+
| x86intrin.h | Everything, including non-vector x86 instructions like _rdtsc(). |
| mmintrin.h | MMX (Pentium MMX!) |
| mm3dnow.h | 3dnow! (K6-2) (deprecated) |
| xmmintrin.h | SSE + MMX (Pentium 3, Athlon XP) |
| emmintrin.h | SSE2 + SSE + MMX (Pentium 4, Athlon 64) |
| pmmintrin.h | SSE3 + SSE2 + SSE + MMX (Pentium 4 Prescott, Athlon 64 San Diego) |
| tmmintrin.h | SSSE3 + SSE3 + SSE2 + SSE + MMX (Core 2, Bulldozer) |
| popcntintrin.h | POPCNT (Nehalem (Core i7), Phenom) |
| ammintrin.h | SSE4A + SSE3 + SSE2 + SSE + MMX (AMD-only, starting with Phenom) |
| smmintrin.h | SSE4_1 + SSSE3 + SSE3 + SSE2 + SSE + MMX (Penryn, Bulldozer) |
| nmmintrin.h | SSE4_2 + SSE4_1 + SSSE3 + SSE3 + SSE2 + SSE + MMX (Nehalem (aka Core i7), Bulldozer) |
| wmmintrin.h | AES (Core i7 Westmere, Bulldozer) |
| immintrin.h | AVX, AVX2, AVX512, all SSE+MMX (except SSE4A and XOP), popcnt, BMI/BMI2, FMA |
+----------------+------------------------------------------------------------------------------------------+
So in general you can just include immintrin.h to get all Intel extensions, or x86intrin.h if you want everything, including _bit_scan_forward and _rdtsc, as well as all vector intrinsics, including AMD-only ones. If you are against including more than you actually need, then you can pick the right include by looking at the table.
x86intrin.h is the recommended way to get intrinsics for AMD XOP (Bulldozer-only, not even future AMD CPUs), rather than a dedicated header of its own.
Some compilers will still generate error messages if you use intrinsics for instruction sets you haven't enabled (e.g. _mm_fmadd_ps without enabling fma, even if you include immintrin.h and enable AVX2).
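One way around that error on GCC and clang, without enabling FMA for the whole file, is to enable it for a single function and choose the implementation at run time. The sketch below assumes the GNU `target` function attribute and `__builtin_cpu_supports`, both GCC/clang features rather than anything portable; the names `madd4`, `madd4_fma`, `madd4_scalar`, and `HAVE_FMA_DISPATCH` are invented for illustration.

```c
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_FMA_DISPATCH 1

/* Only this function is compiled with FMA enabled (GCC/clang attribute). */
__attribute__((target("fma")))
static void madd4_fma(const float *a, const float *b, const float *c, float *out) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_loadu_ps(c);
    _mm_storeu_ps(out, _mm_fmadd_ps(va, vb, vc));  /* a * b + c per lane */
}
#else
#define HAVE_FMA_DISPATCH 0
#endif

/* Portable fallback, built with the file's baseline options. */
static void madd4_scalar(const float *a, const float *b, const float *c, float *out) {
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] * b[i] + c[i];
    }
}

/* Hypothetical entry point: out[i] = a[i] * b[i] + c[i] for four floats. */
void madd4(const float *a, const float *b, const float *c, float *out) {
#if HAVE_FMA_DISPATCH
    if (__builtin_cpu_supports("fma")) {  /* run-time CPU check */
        madd4_fma(a, b, c, out);
        return;
    }
#endif
    madd4_scalar(a, b, c, out);
}
```

Keep in mind that a fused multiply-add rounds once rather than twice, so the two paths can differ in the last bit for inputs where the intermediate product is not exactly representable.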

-
1`smmintrin` (SSE4.1) is Penryn (45nm Core2), not Nehalem ("i7"). Can we stop using "i7" as an architecture name? [It's meaningless now that Intel has kept using it for SnB-family](http://stackoverflow.com/questions/37361145/deoptimizing-a-program-for-the-pipeline-in-intel-sandybridge-family-cpus/37362225#37362225). – Peter Cordes Jun 03 '16 at 20:05
-
`immintrin.h` doesn't appear to include `_popcnt32` and `_popcnt64` (not to be confused with those in `popcntintrin.h`!) intrinsics on GCC 9.1.0. So it appears `x86intrin.h` still serves a purpose. – Thom Wiggers Aug 27 '19 at 11:12
20200914: latest best practice: <immintrin.h> (also supported by MSVC)
I'll leave the rest of the answer for historic purposes; it might be useful for older compiler / platform combinations...
As many of the answers and comments have stated, <x86intrin.h> is the comprehensive header for x86[-64] SIMD intrinsics. It also provides intrinsics supporting instructions for other ISA extensions. gcc, clang, and icc have all settled on this. I needed to do some digging on versions that support the header, and thought it might be useful to list some findings...
- gcc: support for x86intrin.h first appears in gcc-4.5.0. The gcc-4 release series is no longer being maintained, while gcc-6.x is the current stable release series. gcc-5 also introduced the __has_include extension present in all clang-3.x releases. gcc-7 is in pre-release (regression testing, etc.) and, following the current versioning scheme, will be released as gcc-7.1.0.
- clang: x86intrin.h appears to have been supported for all clang-3.x releases. The latest stable release is clang (LLVM) 3.9.1. The development branch is clang (LLVM) 5.0.0. It's not clear what's happened to the 4.x series.
- Apple clang: annoyingly, Apple's versioning doesn't correspond with that of the LLVM projects. That said, the current release, clang-800.0.42.1, is based on LLVM 3.9.0. The first LLVM 3.0 based version appears to be Apple clang 2.1 back in Xcode 4.1. LLVM 3.1 first appears with Apple clang 3.1 (a numeric coincidence) in Xcode 4.3.3.
Apple also defines __apple_build_version__, e.g., 8000042. This seems about the most stable, strictly ascending versioning scheme available. If you don't want to support legacy compilers, make one of these values a minimum requirement.
Any recent version of clang, including Apple versions, should therefore have no issue with x86intrin.h. Of course, along with gcc-5, you can always use the following:
#if defined(__has_include)
#  if __has_include(<x86intrin.h>)
#    include <x86intrin.h>
#  else
#    error "upgrade your compiler. it's free..."
#  endif
#else
#  error "upgrade your compiler. it's free..."
#endif
One trick you can't really rely on is using the __GNUC__ versions in clang. The versioning is, for historical reasons, stuck at 4.2.1, a version that precedes the x86intrin.h header. It's occasionally useful for, say, simple GNU C extensions that have remained backwards compatible.
- icc: as far as I can tell, the x86intrin.h header is supported since at least Intel C++ 16.0. The version test can be performed with: #if (__INTEL_COMPILER >= 1600). This version (and possibly earlier versions) also provides support for the __has_include extension.
- MSVC: it appears that MSVC++ 12.0 (Visual Studio 2013) is the first version to provide the intrin.h header, not x86intrin.h. This suggests #if (_MSC_VER >= 1800) as a version test. Of course, if you're trying to write code that's portable across all these different compilers, the header name on this platform will be the least of your problems.

-
1I'd prefer `__has_builtin` instead of annoying version checks. Also note [GCC still has some bugs on specific builtins](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776) at present; in this case, I'd consider target-specific ones, [even undocumented](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92137). – FrankHB Dec 21 '21 at 15:27