I am interested in using the SSE vector instructions of x86-64 with gcc and don't want to use any inline assembly for that. Is there a way I can do that in C? If so, can someone give me an example?

Peter Cordes
pythonic

3 Answers

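Yes, you can use the intrinsics in the *mmintrin.h headers (xmmintrin.h for SSE, emmintrin.h for SSE2, and so on, depending on what level of SSE you want to use). This is generally preferable to writing assembly: the compiler still handles register allocation and instruction scheduling, and the same source builds with several compilers.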

#include <emmintrin.h>

int main(void)
{
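    // _mm_set_epi32 takes its arguments from the highest lane to the lowest,
    // so lane 0 of a holds 1 and lane 3 holds 4.
    __m128i a = _mm_set_epi32(4, 3, 2, 1);
    __m128i b = _mm_set_epi32(7, 6, 5, 4);
    // Add the four 32-bit lanes in parallel: c = {5, 7, 9, 11} (lane 0 first).
    __m128i c = _mm_add_epi32(a, b);

    // ...
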
    return 0;
}

Note that this approach works for most x86 and x86-64 compilers on various platforms, e.g. gcc, clang and Intel's ICC on Linux/Mac OS X/Windows and even Microsoft's Visual C/C++ (Windows only, of course).
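To get data in and out of the vector registers you would normally load from and store to ordinary arrays. The following is a minimal sketch along those lines, not part of the original answer; the function name `add4_i32` is made up for illustration, and it assumes an SSE2-capable target (always true on x86-64).

```c
#include <emmintrin.h>  /* SSE2 integer intrinsics */

/* Hypothetical helper: out[i] = a[i] + b[i] for i = 0..3.
 * The unaligned load/store variants are used so the arrays
 * do not have to be 16-byte aligned. */
void add4_i32(const int *a, const int *b, int *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);  /* load 4 ints */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i vc = _mm_add_epi32(va, vb);                /* 4 adds at once */
    _mm_storeu_si128((__m128i *)out, vc);              /* store 4 ints */
}
```

Calling it with `{1, 2, 3, 4}` and `{4, 5, 6, 7}` fills `out` with the four element-wise sums. If the arrays are known to be 16-byte aligned, `_mm_load_si128` / `_mm_store_si128` can be used instead.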

Paul R
  • Both gcc and VC++ support the intrinsics. – Igor ostrovsky Apr 25 '12 at 07:29
  • FWIW, icc supports these intrinsics too – hroptatyr Apr 25 '12 at 07:30
  • Thanks - I've added a note to the answer stating that this approach is supported by most x86 C/C++ compilers. – Paul R Apr 25 '12 at 10:01
  • @PaulR Even better is to include `x86intrin.h`, which pulls in all MMX/SSE/AVX and some stuff like `bswap` or `ror`, makes them available as the intrinsic functions and sets `__SSEX__` preprocessor macros according to the architecture or compiler flags given. – Gunther Piez Apr 25 '12 at 10:27
  • Is this an abstraction for ARM or x86? – enthusiasticgeek May 28 '13 at 20:30
  • @enthusiasticgeek: the question and answers are all x86/SSE-specific. ARM has a different SIMD ISA (NEON) and different intrinsics. – Paul R May 28 '13 at 20:57
  • @GuntherPiez: `x86intrin.h` is not portable to MSVC, only GCC / clang and I think ICC. The Intel-defined `immintrin.h` is portable across all mainstream x86 compilers and defines every Intel SIMD intrinsic. Also, `__SSEx__` / `__AVX__` / etc. macros are pre-defined by the compiler itself, regardless of headers. That's how the headers know which intrinsic "functions" to define. `x86intrin.h` makes your compile times slower, which is another reason not to use it if you don't actually need it. – Peter Cordes Sep 04 '20 at 07:25

Find the *intrin.h headers in your gcc includes (/usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.0/include/ here).

It may be worth noting that the header immintrin.h includes all the other intrinsics headers, according to the features you enable (with -msse2 or -mavx, for instance).
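As a rough illustration of how the enabled features change what is available, here is a sketch that includes immintrin.h once and then picks a code path from the compiler's predefined `__AVX__` / `__SSE__` macros. The function name `add_floats` is hypothetical, and the scalar loop at the end doubles as a fallback when neither macro is defined.

```c
#include <stddef.h>
#if defined(__SSE__) || defined(__AVX__)
#include <immintrin.h>  /* exposes whatever the -m flags enable */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i = 0..n-1. */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__AVX__)
    /* Built with -mavx (or a -march that implies it): 8 floats per step. */
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
#elif defined(__SSE__)
    /* SSE only (the x86-64 baseline): 4 floats per step. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar loop handles the leftover elements (or everything
     * if no SIMD path was compiled in). */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Compiling the same file with and without `-mavx` should select a different branch at compile time; this is compile-time selection only, not runtime CPU detection.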

hroptatyr
  • Generally prefer `-march=haswell` or something, rather than manual `-mavx2`. The "generic" tuning options are not great for 256-bit vectors on Intel CPUs: [Why doesn't gcc resolve \_mm256\_loadu\_pd as single vmovupd?](https://stackoverflow.com/q/52626726) – Peter Cordes Sep 04 '20 at 07:27

What you want are intrinsics, which look like library functions but are actually built into the compiler so they translate into specific machine code.
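To make the "looks like a function call, becomes an instruction" idea concrete, here is a small sketch. The helper name `sqrt4` is invented for this example, and the instruction names in the comments are what a compiler would typically emit for a plain SSE build; the compiler is free to choose differently.

```c
#include <xmmintrin.h>  /* SSE single-precision intrinsics */

/* Hypothetical helper: out[i] = sqrt(in[i]) for i = 0..3. */
void sqrt4(const float *in, float *out)
{
    __m128 v = _mm_loadu_ps(in);  /* typically movups: load 4 floats  */
    __m128 r = _mm_sqrt_ps(v);    /* typically sqrtps: 4 square roots */
    _mm_storeu_ps(out, r);        /* typically movups: store 4 floats */
}
```

None of these calls end up as calls into a library in an optimized build; looking at the output of `gcc -O2 -S` is a quick way to see which instructions were actually generated.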

Paul R and hroptatyr describe where to find GCC's documentation. Microsoft also has good documentation on the intrinsics in their compiler; even if you are using GCC, you might find MS' description of the idea a better tutorial.

Crashworks