83

I'm trying to optimize some matrix computations, and I was wondering whether it is possible to detect at compile time if SSE/SSE2/AVX/AVX2/AVX-512/AVX-128-FMA/KCVI[1] is enabled by the compiler. Ideally for GCC and Clang, but I can manage with only one of them.

I'm not sure it is possible, and perhaps I will have to use my own macro, but I'd prefer detecting it rather than asking the user to select it.


[1] "KCVI" stands for Knights Corner Vector Instruction optimizations. Libraries like FFTW detect/utilize these newer instruction optimizations.

Trevor Boyd Smith
Baptiste Wicht
  • 4
    What exactly do you want to test for? Do you want to test that the compiler will produce AVX instructions? It is important to keep in mind that just because the compiler is ready to produce them does not mean that the CPU your program will eventually run will also support it (even if both compilation and execution happen on the same machine). – ArjunShankar Mar 09 '15 at 10:33
  • 1
    @ArjunShankar I want to know if for instance avx was enabled during compilation with -mavx. – Baptiste Wicht Mar 09 '15 at 13:38
  • 4
    Also, note that CPU support and OS support are two different things. The CPU may support SSE, but the OS may not support SSE (which requires the OS to save XMM registers during a context switch). See, for example, [Checking for SSE](http://wiki.osdev.org/SSE#Checking_for_SSE) on OSDev wiki. – jww Aug 21 '15 at 15:03
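
The distinction drawn in these comments (what the compiler was allowed to emit versus what the machine running the program supports) can be illustrated with a small sketch. It is not from the thread: `cpu_has_avx2` is a hypothetical helper name, and the code assumes GCC or Clang on x86, where the `__builtin_cpu_init` and `__builtin_cpu_supports` builtins are available. On any other compiler or architecture it simply reports 0.

```c
/* Hypothetical helper: asks the running CPU (not the compiler) about AVX2.
   Returns 1 if the builtin reports support, 0 otherwise or if the
   builtin is unavailable on this compiler/architecture. */
static int cpu_has_avx2(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                        /* harmless if already initialised */
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return 0;                                    /* assumption: no runtime query here */
#endif
}

/* Compile-time side of the same question: was -mavx2 (or an -march=
   value that implies it) in effect for this translation unit? */
static int compiled_with_avx2(void)
{
#if defined(__AVX2__)
    return 1;
#else
    return 0;
#endif
}
```

The two answers are independent: `compiled_with_avx2()` can return 1 on a machine where `cpu_has_avx2()` returns 0, which is the case ArjunShankar warns about.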

2 Answers

122

Most compilers will automatically define:

__SSE__
__SSE2__
__SSE3__
__AVX__
__AVX2__

etc., according to whatever command-line switches you pass. You can easily check this with gcc (or gcc-compatible compilers such as clang) like this:

$ gcc -msse3 -dM -E - < /dev/null | egrep "SSE|AVX" | sort
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE2_MATH__ 1
#define __SSE3__ 1
#define __SSE_MATH__ 1

or:

$ gcc -mavx2 -dM -E - < /dev/null | egrep "SSE|AVX" | sort
#define __AVX__ 1
#define __AVX2__ 1
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE2_MATH__ 1
#define __SSE3__ 1
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __SSE_MATH__ 1
#define __SSSE3__ 1

or to just check the pre-defined macros for a default build on your particular platform:

$ gcc -dM -E - < /dev/null | egrep "SSE|AVX" | sort
#define __SSE2_MATH__ 1
#define __SSE2__ 1
#define __SSE3__ 1
#define __SSE_MATH__ 1
#define __SSE__ 1
#define __SSSE3__ 1
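
In source code, these macros are normally consumed with `#if defined(...)`. The following is a minimal sketch rather than part of the original answer; `compiled_simd_level` is a hypothetical helper name. It tests from newest to oldest because, as the outputs above show, enabling a newer extension also defines the macros of the older ones.

```c
#include <stdio.h>

/* Hypothetical helper: names the newest SSE/AVX extension the compiler
   was allowed to target for this translation unit. */
static const char *compiled_simd_level(void)
{
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE4_1__)
    return "SSE4.1";
#elif defined(__SSSE3__)
    return "SSSE3";
#elif defined(__SSE3__)
    return "SSE3";
#elif defined(__SSE2__)
    return "SSE2";
#elif defined(__SSE__)
    return "SSE";
#else
    return "none";
#endif
}

/* Prints the level; the result depends on the flags used to compile. */
static void print_simd_level(void)
{
    printf("compiled for: %s\n", compiled_simd_level());
}
```

Building the same file with `-msse3` and then with `-mavx2` should change the reported level, which is a quick way to confirm the flags reach the compiler.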

More recent Intel processors support AVX-512, which is not a monolithic instruction set but a collection of subsets. The two examples below show which subsets GCC (version 6.2) enables for two different targets.

Here is Knights Landing:

$ gcc -march=knl -dM -E - < /dev/null | egrep "SSE|AVX" | sort
#define __AVX__ 1
#define __AVX2__ 1
#define __AVX512CD__ 1
#define __AVX512ER__ 1
#define __AVX512F__ 1
#define __AVX512PF__ 1
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE2_MATH__ 1
#define __SSE3__ 1
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __SSE_MATH__ 1
#define __SSSE3__ 1

Here is Skylake AVX-512:

$ gcc -march=skylake-avx512 -dM -E - < /dev/null | egrep "SSE|AVX" | sort
#define __AVX__ 1
#define __AVX2__ 1
#define __AVX512BW__ 1
#define __AVX512CD__ 1
#define __AVX512DQ__ 1
#define __AVX512F__ 1
#define __AVX512VL__ 1
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE2_MATH__ 1
#define __SSE3__ 1
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __SSE_MATH__ 1
#define __SSSE3__ 1

Intel has disclosed additional AVX-512 subsets (see ISA extensions). GCC (version 7) supports compiler flags and preprocessor symbols associated with the 4FMAPS, 4VNNIW, IFMA, VBMI and VPOPCNTDQ subsets of AVX-512:

$ for i in 4fmaps 4vnniw ifma vbmi vpopcntdq ; do echo "==== $i ====" ; gcc -mavx512$i -dM -E - < /dev/null | egrep "AVX512" | sort ; done
==== 4fmaps ====
#define __AVX5124FMAPS__ 1
#define __AVX512F__ 1
==== 4vnniw ====
#define __AVX5124VNNIW__ 1
#define __AVX512F__ 1
==== ifma ====
#define __AVX512F__ 1
#define __AVX512IFMA__ 1
==== vbmi ====
#define __AVX512BW__ 1
#define __AVX512F__ 1
#define __AVX512VBMI__ 1
==== vpopcntdq ====
#define __AVX512F__ 1
#define __AVX512VPOPCNTDQ__ 1
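
Because AVX-512 is split into subsets, code usually has to test each subset it relies on instead of a single macro. A minimal sketch of that idea follows; the `HAVE_AVX512_*` flags and `compiled_avx512_subsets` are hypothetical names introduced only for this illustration, and only a few of the macros listed above are covered.

```c
/* Hypothetical bit flags, one per AVX-512 subset of interest. */
enum {
    HAVE_AVX512_F  = 1 << 0,
    HAVE_AVX512_CD = 1 << 1,
    HAVE_AVX512_BW = 1 << 2,
    HAVE_AVX512_DQ = 1 << 3,
    HAVE_AVX512_VL = 1 << 4
};

/* Collects the subsets enabled at compile time into one bitmask. */
static unsigned compiled_avx512_subsets(void)
{
    unsigned mask = 0;
#ifdef __AVX512F__
    mask |= HAVE_AVX512_F;
#endif
#ifdef __AVX512CD__
    mask |= HAVE_AVX512_CD;
#endif
#ifdef __AVX512BW__
    mask |= HAVE_AVX512_BW;
#endif
#ifdef __AVX512DQ__
    mask |= HAVE_AVX512_DQ;
#endif
#ifdef __AVX512VL__
    mask |= HAVE_AVX512_VL;
#endif
    return mask;
}
```

A kernel that needs, say, byte/word operations would then be guarded by `compiled_avx512_subsets() & HAVE_AVX512_BW`, or directly by `#ifdef __AVX512BW__` when the choice must be made by the preprocessor.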

Note that the SSE macros won't work with Visual C++. You have to use _M_IX86_FP instead.
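
One way to cope with this is to normalise the compiler-specific macros into project-local ones. The sketch below is an illustration, not the answer author's method: the `MY_HAVE_*` names are hypothetical. It relies on `_M_IX86_FP` being 1 for `/arch:SSE` and 2 for `/arch:SSE2` or higher in 32-bit MSVC builds, on `_M_X64` indicating a 64-bit MSVC build (where SSE2 is part of the baseline), and on MSVC defining `__AVX__` and `__AVX2__` itself under `/arch:AVX` and `/arch:AVX2`.

```c
/* Hypothetical project-local macros; no compiler defines MY_HAVE_*. */

/* SSE2: GCC/Clang macro, 64-bit MSVC, or 32-bit MSVC with /arch:SSE2+. */
#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#  define MY_HAVE_SSE2 1
#else
#  define MY_HAVE_SSE2 0
#endif

/* SSE: implied by SSE2, or 32-bit MSVC with /arch:SSE. */
#if defined(__SSE__) || MY_HAVE_SSE2 || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#  define MY_HAVE_SSE 1
#else
#  define MY_HAVE_SSE 0
#endif

/* AVX and AVX2 use the same macro names on GCC, Clang and MSVC. */
#if defined(__AVX__)
#  define MY_HAVE_AVX 1
#else
#  define MY_HAVE_AVX 0
#endif

#if defined(__AVX2__)
#  define MY_HAVE_AVX2 1
#else
#  define MY_HAVE_AVX2 0
#endif
```

Defining the macros to 0 or 1, instead of leaving them undefined, lets the rest of the code write `#if MY_HAVE_AVX2` and catches misspelled names when the compiler warns about undefined identifiers in `#if`.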

anatolyg
Paul R
  • 2
    Note that the SSE macros won't work with Visual C++. You have to use _M_IX86_FP instead: https://msdn.microsoft.com/en-us/library/b0084kay.aspx – Rémi Mar 29 '16 at 07:54
  • 2
    @Rémi: yes, typical I'm afraid - the easiest thing is to just define the SSE macros in your project or makefile if you are forced to support MSVC. – Paul R Mar 29 '16 at 08:24
  • 3
    I think the last one needs `-march=native`... Also worth noting: GCC defines the individual AVX512 subsets (e.g. `__AVX512F__` and `__AVX512BW__`). – ZachB Feb 28 '17 at 20:28
  • @ZachB: I think it depends on your gcc installation, but the above output was taken directly from whatever toolchain I had installed back in 2015. If I try the last one with `-march=native` I get the same as for `-mavx2` (on a Haswell machine). Thanks for the info on AVX512 though - I'll update the answer in due course... – Paul R Feb 28 '17 at 22:11
  • 1
    @PaulR I hope you don't mind, but I added all of the publicly documented AVX-512 information. #IamIntel – Jeff Hammond Mar 18 '17 at 00:29
  • 1
    For the latest Mac Pro (2019), it's `cascadelake` instead of `skylake-avx512`, with AVX512VNNI added. – Stephane Mar 28 '20 at 10:22
1

Take a look at archspec, a library which was built exactly for this purpose: https://github.com/archspec/archspec

Kenneth Hoste