
I'm trying to run code similar to the following:

#include <immintrin.h>
#include <iostream>

void foo() {
    __m128i a = _mm_set_epi8 (0,0,6,5,4,3,2,1,8,7,6,5,4,3,2,1);
    __m128i b = _mm_set_epi8 (0,0,0,0,0,0,0,1,8,7,6,5,4,3,2,1);
    __mmask16 m = _mm_cmpeq_epi8_mask(a,b); // supposedly requires avx512vl and avx512bw
    std::cout<<m<<std::endl;
}
void bar() {
    int dataa[8] = {1,0,1,0,1,0,1,0};
    __m256i points = _mm256_lddqu_si256((__m256i *)&dataa[0]); // requires only -mavx
    (void)points;
}
int main() {
    foo();
    bar();
}

However, I keep running into the error Illegal instruction (core dumped).

I compile the code with

g++ -std=c++11 -march=broadwell -mavx -mavx512vl -mavx512bw tests.cpp

According to Intel's intrinsics documentation, these flags should be sufficient to run both foo and bar. However, when either foo or bar is run, I get the same error message.

If I remove foo, however, and compile WITHOUT -mavx512vl, I can run bar smoothly.

I already checked that my CPU supports the -mno-avx512vl and -mno-avx512bw flags, so it should support -mavx512vl and -mavx512bw, right?

What flags must I include to run both functions? Or am I missing something else?

Peter Cordes

3 Answers


I'm afraid your method to determine the CPU capabilities is not very reliable. The fact that your gcc compiler supports AVX-512 doesn't imply that your CPU supports AVX-512.

On the Linux command line, type more /proc/cpuinfo and check the flags line to see which instruction sets your CPU supports.

On Windows: 1. Open Settings, 2. Click on System, 3. Click on About. This will show you the processor type. Search the web for intel ark followed by the processor type, for example intel ark core i3 7100. Then follow the link to the processor page on the Intel website and check the Advanced Technologies -> Instruction Set Extensions item.

There are many levels of AVX-512 support. AVX-512BW and AVX-512VL are standard on processors with AVX-512 support, unless you are working with a Knights Landing or Knights Mill processor. See https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512 or https://en.wikichip.org/wiki/x86/avx-512#Implementation.
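
The same check can also be made from inside a program. The following is a minimal sketch, assuming GCC or Clang on x86, where the `__builtin_cpu_supports` builtin queries the CPU at run time. The helper names `cpu_has_avx512_vl_bw` and `print_cpu_features` are made up for illustration.

```cpp
#include <iostream>

// Hypothetical helper: true only if the CPU running this program reports
// AVX-512F, AVX-512VL and AVX-512BW. Uses a GCC/Clang builtin (x86 only).
bool cpu_has_avx512_vl_bw() {
    __builtin_cpu_init();  // harmless if called more than once
    return __builtin_cpu_supports("avx512f")
        && __builtin_cpu_supports("avx512vl")
        && __builtin_cpu_supports("avx512bw");
}

// Hypothetical helper: prints one line per feature, so the result can be
// compared with the flags line in /proc/cpuinfo.
void print_cpu_features() {
    __builtin_cpu_init();
    std::cout << "avx:      " << (__builtin_cpu_supports("avx") ? "yes" : "no") << '\n';
    std::cout << "avx2:     " << (__builtin_cpu_supports("avx2") ? "yes" : "no") << '\n';
    std::cout << "avx512f:  " << (__builtin_cpu_supports("avx512f") ? "yes" : "no") << '\n';
    std::cout << "avx512vl: " << (__builtin_cpu_supports("avx512vl") ? "yes" : "no") << '\n';
    std::cout << "avx512bw: " << (__builtin_cpu_supports("avx512bw") ? "yes" : "no") << '\n';
}
```

Calling `print_cpu_features()` from `main` reports what the CPU can run, regardless of which `-m` options the program was compiled with.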

wim

Compile with gcc -march=native. If you get compile errors, your source tried to use something your CPU doesn't support.

Related: Getting Illegal Instruction while running a basic Avx512 code


I already checked that my CPU supports the -mno-avx512vl and -mno-avx512bw flags, so it should support -mavx512vl and -mavx512bw, right?

That's the opposite of how GCC options work.

-mno-avx512vl disables -mavx512vl if any earlier option (like -march=skylake-avx512 or -mavx512vl on its own) had set it.

-march=broadwell doesn't enable AVX512 instructions because Broadwell CPUs can't run them natively. So -mno-avx512vl has exactly zero effect at the end of g++ -std=c++11 -march=broadwell -mavx ...

Many options have long names starting with ‘-f’ or with ‘-W’—for example, -fmove-loop-invariants, -Wformat and so on. Most of these have both positive and negative forms; the negative form of -ffoo is -fno-foo. This manual documents only one of these two forms, whichever one is not the default.

from the GCC manual, intro part of section 3: Invoking GCC 3

(-m options follow the same convention as -f and -W long options.)

This style of foo vs. no-foo is not unique to GCC; it's pretty common.
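
One way to see what a given set of `-m` options actually enables is to look at the macros the compiler predefines. As a rough sketch (the function name `enabled_extensions` is made up for illustration), GCC and Clang define `__AVX__`, `__AVX2__`, `__AVX512F__`, `__AVX512VL__` and `__AVX512BW__` only when the corresponding extension is enabled for the translation unit:

```cpp
#include <string>

// Hypothetical helper: lists the SIMD extensions the compiler was told it
// may use for this translation unit. This reflects the command-line options
// (-march=..., -mavx512vl, -mno-avx512vl, ...), not what the CPU can run.
std::string enabled_extensions() {
    std::string s;
#ifdef __SSE2__
    s += "SSE2 ";
#endif
#ifdef __AVX__
    s += "AVX ";
#endif
#ifdef __AVX2__
    s += "AVX2 ";
#endif
#ifdef __AVX512F__
    s += "AVX512F ";
#endif
#ifdef __AVX512VL__
    s += "AVX512VL ";
#endif
#ifdef __AVX512BW__
    s += "AVX512BW ";
#endif
    return s;
}
```

Printing the result after compiling with `-march=broadwell` alone should show AVX and AVX2 but none of the AVX-512 entries. Adding `-mavx512vl -mavx512bw` should add AVX512F, AVX512VL and AVX512BW, and a trailing `-mno-avx512vl` should remove AVX512VL again.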


Faulting on _mm256_lddqu_si256 after compiling with -mavx512vl

GCC is dumb and uses an EVEX encoding for the load (probably vmovdqu64) instead of a shorter VEX encoding. But you told it AVX512VL was available, so this is only a missed optimization, not a correctness problem.

If you did compile the function with only AVX enabled, it would of course only use AVX instructions.
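
GCC and Clang also allow an extension to be enabled for a single function with `__attribute__((target(...)))`, so the rest of the file is compiled for the baseline given on the command line. The sketch below is an illustration, not something from the question: `cmpeq_mask_avx512`, `cmpeq_mask_sse2` and `cmpeq_mask` are hypothetical names, and it assumes the file is compiled without any `-mavx512*` option. The AVX-512 version runs only if the CPU reports support at run time. Otherwise an SSE2 fallback computes the same 16-bit mask.

```cpp
#include <immintrin.h>
#include <cstdint>

// AVX-512VL + AVX-512BW are enabled for this one function only.
__attribute__((target("avx512vl,avx512bw")))
std::uint16_t cmpeq_mask_avx512(__m128i a, __m128i b) {
    return _mm_cmpeq_epi8_mask(a, b);
}

// SSE2 fallback: compare bytes (equal bytes become 0xFF), then collect the
// top bit of each byte into a 16-bit mask.
std::uint16_t cmpeq_mask_sse2(__m128i a, __m128i b) {
    return static_cast<std::uint16_t>(_mm_movemask_epi8(_mm_cmpeq_epi8(a, b)));
}

// Run-time dispatch: only call the AVX-512 version on CPUs that report it.
std::uint16_t cmpeq_mask(__m128i a, __m128i b) {
    if (__builtin_cpu_supports("avx512vl") && __builtin_cpu_supports("avx512bw"))
        return cmpeq_mask_avx512(a, b);
    return cmpeq_mask_sse2(a, b);
}
```

With the two vectors from the question, both paths should return 0xC1FF (bits 0 to 8, 14 and 15 set), so `cmpeq_mask(a, b)` can be used in place of the direct intrinsic call on CPUs with or without AVX-512.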

Peter Cordes

For Intel's ISAs, the general rule is that a later extension is a superset of the earlier ones. As AVX512 is the latest of the ones you mentioned, you don't have to use -mavx. Using -march=broadwell is useless, as you cannot optimize for a CPU that doesn't have the AVX512 ISA.

Your command line should look like

g++ -std=c++11 -march=skylake-avx512 tests.cpp

Also, the statement "my CPU supports those compiler flags" is odd. I presume you mean "the code I built with those flags runs on my CPU", but as was already mentioned, the no- prefix means do NOT generate code for that ISA.

So your compiler flags are fine; it is your CPU that doesn't support the required ISA.

Anton
  • `-march=broadwell` is never useless, you should prefer using a `-march` option to set tuning options (`-mtune=broadwell`) *and* enable other instruction set extensions like popcnt, BMI1/BMI2, CMPXCHG16B, and other non-SIMD extensions that the compiler might take advantage of. There are very few reasons to use `-mavx512bw` instead of `-march=skylake-avx512`. It also would have avoided this mistake because the OP knows they don't have a Skylake-X CPU. Other than that, good answer. – Peter Cordes Jun 26 '19 at 15:14
  • OK, there are two scenarios: either `-march=broadwell` wins and there are no AVX512 instructions, or `-mavx512*` wins and AVX512 instructions get generated. In the first case the OP doesn't get the desired instruction emitted. In the second case gcc would be optimizing for a chip that doesn't actually exist. I have to correct myself: the option is not useless, but using it can lead to unexpected consequences performance-wise. – Anton Jun 27 '19 at 10:05
  • @PeterCordes But of course the approach of using an appropriate `-march` instead of a number of -m's is the preferred one. So, yes, `-march=skylake-avx512` is pretty much sufficient for what OP is doing. – Anton Jun 27 '19 at 10:23
  • In this specific case, tune=broadwell is better than tune=generic for every existing CPU that supports AVX512BW + AVX512VL. Because [currently those are all Skylake-server derived](https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512), and tune=skylake-avx512 is pretty close to tune=broadwell, and rules out tuning for any AMD CPUs. – Peter Cordes Jun 27 '19 at 17:45
  • And no, it's not a matter of `-march=broadwell` "winning" vs. `-mavx512vl`. They're additive. `-march=broadwell` enables basically every Intel extension before AVX512, and `-mavx512vl` enables AVX512F + AVX512VL. Only if you'd used `-mno-avx2` or something would it matter which one was last. There are still so few AVX512 CPUs that it doesn't make sense to recommend tune=generic for compiling any AVX512 code; always use either `-march=skylake-avx512` or `-march=knl` (or `knm`). So you should edit your answer to not recommend using `-mavx512vl -mavx512bw` by themselves; that's silly. – Peter Cordes Jun 27 '19 at 17:49
  • @PeterCordes yes, you are right about `-march=skylake-avx512` being better than passing specific ISA extensions. I fixed the answer. Thanks! The "winning" thing was indeed uncalled for: some compilers do apply a "rightmost flag wins" rule, but since gcc is the compiler in question, that concern is irrelevant. I overlooked the fact that `-m` by itself defaults to `-mtune=generic -march=x86-64` (at least on a 64-bit OS) and, of course, tuning for broadwell is definitely closer to any actual AVX512-enabled chip than tuning for generic. – Anton Jun 27 '19 at 19:48