In the following link there is a section for non-simd intel intrinsics: https://software.intel.com/sites/landingpage/IntrinsicsGuide/

These include assembly instructions like bsf and bsr. For SIMD instructions, I can copy the C function from the guide and run it after including the proper header.

For the non-SIMD functions, like _bit_scan_reverse (bsr), gcc reports the function as undefined (an implicit declaration warning). GCC has similar builtin functions, e.g. __builtin_ctz, but apparently no _bit_scan_reverse or _mm_popcnt_u32. Why are these intrinsics not available?

#include <stdio.h>
#include <immintrin.h>

int main(void) {
  int x = 5;
  int y = _bit_scan_reverse (x);
  printf("%d\n",y);
  return 0;
}
Peter Cordes
Jimbo
  • Step 1 is that it seems I didn't activate the correct compile options. I had avx2 enabled `-mavx2` but switched to `-march=native`. Now however I get: `"Undefined symbols for architecture x86_64: "__bit_scan_reverse"` – Jimbo Oct 20 '18 at 04:29
  • I would try `x86intrin.h` instead of `immintrin.h`. – wim Oct 20 '18 at 08:50
  • That fixes the problem but doesn't really get at the crux of the question. For all the other instructions, including the specified header makes everything work. Why is that not the case here? This link mostly covers the answer: https://stackoverflow.com/questions/11228855/header-files-for-x86-simd-intrinsics although it doesn't state why going by the Intel documentation doesn't work, or how one might target these "other" instructions other than by going native. I feel like I'm still missing some information that makes this all make sense in the big picture. – Jimbo Oct 20 '18 at 12:39

1 Answer

It appears that I needed to make two changes:

First, it appears to be best practice to include x86intrin.h rather than more specific includes. This appears to be compiler specific and is covered in much better detail in:

Header files for x86 SIMD intrinsics

Importantly, the header to include differs if you are not using gcc (for example, MSVC uses intrin.h).
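As a rough illustration of the first change, here is a minimal sketch that assumes gcc or clang on x86. The wrapper name highest_set_bit is made up for this example; only _bit_scan_reverse and x86intrin.h come from the discussion above.

```c
#include <assert.h>
#include <x86intrin.h>  /* gcc/clang umbrella header; MSVC would use <intrin.h> */

/* Hypothetical wrapper (not from the original post): index of the highest
   set bit. The bsr result is undefined for 0, so that input is rejected. */
int highest_set_bit(unsigned int x)
{
    assert(x != 0);
    return _bit_scan_reverse((int)x);
}
```

Calling highest_set_bit(5) should give 2, since bit 2 is the highest bit set in binary 101.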

Second, compiler options also need to be enabled. For gcc these are detailed in:

https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html

The documentation for many of the flags is lacking, though.

As my goal is to distribute a compiled binary, I wanted to avoid -march=native.

Most of the "other" intrinsics I'm interested in are bit manipulation related. Ye Olde Wikipedia has a decent writeup of important bit manipulation intrinsic groups like bmi2: https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets

I need bmi2 for BZHI (the instruction), which is _bzhi_u32 in C.

Thus I can get what I want with something like:

-mavx2 -mbmi2

At first, -mbmi2 seemed to be sufficient to also get things like bmi1 and abm (see the linked Wikipedia page for definitions), although I don't see any mention of this in the linked gcc page, so I might be wrong about this.

EDIT: It seems that adding bmi2 support does not add bmi1 and abm; I might have been using a __builtin call. I later needed to add -mabm and -mbmi explicitly to get the instructions I wanted. As Peter Cordes suggested, it is probably better to target Haswell with -march=haswell as a starting point and then add flags as needed. Haswell (2013) was the first Intel processor with AVX2, so in my mind -march=haswell basically says: I expect you to have a computer from 2013 or newer.
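To make the effect of the flags concrete, here is a small sketch. It relies on gcc and clang defining the __BMI2__ macro when -mbmi2 (or an -march that implies it, such as -march=haswell) is active. The helper name keep_low_bits and the software fallback are assumptions added for illustration, not part of my original code.

```c
#include <stdint.h>

#ifdef __BMI2__
#include <x86intrin.h>  /* only pulled in when BMI2 is enabled at compile time */
#endif

/* Hypothetical helper: keep the low n bits of x, mirroring BZHI semantics. */
uint32_t keep_low_bits(uint32_t x, uint32_t n)
{
#ifdef __BMI2__
    return _bzhi_u32(x, n);  /* compiles to the bzhi instruction */
#else
    n &= 0xFFu;              /* bzhi only reads the low 8 bits of the index */
    return n >= 32 ? x : x & ((UINT32_C(1) << n) - 1u);
#endif
}
```

Built with something like gcc -O2 -march=haswell, the intrinsic path is taken; built without those flags, the fallback is compiled instead, so the same source still builds for older targets.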

Also, based on some quick reading, it sounds like the generic __builtin functions compile without any extra flags, with the compiler falling back to a slower sequence when the instruction is not enabled (a future question for SO). There does not appear to be a 1:1 correspondence between intrinsics and builtins, though. More specifically, not all intrinsics seem to have a builtin equivalent, so the flag-setting approach seems to be necessary, rather than always using builtins and not worrying about flags. It is also useful to know which intrinsics are being used, for distribution purposes, as it seems like bmi2 could still be missing on a substantial portion of computers (e.g. needing AMD from 2015+, I think).
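For comparison, a sketch of the builtin route, assuming gcc or clang. The helper names bsr_builtin and popcount_builtin are invented for this example. As far as I can tell, these generic builtins compile without any -m flag, and the compiler only emits lzcnt or popcnt when the target options allow it.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: same result as bsr, computed from count-leading-zeros. */
int bsr_builtin(unsigned int x)
{
    assert(x != 0);  /* __builtin_clz(0) is undefined */
    return (int)(sizeof(unsigned int) * CHAR_BIT) - 1 - __builtin_clz(x);
}

/* Hypothetical helper: number of set bits in x. */
int popcount_builtin(unsigned int x)
{
    return __builtin_popcount(x);
}
```

This avoids the header question entirely, at the cost of only covering the operations that have a generic builtin.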

It's still not clear to me why just using the include specified in the Intel documentation doesn't work, but this info gets me 99% of the way to where I want to be.

Jimbo
  • Prefer using `-march=haswell` instead of just `-mavx2 -mbmi2`. That will tune for recent Intel CPUs, as well as enabling instruction sets. (Unfortunately there isn't a generic-avx2 tuning option that cares about Ryzen as well, but many things like macro-fusion of cmp/jcc are good on Ryzen as well as Haswell. Unfortunately BMI2 `pdep`/`pext` are very slow on Ryzen, but it has single-uop `bzhi` and other BMI/BMI2 instructions.) – Peter Cordes Oct 22 '18 at 02:14
  • @PeterCordes Thanks, that strategy makes a lot of sense and was basically what I was thinking in my mind anyway (i.e. target Haswell or newer). I've updated my answer based on your suggestion. Thanks again. – Jimbo Oct 22 '18 at 12:14
  • *It's still not clear to me why just using the specified include in the Intel documentation doesn't work* Unlike MSVC and maybe ICC, gcc and clang will only emit instructions if you tell the compiler they're supported by the target. So other than inline asm, you can't write code that uses instructions via intrinsics that the compiler couldn't use on its own when auto-vectorizing / optimizing. That's just how gcc and LLVM are designed. But MSVC is designed for a build-once run-everywhere model where you only use new instructions with runtime dispatch, not by letting the compiler use them. – Peter Cordes Oct 22 '18 at 18:30
  • @PeterCordes In the end I was unable to get the linker working with `-march=native` and just including `<immintrin.h>`. The linker error was "Undefined symbols for architecture x86_64: "__bit_scan_reverse" ..." Switching to include `x86intrin.h` fixed the problem. There doesn't seem to be any harm in including `x86intrin.h`, so I'm using that, but I don't understand why `immintrin.h` is not sufficient. – Jimbo Oct 23 '18 at 00:14
  • Because gcc doesn't include non-SIMD intrinsics via its `immintrin.h`, so you simply tried to call an undefined / undeclared function and ignored the warning about it. (It would be a compile-time error in C++.) I don't know why gcc/clang chose not to define everything that Intel's documentation says it includes, but they don't (e.g. [`_lrotl`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=rotat)). You need `x86intrin.h` for gcc/clang/ICC, or `intrin.h` for MSVC, for some Intel intrinsics (and for some non-standard intrinsics which Intel doesn't document as being in immintrin.h). – Peter Cordes Oct 23 '18 at 00:22