Other than inline asm, gcc/clang will never emit an asm instruction that isn't enabled via a command line option, function attribute, or pragma.
If you can't use GCC target options on the command line like a normal person (e.g. -march=native to enable everything supported by the compile host, and tune for that machine), you can instead use a pragma to set target-specific options. It doesn't work very well, though; it seems #pragma GCC target("arch=skylake") breaks GCC's immintrin.h (if you put the pragma before the include), apparently trying to compile AVX512 functions without AVX512 enabled. The GCC docs do show examples of using "arch=core2" as a function attribute, or even "sse4.1,arch=core2".
You can set target("bmi2,tune=skylake") without breaking anything, though (before or after #include <immintrin.h>). (Unlike arch=, tune= doesn't enable ISA extensions; it just affects code-gen choices.)
#include <immintrin.h>
// #pragma GCC target ("arch=haswell")   // also broken, disables BMI2??
#pragma GCC target ("bmi2,tune=skylake") // this can be after immintrin.h

int foo(unsigned x) {
    // unsigned i = 0x6B5;  // 11010110101
    unsigned i = x;
    unsigned n = 4;         //    ^
    unsigned j = _pdep_u32(1u << n, i);
    int bitnum = __builtin_ctz(j);   // result is bit 7
    return bitnum;
}

int constprop() {   // check that GCC can still "understand" the intrinsic
    return foo(0x6B5);
}
This compiles cleanly on Godbolt with only -O3, no -m options:
foo:
        mov     r8d, edi
        mov     edi, 16
        pdep    edi, edi, r8d
        bsf     eax, edi
        ret
constprop:
        mov     eax, 7
        ret
If you ever have trouble getting immintrin.h to define the right wrapper, you can look in GCC's headers to find out how they're defined. e.g. _pdep_u32 is a wrapper for __builtin_ia32_pdep_si (single integer) and _pdep_u64 is a wrapper for __builtin_ia32_pdep_di (double integer).
These __builtin functions are always defined by GCC even with no headers, but these ia32 builtins can only be used in functions with compatible target options. There isn't a fallback like with __builtin_popcount for targets that don't support the instruction.
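As a minimal sketch of that difference (count_bits and deposit are made-up names, and this assumes GCC or clang targeting x86): neither function needs a header, but only the ia32 builtin needs a target attribute.

```c
/* No #include: both builtins are provided by the compiler itself. */

/* Compiles for any target. Without hardware popcnt enabled, the compiler
   falls back to a library call or a bit-twiddling sequence. */
int count_bits(unsigned x) {
    return __builtin_popcount(x);
}

/* The ia32 builtin has no such fallback, so BMI2 has to be enabled for
   this function. Without the attribute (and without -mbmi2), the compiler
   should reject the call at compile time. */
__attribute__((target("bmi2")))
unsigned deposit(unsigned src, unsigned mask) {
    return __builtin_ia32_pdep_si(src, mask);
}
```

The attribute only affects code generation; deposit must still only be called on a CPU that actually has BMI2.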
But it seems that for BMI2 instructions at least, _pdep_u32 and _pdep_u64 get defined by GCC's immintrin.h regardless of whether command line options enabled BMI2 (which would define the __BMI2__ CPP macro). Perhaps the reason is exactly this use case: single functions with a target attribute, or a block of functions with a pragma.
The _pdep_u64 version requires BMI2 + x86-64 while the 32-bit version only requires BMI2 (i.e. is available in 32-bit mode as well). pdep is a scalar integer instruction so 64-bit operand-size requires 64-bit mode.
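If the same source also has to build as 32-bit x86, one option is to guard the 64-bit intrinsic and otherwise build the result from two 32-bit pdep operations. A rough sketch, assuming GCC or clang on x86 (deposit64 is a hypothetical name, not something from the headers):

```c
#include <stdint.h>
#include <immintrin.h>

/* BMI2 is enabled for this one function only. */
__attribute__((target("bmi2")))
uint64_t deposit64(uint64_t src, uint64_t mask) {
#ifdef __x86_64__
    /* 64-bit operand-size is only available in 64-bit mode. */
    return _pdep_u64(src, mask);
#else
    /* 32-bit mode: the low half of the mask consumes the first
       popcount(mask_lo) source bits, the high half takes the next ones. */
    uint32_t mask_lo = (uint32_t)mask;
    uint32_t mask_hi = (uint32_t)(mask >> 32);
    uint32_t lo = _pdep_u32((uint32_t)src, mask_lo);
    uint32_t hi = _pdep_u32((uint32_t)(src >> __builtin_popcount(mask_lo)),
                            mask_hi);
    return ((uint64_t)hi << 32) | lo;
#endif
}
```

Either branch still executes pdep, so the caller is responsible for only reaching this on a CPU with BMI2.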
Aside: MSVC compiled the instrinsic without fuss
Yeah, MSVC and GCC/clang have different philosophies about intrinsics. MSVC lets you use whatever, under the assumption that you'll do runtime dispatching to make sure the execution never reaches an intrinsic that isn't supported on this machine.
This might be related to MSVC essentially not optimizing intrinsics at all, often not even doing simple constant-propagation through them. I guess that's because it's trying to make sure the corresponding asm instruction never executes when the intrinsic isn't reached in the C++ abstract machine, so it takes the slow and safe approach.
but the program crashed (I suppose _pdep_u32 is not supported by my cpu as suggested in the linked question).
Yup, not all CPUs have BMI2. Nothing you can do with GCC will change that.
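What you can do is the runtime dispatching mentioned above: check for BMI2 before calling a function that uses the intrinsic, and fall back to plain C otherwise. A minimal sketch, assuming GCC or clang (pdep32, pdep32_bmi2, pdep32_soft and the HAVE_X86 macro are names invented for this example; __builtin_cpu_supports is the GNU C builtin):

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1   /* hypothetical macro, local to this example */
#endif

/* Portable fallback: deposit successive low bits of src into the
   positions of the set bits of mask, lowest first. */
static uint32_t pdep32_soft(uint32_t src, uint32_t mask) {
    uint32_t result = 0;
    for (uint32_t bit = 1; mask != 0; bit <<= 1) {
        uint32_t lowest = mask & (~mask + 1);  /* lowest set bit of mask */
        if (src & bit)
            result |= lowest;
        mask &= mask - 1;                      /* clear that mask bit */
    }
    return result;
}

#ifdef HAVE_X86
/* Only this function is compiled with BMI2 enabled. */
__attribute__((target("bmi2")))
static uint32_t pdep32_bmi2(uint32_t src, uint32_t mask) {
    return _pdep_u32(src, mask);
}
#endif

/* Dispatcher: uses the instruction only if this CPU reports BMI2. */
uint32_t pdep32(uint32_t src, uint32_t mask) {
#ifdef HAVE_X86
    if (__builtin_cpu_supports("bmi2"))
        return pdep32_bmi2(src, mask);
#endif
    return pdep32_soft(src, mask);
}
```

Checking on every call is the simplest version; for a hot loop you'd probably hoist the check out and dispatch once to a whole BMI2 or non-BMI2 version of the loop.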