In one of my applications, I need to efficiently de-interleave bits in a long stream of data. Ideally, I would like to use the BMI2 `_pext_u32()` and/or `_pext_u64()` x86_64 intrinsics when available. I scoured the internet for documentation on `x86intrin.h` (GCC), but couldn't find much on the subject, so I am asking the gurus on StackOverflow to help me out.
- Where can I find documentation about how to work with the functions in `x86intrin.h`?
- Does GCC's implementation of `_pext_*()` already have code behind it to fall back on, or do I need to write the fallback code myself (for conditional compilation)?
- Is it possible to write a binary that automatically falls back to an alternate implementation if the target does not support the intrinsic? If so, how does one do so?
- Is there a known programming pattern that GCC will recognize and automatically convert to `_pext_*()` when compiling with optimization enabled and with `-mbmi2`?
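
To make the conditional-compile option concrete, here is a minimal sketch of what I have in mind. The helper names (`pext32_soft`, `pext32`, `deinterleave32`) are placeholders of my own, not anything from GCC, and the sketch relies on GCC defining `__BMI2__` when building with `-mbmi2`. The software loop is only one straightforward way to emulate the instruction, and I have not measured it.

```c
#include <stdint.h>

#ifdef __BMI2__
#include <x86intrin.h>   /* provides _pext_u32() when built with -mbmi2 */
#endif

/* Hypothetical software fallback: gathers the bits of src selected by mask
 * into the low-order bits of the result, lowest selected bit first. */
static uint32_t pext32_soft(uint32_t src, uint32_t mask)
{
    uint32_t result = 0;
    uint32_t out_bit = 1;

    while (mask != 0) {
        uint32_t lowest = mask & (0u - mask);  /* isolate lowest set mask bit */
        if (src & lowest)
            result |= out_bit;
        out_bit <<= 1;
        mask &= mask - 1;                      /* clear that mask bit */
    }
    return result;
}

/* Hypothetical wrapper: the choice is made at compile time, not at run time. */
static uint32_t pext32(uint32_t src, uint32_t mask)
{
#ifdef __BMI2__
    return _pext_u32(src, mask);
#else
    return pext32_soft(src, mask);
#endif
}

/* De-interleave one 32-bit word into its even-position and odd-position bits. */
static void deinterleave32(uint32_t word, uint16_t *even, uint16_t *odd)
{
    *even = (uint16_t)pext32(word, 0x55555555u);  /* bits 0, 2, 4, ... */
    *odd  = (uint16_t)pext32(word, 0xAAAAAAAAu);  /* bits 1, 3, 5, ... */
}
```

Calling `deinterleave32(0x6u, &even, &odd)` should give `even == 2` and `odd == 1`. The drawback of this approach is that the decision is baked into the binary at build time, which is why I am asking about automatic fallback on the target machine.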