
In one of my applications, I need to efficiently de-interleave bits in a long stream of data. Ideally, I would like to use the BMI2 `_pext_u32()` and/or `_pext_u64()` x86_64 intrinsics when available. I scoured the internet for documentation on x86intrin.h (GCC) but couldn't find much on the subject, so I am asking the gurus on StackOverflow to help me out.

  1. Where can I find documentation about how to work with functions in x86intrin.h?
  2. Does GCC's implementation of `_pext_*()` already have code behind it to fall back on, or do I need to write the fallback code myself (for conditional compilation)?
  3. Is it possible to write a binary that automatically falls back to an alternate implementation if a target does not support the intrinsic? If so, how does one do so?
  4. Is there a known programming pattern that GCC will recognize and automatically convert to `_pext_*()` when compiling with optimization enabled and with -mbmi2?
Michael Back

2 Answers


Intel publishes the Intrinsics Guide, which also applies to GCC. You will have to write your own fallback code if you use these intrinsics.
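A hand-written fallback with compile-time selection might look like the sketch below. The names `pext64_soft` and `pext64_static` are hypothetical, and the loop is one straightforward way to emulate the instruction, not code taken from GCC or Intel. It relies on the `__BMI2__` macro that GCC predefines when building with -mbmi2.

```c
#include <stdint.h>

#if defined(__BMI2__) && defined(__x86_64__)
#include <immintrin.h>   /* declares _pext_u64 */
#endif

/* Hypothetical software fallback: collect the bits of src selected by
   mask and pack them into the low bits of the result. */
static uint64_t pext64_soft(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t out_bit = 1; mask != 0; out_bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1);  /* isolate lowest set mask bit */
        if (src & lowest)
            result |= out_bit;
        mask &= mask - 1;                      /* clear that mask bit */
    }
    return result;
}

/* Hypothetical wrapper: chosen at compile time, no runtime CPU check. */
static inline uint64_t pext64_static(uint64_t src, uint64_t mask)
{
#if defined(__BMI2__) && defined(__x86_64__)
    return _pext_u64(src, mask);
#else
    return pext64_soft(src, mask);
#endif
}
```

For de-interleaving, `pext64_static(x, 0x5555555555555555ULL)` would gather the even-numbered bits of `x` and `pext64_static(x, 0xAAAAAAAAAAAAAAAAULL)` the odd-numbered ones.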

You can achieve automatic switching of implementations by using IFUNC resolvers, but for non-library code, using conditionals or function pointers is probably simpler.
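The function-pointer approach could be sketched roughly as follows. It assumes GCC (or a compatible compiler) on x86_64 and uses the `__builtin_cpu_supports` builtin and the `target` function attribute. All function names here (`pext64_bmi2`, `pext64_generic`, `pext64_dispatch`) are made up for illustration.

```c
#include <stdint.h>

#if defined(__GNUC__) && defined(__x86_64__)
#include <immintrin.h>
#define HAVE_BMI2_DISPATCH 1

/* Built with BMI2 enabled for this one function only; it must be
   called only after the CPU check below has succeeded. */
__attribute__((target("bmi2")))
static uint64_t pext64_bmi2(uint64_t src, uint64_t mask)
{
    return _pext_u64(src, mask);
}
#endif

/* Portable fallback for CPUs (or compilers) without BMI2. */
static uint64_t pext64_generic(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t out_bit = 1; mask != 0; out_bit <<= 1) {
        if (src & mask & (~mask + 1))  /* test src at lowest set mask bit */
            result |= out_bit;
        mask &= mask - 1;              /* clear that mask bit */
    }
    return result;
}

typedef uint64_t (*pext64_fn)(uint64_t, uint64_t);

/* Pick an implementation based on what the running CPU reports. */
static pext64_fn select_pext64(void)
{
#ifdef HAVE_BMI2_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("bmi2"))
        return pext64_bmi2;
#endif
    return pext64_generic;
}

/* Resolves the pointer on first use. This lazy initialization is not
   synchronized; multithreaded code may prefer to resolve it once at startup. */
uint64_t pext64_dispatch(uint64_t src, uint64_t mask)
{
    static pext64_fn impl;
    if (!impl)
        impl = select_pext64();
    return impl(src, mask);
}
```

Because the indirect call prevents inlining, it is probably better to dispatch at the level of the whole de-interleaving loop rather than per `pext` call.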

Looking at the gcc/config/i386/i386.md and gcc/config/i386/i386.c files, I don't see anything in GCC 8 which would automatically select the pext instruction without intrinsics in the source code.

Florian Weimer
  • Some things like the target_clones attribute can create the ifunc resolver for you, though it may not be so convenient for this particular case. – Marc Glisse Apr 02 '18 at 10:21
  • Yes, target clones only work if the code itself is architecture-independent. This is not the case when CPU-specific intrinsics are used. – Florian Weimer Apr 02 '18 at 10:31
  • @Florian, thank you for your excellent input - and this has answered almost all of my question. Will you add to your answer if you know about any source that I could find about the opcodes that gcc supports and will attempt as an optimization target? – Michael Back Apr 02 '18 at 18:44
  • Well, you need to look at `i386.md` and `i386.c` for this information. There is no other neat source. – Florian Weimer Apr 02 '18 at 19:31
  • @Florian, I am actually building 64-bit and 32-bit versions of the same DLL with mingw (using `_pext*()` as an option on the 64-bit version) -- cross compiling from Linux to Windows (if that also makes a difference). Do ifunc resolvers work on Windows DLL's (both 32-bit, and 64-bit)? – Michael Back Apr 05 '18 at 20:27
  • Sorry, no. IFUNC resolvers are a GNU/ELF feature. – Florian Weimer Apr 05 '18 at 20:31

The design philosophy of Intel's intrinsics is that you can only use them in functions that will run only on CPUs with the required extensions. Checking for support at every instruction would add far too much overhead, and then there would have to be a fallback (there isn't).

Intel intrinsics are not like GNU C `__builtin_popcountll`, which does use a fallback if compiled without -mpopcnt. But note that you can enable target options on a per-function basis with attributes.
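As a rough illustration of that difference, assuming GCC on x86 (the function names are hypothetical):

```c
#include <stdint.h>

/* Compiles with or without -mpopcnt. Without it, GCC emits a software
   fallback instead of the popcnt instruction. */
static int count_bits(uint64_t x)
{
    return __builtin_popcountll(x);
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* The target attribute enables popcnt for this function alone, so the
   compiler is free to use the instruction here. The caller is responsible
   for making sure the CPU supports it before calling. */
__attribute__((target("popcnt")))
static int count_bits_popcnt(uint64_t x)
{
    return __builtin_popcountll(x);
}
#endif
```

The same attribute mechanism is what lets an intrinsic such as `_pext_u64` be used inside a single `target("bmi2")` function in a file otherwise compiled without -mbmi2.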

Peter Cordes