
I'm curious how this works in games and other software.
More precisely, I'm asking for a solution in C++.
Something like:

if AMX available -> Use AMX version of the math library
else if AVX-512 available -> Use AVX-512 version of the math library
else if AVX2 (256-bit) available -> Use AVX2 version of the math library
etc.

The basic idea I have is to compile the library into different DLLs and swap them at runtime, but that doesn't seem like the best solution to me.

Peter Cordes
malatindez
  • The app will have to ask the CPU what model it is. [Intrinsics for CPUID like informations?](https://stackoverflow.com/questions/17758409/intrinsics-for-cpuid-like-informations) – BoP Sep 01 '22 at 08:20
  • Look into [function multi versioning](https://gcc.gnu.org/onlinedocs/gcc/Function-Multiversioning.html). – Jesper Juhl Sep 01 '22 at 09:00
  • My apologies: when I edited the tags, I thought I saw VS mentioned, and I was in error adding [visual-c++]. It has been removed. – Michael Petch Sep 01 '22 at 09:59
  • In many cases applications *don't* do such detection/special handling. They will just define a minimum requirement and then compile the code with the compiler targeting that lowest common denominator and leave it at that. Only if you have a special need where utilizing different instructions makes a huge difference do you need to care (usually). – Jesper Juhl Sep 01 '22 at 10:15
  • I don't know about C++, but Java uses JIT (just-in-time) optimisation. That is, at run-time it identifies hot-spot areas of code to optimise by translating into machine code, and because this is done at run-time, it can take account of specific characteristics of the hardware it is running on. – Michael Kay Sep 01 '22 at 19:51
  • Challenges with heterogeneous multiprocessor systems: not different ISA family processors like x86 and ARM in the same system, but e.g. heterogeneous x86 where some might have AVX, and some do not. // AFAIK state-of-the-practice is to insist that the processors be ISA compatible if not microarchitecture compatible, e.g. ARM Big/Little. // IMHO this is stupid: if you can safely and precisely trap illegal instructions you can migrate to the processors that provide them. // OS must decide if CPUID reports the union or the intersection, and migration policy. Validation... – Krazy Glew Sep 07 '22 at 20:33

2 Answers


For the detection part

See "Are the xgetbv and CPUID checks sufficient to guarantee AVX2 support?", which shows how to detect CPU and OS support for new extensions: cpuid and xgetbv, respectively.

ISA extensions that add new/wider registers that need to be saved/restored on context switch also need to be supported and enabled by the OS, not just the CPU. New instructions like AVX-512 will still fault on a CPU that supports them if the OS hasn't set a control-register bit. (Effectively promising that it knows about them and will save/restore them.) Intel designed things so the failure mode is faulting, not silent corruption of registers on CPU migration, or context switch between two programs using the extension.
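To make the "control-register bit" concrete: the OS records which register state it will save/restore in XCR0, which user space can read with `xgetbv` once CPUID reports OSXSAVE. The sketch below assumes GCC or Clang on x86 (it uses `<cpuid.h>` and inline asm); the helper and constant names (`os_uses_xsave`, `read_xcr0`, `os_enabled`, `kXcr0*`) are hypothetical. The bit positions follow Intel's XSAVE state-component numbering.

```cpp
#include <cpuid.h>
#include <cstdint>

// CPUID.1:ECX bit 27 (OSXSAVE): the OS has enabled XSAVE, so XGETBV is usable.
static bool os_uses_xsave() {
    unsigned a, b, c, d;
    if (!__get_cpuid(1, &a, &b, &c, &d)) return false;
    return ((c >> 27) & 1) != 0;
}

// Reads XCR0 (XGETBV with ECX = 0). Only call this when os_uses_xsave() is
// true; otherwise XGETBV itself raises #UD.
static std::uint64_t read_xcr0() {
    std::uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// XCR0 bits the OS sets to promise it will save/restore that register state.
constexpr std::uint64_t kXcr0Sse    = 1ull << 1;    // XMM0-15
constexpr std::uint64_t kXcr0Avx    = 1ull << 2;    // upper halves of YMM0-15
constexpr std::uint64_t kXcr0Avx512 = 0x7ull << 5;  // k0-k7, upper ZMM0-15, ZMM16-31
constexpr std::uint64_t kXcr0Amx    = 0x3ull << 17; // XTILECFG, XTILEDATA (T0-T7)

// True only if the OS has enabled every state component in `mask`.
static bool os_enabled(std::uint64_t mask) {
    return os_uses_xsave() && (read_xcr0() & mask) == mask;
}
```

For plain AVX you would test `os_enabled(kXcr0Sse | kXcr0Avx)` and still check the CPUID feature bit separately; XCR0 only says what the OS saves, not what the CPU implements.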

Extensions that added new or wider registers are AVX, AVX-512F, and AMX. OSes need to know about them. (AMX is very new, and adds a large amount of state: 8 tile registers T0-T7 of 1KiB each. Apparently OSes need to know about AMX for power-management to work properly.)
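For AVX-512F and AMX, both halves of the check are needed: the CPUID feature bit (leaf 7, subleaf 0) and the matching XCR0 state bits. A minimal sketch assuming GCC/Clang on x86; the function names are hypothetical. On Linux, AMX tile data additionally needs a per-process permission request (`arch_prctl` with `ARCH_REQ_XCOMP_PERM`), which this sketch does not do.

```cpp
#include <cpuid.h>
#include <cstdint>

// Returns XCR0, or 0 if OSXSAVE (CPUID.1:ECX[27]) is clear and XGETBV would fault.
static std::uint64_t xcr0_or_zero() {
    unsigned a, b, c, d;
    if (!__get_cpuid(1, &a, &b, &c, &d) || !((c >> 27) & 1)) return 0;
    std::uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// CPUID leaf 7, subleaf 0 holds the AVX-512 and AMX feature flags.
static bool leaf7(unsigned& ebx, unsigned& edx) {
    unsigned a, c;
    return __get_cpuid_count(7, 0, &a, &ebx, &c, &edx) != 0;
}

bool can_use_avx512f() {
    unsigned ebx = 0, edx = 0;
    if (!leaf7(ebx, edx) || !((ebx >> 16) & 1)) return false;  // CPUID.7.0:EBX[16] = AVX512F
    const std::uint64_t need = 0xE6;  // XCR0 bits 1,2 (XMM, YMM) + 5,6,7 (k regs, ZMM state)
    return (xcr0_or_zero() & need) == need;
}

bool can_use_amx_tile() {
    unsigned ebx = 0, edx = 0;
    if (!leaf7(ebx, edx) || !((edx >> 24) & 1)) return false;  // CPUID.7.0:EDX[24] = AMX-TILE
    // XCR0 bits 17 (XTILECFG) and 18 (XTILEDATA). On Linux, also request
    // permission via arch_prctl before using tiles (not shown).
    const std::uint64_t need = 0x60000;
    return (xcr0_or_zero() & need) == need;
}
```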

OSes don't need to know about AVX2/FMA3 (still YMM0-15), or about any of the AVX-512 extensions beyond AVX-512F, which still use k0-k7 and ZMM0-31.
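In code, this means AVX2 and FMA3 reuse the AVX1 OS check (XCR0 bits 1 and 2) and only add their own CPUID bits. A sketch assuming GCC/Clang on x86; the function names are hypothetical.

```cpp
#include <cpuid.h>
#include <cstdint>

// AVX1 needs CPU support (CPUID.1:ECX[28]) plus an OS that saves YMM state:
// OSXSAVE (CPUID.1:ECX[27]) and XCR0 bits 1 and 2.
static bool avx_usable() {
    unsigned a, b, c, d;
    if (!__get_cpuid(1, &a, &b, &c, &d)) return false;
    if (!((c >> 27) & 1) || !((c >> 28) & 1)) return false;
    std::uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    (void)hi;
    return (lo & 0x6) == 0x6;
}

// AVX2 and FMA3 add instructions but no new register state, so once AVX is
// usable they only need their CPUID bits.
bool avx2_usable() {
    unsigned a, b, c, d;
    return avx_usable() && __get_cpuid_count(7, 0, &a, &b, &c, &d) &&
           ((b >> 5) & 1);   // CPUID.7.0:EBX[5] = AVX2
}

bool fma3_usable() {
    unsigned a, b, c, d;
    return avx_usable() && __get_cpuid(1, &a, &b, &c, &d) &&
           ((c >> 12) & 1);  // CPUID.1:ECX[12] = FMA
}
```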

There's no OS-independent way to detect OS support for SSE, but fortunately it's old enough that these days you don't have to. It and SSE2 are baseline for x86-64. Everything up to SSE4.2 uses the same register state (XMM0-15), so OS support for SSE1 is sufficient for user-space to use SSE4.2. SSE1 was new in 1999, with the Pentium III.
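Because SSE3 through SSE4.2 add no new state, a CPUID bit is all you can (and need to) check for them. A sketch assuming GCC/Clang; the function names are hypothetical.

```cpp
#include <cpuid.h>

// SSE3..SSE4.2 reuse XMM0-15, so the CPUID bit alone is enough: no XGETBV.
bool sse42_usable() {
    unsigned a, b, c, d;
    if (!__get_cpuid(1, &a, &b, &c, &d)) return false;
    return ((c >> 20) & 1) != 0;   // CPUID.1:ECX[20] = SSE4.2
}

bool sse2_usable() {
#if defined(__x86_64__) || defined(_M_X64)
    return true;                   // baseline for x86-64
#else
    // 32-bit x86: this only shows CPU support; OS support for SSE can't be
    // checked portably.
    unsigned a, b, c, d;
    return __get_cpuid(1, &a, &b, &c, &d) && ((d >> 26) & 1);  // CPUID.1:EDX[26]
#endif
}
```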

Different compilers have different ways of doing CPUID and xgetbv detection. See "Does gcc's __builtin_cpu_supports check for OS support?": unfortunately not, it only checks CPUID, at least as of when that was asked. I'd consider that a GCC bug, but I don't know if it ever got reported or fixed.
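If you use GCC/Clang's `__builtin_cpu_supports` but don't want to rely on it covering OS support, one option is to pair it with an explicit XCR0 check. A sketch under that assumption; `os_saves_ymm` and `avx2_ok` are hypothetical names.

```cpp
#include <cpuid.h>
#include <cstdint>

// XCR0 bits 1 and 2: the OS saves XMM and YMM state. Requires OSXSAVE first.
static bool os_saves_ymm() {
    unsigned a, b, c, d;
    if (!__get_cpuid(1, &a, &b, &c, &d) || !((c >> 27) & 1)) return false;
    std::uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    (void)hi;
    return (lo & 0x6) == 0x6;
}

// __builtin_cpu_supports reports the CPU's feature bits; the explicit XCR0
// test covers OS support whether or not the builtin does.
bool avx2_ok() {
    __builtin_cpu_init();  // only required if this can run before constructors
    return __builtin_cpu_supports("avx2") && os_saves_ymm();
}
```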


For the optional-use part

Typically this means setting function pointers to the selected versions of some important functions. Inlining through function pointers generally isn't possible, so choose the boundaries appropriately: for example, an AVX-512 version of a whole function that includes a loop, not just a single vector operation.

GCC's function multi-versioning can automate that for you, transparently compiling multiple versions and hooking some function-pointer setup.
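A sketch of GCC's C++ function multi-versioning, mirroring the question's "best available version" chain: several definitions of the same function with different `target` attributes, plus a required `default`. The compiler generates the dispatcher (an ifunc resolver on glibc systems); it decides based on the same kind of CPU detection as `__builtin_cpu_supports`. The function names are hypothetical; Clang also accepts this form on x86.

```cpp
#include <cstddef>

// Reports which version the dispatcher picked.
__attribute__((target("default")))
const char* impl_name() { return "default"; }

__attribute__((target("avx2")))
const char* impl_name() { return "avx2"; }

__attribute__((target("avx512f")))
const char* impl_name() { return "avx512f"; }

// Each version contains the whole loop, so dispatch is paid per call,
// not per element.
__attribute__((target("default")))
long long sum_ints(const int* v, std::size_t n) {
    long long s = 0;
    for (std::size_t i = 0; i < n; ++i) s += v[i];
    return s;
}

__attribute__((target("avx2")))
long long sum_ints(const int* v, std::size_t n) {
    long long s = 0;
    for (std::size_t i = 0; i < n; ++i) s += v[i];
    return s;
}
```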

There have been some previous Q&As about this with different compilers; search for "CPU dispatch avx" or something like that, along with other search terms.

See "The Effect of Architecture When Using SSE / AVX Intrinisics" to understand the difference between the two models for intrinsics. With GCC/clang, you have to enable the extension (-march=skylake or whatever, or manually -mavx2) before you can use its intrinsics. MSVC and classic ICC let you use any intrinsic anywhere, even to emit instructions the compiler wouldn't be able to auto-vectorize with. (Those compilers can't or don't optimize intrinsics much at all, perhaps because that could lead to them getting hoisted out of if(cpu) statements.)
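Under the GCC/clang model, you don't have to enable AVX2 for the whole file: a per-function `target` attribute is enough to use AVX2 intrinsics there, as long as that function is only reached behind a runtime check. A sketch with hypothetical names; on MSVC the attribute is skipped because it isn't needed there, and this sketch then just takes the scalar path. Real code should add the xgetbv OS check.

```cpp
#include <immintrin.h>
#include <cstddef>

// GCC/Clang: AVX2 intrinsics are allowed here because of the target attribute.
#if defined(__GNUC__)
__attribute__((target("avx2")))
#endif
void add_i32_avx2(const int* a, const int* b, int* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a + i));
        __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b + i));
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(out + i), _mm256_add_epi32(va, vb));
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];  // scalar tail
}

// Baseline caller: only enters the AVX2 function after a runtime check.
void add_i32(const int* a, const int* b, int* out, std::size_t n) {
#if defined(__GNUC__)
    if (__builtin_cpu_supports("avx2")) { add_i32_avx2(a, b, out, n); return; }
#endif
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}
```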

ecm
Peter Cordes
  • "OSes don't need to know about AVX2/FMA3 (still YMM0-15)" Can you elaborate? Don't OSes need to know about the upper 128 bits of the YMM registers? Why do OSes need to know about AVX but not AVX2? – Elliot Gorokhovsky Dec 08 '22 at 19:59
  • @ElliotGorokhovsky: Because AVX2 support implies AVX1, and YMM registers were new with AVX1. AVX2 and FMA3 just added more instructions that can do other things to the bits in a YMM register, no new architectural state. – Peter Cordes Dec 08 '22 at 20:04
  • Ok, I guess it's semantics :) I would say that AVX2 does require OS support in the sense that to implement `__builtin_cpu_supports("avx2")` you need to call `xgetbv`. I see your point that given an implementation of `__builtin_cpu_supports("avx")`, you can implement `__builtin_cpu_supports("avx2")` without any additional OS support checks. – Elliot Gorokhovsky Dec 08 '22 at 20:29
  • @ElliotGorokhovsky: You're still talking about user-space detection of AVX2. An OS kernel literally does not need to check for or know about AVX2 at all for user-space to be able to use it safely, only AVX1 for its context-switch routine to save the full YMM vectors, and to set the bits that `xgetbv` checks for. This is what I meant by OSes not needing to support AVX2. In practice modern OSes do detect features like AVX2 for stuff like software RAID5/RAID6, and to populate things like Linux `/proc/cpuinfo` and APIs for user-space to query CPU features via system calls instead of via `cpuid`. – Peter Cordes Dec 08 '22 at 20:49

Windows provides IsProcessorFeaturePresent, but AVX support is not on the list.

For more detailed detection you need to ask the CPU directly. On x86 this means the CPUID instruction. Visual C++ provides the __cpuidex intrinsic for this. In your case, query function/leaf 1 and check bit 28 of ECX. Wikipedia has a decent article, but you really should download the Intel instruction set manual to use as a reference.
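A sketch of that check, written so it compiles with MSVC (`__cpuidex` from `<intrin.h>`) and with GCC/Clang (`__cpuid_count` from `<cpuid.h>`); `run_cpuid` and `cpu_has_avx` are hypothetical names. As the comments note, this bit only says the CPU implements AVX, not that the OS saves YMM state.

```cpp
#if defined(_MSC_VER)
#include <intrin.h>
#else
#include <cpuid.h>
#endif

// Fills regs = {EAX, EBX, ECX, EDX} for CPUID(leaf, subleaf).
static void run_cpuid(int leaf, int subleaf, unsigned regs[4]) {
#if defined(_MSC_VER)
    int r[4];
    __cpuidex(r, leaf, subleaf);
    for (int i = 0; i < 4; ++i) regs[i] = static_cast<unsigned>(r[i]);
#else
    __cpuid_count(leaf, subleaf, regs[0], regs[1], regs[2], regs[3]);
#endif
}

// CPUID leaf 1, ECX bit 28: the CPU implements AVX. For OS support, also
// check OSXSAVE (ECX bit 27) and XCR0 via xgetbv.
bool cpu_has_avx() {
    unsigned r[4];
    run_cpuid(0, 0, r);
    if (r[0] < 1) return false;  // leaf 1 not available
    run_cpuid(1, 0, r);
    return ((r[2] >> 28) & 1) != 0;
}
```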

Anders
  • ISA extensions that add new/wider registers that need to be saved/restored on context switch also need to be supported and enabled by the OS, not just the CPU. AVX-512 instructions will still fault on a CPU that supports them if the OS hasn't set a control-register bit. (Effectively promising that it knows about them and will save/restore them.) For AVX and AVX-512, [there's an OS-independent way to detect this, too](https://stackoverflow.com/q/72522885/224132), via `xgetbv`. For SSE, it's fortunately baseline for x86-64, and everything up to SSE4.2 uses the same register state. – Peter Cordes Sep 01 '22 at 12:17
  • @Peter Would `IsProcessorFeaturePresent(PF_XSAVE_ENABLED)` be helpful for plain AVX detection? – Anders Sep 01 '22 at 14:20
  • I don't know Windows. I don't *think* so; an OS could be using `xsave` for SSE context-switches without knowing about AVX. Unless that feature flag is newer and implies an AVX-aware OS. – Peter Cordes Sep 01 '22 at 14:31
  • MSDN says Windows 7; the other AVX stuff might be Windows 7 SP1. – Anders Sep 01 '22 at 16:37