
I have an image drawing routine which is compiled multiple times for SSE, SSE2, SSE3, SSE4.1, SSE4.2, AVX and AVX2. My program dynamically dispatches to one of these binary variants by checking CPUID flags.

On Windows, I check the version of Windows and disable AVX/AVX2 dispatch if the OS doesn't support them. (For example, only Windows 7 SP1 or later supports AVX/AVX2.)

I want to do the same thing on Mac OS X, but I'm not sure which version of OS X supports AVX/AVX2.

Note that what I want to know is the minimum version of OS X required to use AVX/AVX2, not which machine models are capable of AVX/AVX2.

TABATA
  • You may find this command useful in the Terminal/shell to see CPU features: `sysctl -a | grep -i avx` – Mark Setchell Jul 12 '16 at 07:58
  • https://stackoverflow.com/questions/6121792/how-to-check-if-a-cpu-supports-the-sse3-instruction-set/22521619#22521619 – Z boson Jul 13 '16 at 07:59

2 Answers


For detecting instruction set features there are two source files I reference:

  1. Mysticial's cpu_x86.cpp
  2. Agner Fog's instrset_detect.cpp

Both of these files show you how to detect SSE through AVX2, as well as XOP, FMA3 and FMA4, whether your OS supports AVX, and other features.

I am used to Agner's code (a single source file that works with MSVC, GCC, Clang and ICC), so let's look at that first.

Here are the relevant code fragments from instrset_detect.cpp for detecting AVX:

iset = 0;                                              // default value
int abcd[4] = {0,0,0,0};                               // cpuid results
cpuid(abcd, 0);                                        // call cpuid function 0
//.... (elided: the earlier SSE checks, which read CPUID leaf 1 into abcd)
iset = 6;                                              // 6: SSE4.2 supported
if ((abcd[2] & (1 << 27)) == 0) return iset;           // no OSXSAVE
if ((xgetbv(0) & 6) != 6)       return iset;           // AVX not enabled in O.S.
if ((abcd[2] & (1 << 28)) == 0) return iset;           // no AVX
iset = 7;                                              // 7: AVX supported

with xgetbv defined as

// Define interface to xgetbv instruction
static inline int64_t xgetbv (int ctr) {    
#if (defined (_MSC_FULL_VER) && _MSC_FULL_VER >= 160040000) || (defined (__INTEL_COMPILER) && __INTEL_COMPILER >= 1200) // Microsoft or Intel compiler supporting _xgetbv intrinsic

    return _xgetbv(ctr);                                   // intrinsic function for XGETBV

#elif defined(__GNUC__)                                    // use inline assembly, Gnu/AT&T syntax

   uint32_t a, d;
   __asm("xgetbv" : "=a"(a),"=d"(d) : "c"(ctr) : );
   return a | (uint64_t(d) << 32);

#else                                                      // other compilers: inline assembly with MASM/Intel/MS syntax

  // see the source file

#endif
}

I did not include the cpuid function (see the source code), and I removed the non-GCC inline assembly from xgetbv to keep the answer shorter.
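
If you just want to turn the detection result into a dispatch between your pre-compiled variants, a minimal sketch built on Agner's instrset_detect() (declared in his instrset.h; in his numbering 7 means AVX is usable and 8 means AVX2 is usable) could look like the following. The draw_sse2/draw_avx/draw_avx2 names are hypothetical stand-ins for the separately compiled routines from the question:

#include "instrset.h"                        // Agner Fog's VCL: declares instrset_detect()

void draw_sse2();                            // baseline variant
void draw_avx();                             // variant compiled with AVX enabled
void draw_avx2();                            // variant compiled with AVX2 enabled

void draw_dispatch() {
    int iset = instrset_detect();            // checks CPU *and* OS support in one call
    if (iset >= 8)      draw_avx2();         // 8: AVX2 supported and enabled by the OS
    else if (iset >= 7) draw_avx();          // 7: AVX supported and enabled by the OS
    else                draw_sse2();         // fall back to the SSE2 baseline
}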

Here is detect_OS_AVX() from Mysticial's cpu_x86.cpp for detecting AVX:

bool cpu_x86::detect_OS_AVX(){
    //  Copied from: http://stackoverflow.com/a/22521619/922184

    bool avxSupported = false;

    int cpuInfo[4];
    cpuid(cpuInfo, 1);

    bool osUsesXSAVE_XRSTORE = (cpuInfo[2] & (1 << 27)) != 0;
    bool cpuAVXSuport = (cpuInfo[2] & (1 << 28)) != 0;

    if (osUsesXSAVE_XRSTORE && cpuAVXSuport)
    {
        uint64_t xcrFeatureMask = xgetbv(_XCR_XFEATURE_ENABLED_MASK);
        avxSupported = (xcrFeatureMask & 0x6) == 0x6;
    }

    return avxSupported;
}

Mysticial apparently came up with this solution based on this answer.

Notice that both source files do basically the same thing: check the OSXSAVE bit (bit 27) and the AVX bit (bit 28) of CPUID, then check the result of xgetbv.
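
If you would rather not pull in either file, here is a minimal self-contained sketch of those same three checks for GCC/Clang on x86 (the function name is mine; the bit positions are the ones used above):

#include <cpuid.h>                                   // __get_cpuid (GCC/Clang)
#include <cstdint>

static bool os_supports_avx() {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))     // CPUID leaf 1
        return false;
    bool osxsave = (ecx & (1u << 27)) != 0;          // bit 27: OS uses XSAVE/XRSTOR
    bool avx     = (ecx & (1u << 28)) != 0;          // bit 28: CPU supports AVX
    if (!osxsave || !avx)
        return false;
    uint32_t lo, hi;
    __asm__("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0)); // read XCR0 (safe: OSXSAVE is set)
    uint64_t xcr0 = ((uint64_t)hi << 32) | lo;
    return (xcr0 & 0x6) == 0x6;                      // XMM and YMM state enabled by the OS
}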

Z boson
  • Thank you very much for the kind explanation; I understand it very well now, and I see my original idea was wrong. The correct way is: Step 1: read the OSXSAVE bit to check whether the OS allows the XGETBV instruction. Step 2: read the AVX/AVX2 bits. Step 3: issue XGETBV and check whether the OS saves the YMM registers. However, I guess the answer to my question is "Mac OS X 10.6.7" :-) – TABATA Jul 13 '16 at 09:10
  • @TABATA, did you want a computer answer or a human answer? Computer answers lead to crashes, literally. Several Teslas have crashed recently due to computer answers. I don't think the restrictive filter of SO should lead to a higher rate of people failing the Turing test... – Z boson Jul 13 '16 at 11:27
  • Surely. I've already implemented it the way you told me. Thanks again. I should have simply asked for the steps to check AVX capability, not an OS version. I'll try to write a simpler question next time. – TABATA Jul 13 '16 at 14:36
  • @TABATA: you can edit your question to take out the mistaken idea of looking at OS version, and just ask how to detect if AVX / AVX2 is usable in a program. Now that you know the answer isn't OS-specific, you can tidy up the question even more if you want to. But it might not hurt to leave some mention of Windows and OS X in there, since people searching in the future might have the same idea as you, and be looking for the Windows way or the OS X way. – Peter Cordes Jul 13 '16 at 19:59

For AVX the answer is quite straightforward:

You need at least OS X 10.6.7.

Please note that only builds 10J3250 and 10J4138 of 10.6.7 support it.

For AVX2, that would be 10.8.4, build 12E3067 or 12E4022.
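
On OS X you can also ask the kernel directly instead of parsing the output of the sysctl command mentioned in the comments. A sketch, assuming the hw.optional.avx1_0 and hw.optional.avx2_0 sysctl keys that recent OS X releases expose (a missing key is treated as "not supported"):

#include <sys/sysctl.h>                 // sysctlbyname
#include <cstdint>

static bool sysctl_flag(const char* name) {
    uint32_t value = 0;
    size_t size = sizeof(value);
    if (sysctlbyname(name, &value, &size, nullptr, 0) != 0)
        return false;                   // key not present on this OS X version / CPU
    return value != 0;
}

// Usage:
//   bool avx_ok  = sysctl_flag("hw.optional.avx1_0");
//   bool avx2_ok = sysctl_flag("hw.optional.avx2_0");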

Antzi
  • That's exactly what I wanted to know. Thank you very much, Antzi! – TABATA Jul 12 '16 at 04:01
  • AVX2 doesn't require any extra OS support beyond AVX. If you have OS support for saving/restoring the full ymm registers on context switches, you can use AVX2 if the CPU supports it. (Check the feature bits in CPUID). – Peter Cordes Jul 12 '16 at 05:48
  • CPUID results even contain a flag (OSXSAVE) that indicates whether the OS has enabled saving of the extended register state, so you can just check that (https://software.intel.com/en-us/blogs/2011/04/14/is-avx-enabled), unless OS X has a bug or something where it sets the bit but doesn't actually use XSAVE/XRSTOR to save vector state on context switches. – Peter Cordes Jul 12 '16 at 05:51
  • Thank you for the reply! It is my understanding that the CPUID flag for AVX and its OS support (context switching) are independent. Is that right? My question is, what is the minimum version of OS X which saves the YMM registers? – TABATA Jul 12 '16 at 09:45
  • @TABATA the version numbers in my answer are the minimum supported versions for the CPUs that support AVX. If the computer runs an earlier version, there is a 100% chance you won't have AVX support. Of course, running a later version does not mean you'll have AVX support; for that, check the flags. – Antzi Jul 12 '16 at 09:49
  • 1
    @TABATA, cutting on OS version I don't think is the right way to go about this. There are methods to ask if the OS supports AVX which are cross platform. Relying on OS version cuts is not the ideal solution. – Z boson Jul 12 '16 at 12:42
  • @TABATA Checking the OS version is definitely the wrong way to check for AVX support. For example, there's an option in Windows to disable AVX support. So if you only check the OS version, your app will crash in such cases. (This is actually quite common among enthusiasts. They often disable AVX for thermal reasons.) What you need to do is check the XSAVE bit - and there's an OS-independent way to do that. – Mysticial Jul 12 '16 at 15:43
  • @Mysticial "there's an option in Windows to disable AVX support" Oh, I didn't know. If so, is there a situation where a CPU has the XSAVE bit set but the OS doesn't save the YMM registers? I'm a little bit confused. – TABATA Jul 13 '16 at 00:25
  • @PeterCordes I understand. I should do something like:
    uint32_t a, b, c, d;
    asm("cpuid" : "=a"(a), "=b"(b), "=c"(c), "=d"(d) : "a"(1));
    bool has_avx = (c & (1 << 27)) && (c & (1 << 28));   // OSXSAVE and AVX bits set
    if (has_avx) {
        asm("xgetbv" : "=a"(a), "=d"(d) : "c"(0));       // read XCR0
        if ((a & 6) != 6)
            has_avx = false;                             // OS does not save XMM/YMM state
    }
    – TABATA Jul 13 '16 at 02:52
  • @TABATA now that's the scope for a different question :) – Antzi Jul 13 '16 at 03:34
  • @Mysticial how do you disable AVX in Windows? I have not tried this. Do you mean [bcdedit /set xsavedisable 1](http://superuser.com/a/623738)? Why would you do this? For max overclocking testing? – Z boson Jul 13 '16 at 07:13
  • @Zboson Yes. I only do it for testing, but many of my users do it for everyday use. AVX heavily taxes Haswell processors to the point that they can overheat (100C+) even at stock settings. Throw in a large overclock and it's a recipe for disaster. Try googling for "Haswell AVX overclock". The situation is so bad that many overclockers simply disable AVX via bcdedit. And those that do start reporting crashes with specific apps. (Namely those that attempt to use AVX after seeing the AVX bit in cpuid, but without also checking the xsave bit.) – Mysticial Jul 13 '16 at 14:50
  • Even MSVC 2013 was hit by this when their `` implementation tried to use FMA3 instructions without checking xsave. – Mysticial Jul 13 '16 at 14:51
  • @Zboson As an update to my comments: Kaby Lake lets you set an "AVX offset" that will drop the CPU multiplier when running AVX. This feature was already present in Haswell-EP and Broadwell-EP. Now you can control it for Kaby Lake with a suitable (overclockable) motherboard/BIOS. – Mysticial Feb 17 '17 at 01:23
  • @Mysticial, are you referring to the frequency downgrade when using AVX on some Intel x86 processors (e.g. the 18-core Broadwell Xeon)? I know some of the KNL systems suffer from this. The term I have seen is `AVX-P1`. With Kaby Lake, are you saying that the possibility exists to set this at POST through the BIOS? Do you have a link that discusses any of this? – Z boson Feb 17 '17 at 08:20
  • @Zboson Yes. Pretty much every single Kaby Lake overclocking guide covers it. Just take a look at http://edgeup.asus.com/2017/01/31/kaby-lake-overclocking-guide/ and scroll down to the section, "AVX offset". – Mysticial Feb 17 '17 at 16:17