I'm looking at some library code that performs the following. The CpuId function operates as expected. It loads EAX (function), ECX (subfunction) and then calls CPUID.

struct CPUIDinfo
{
    word32 EAX;
    word32 EBX;
    word32 ECX;
    word32 EDX;
};
...

CPUIDinfo info;
CpuId(1 /*EAX=1*/, 0 /*ECX=0*/, info);

if ((info.EDX & (1 << 26)) != 0)
    s_hasSSE2 = TrySSE2();

Then, this is what the code does in TrySSE2:

bool TrySSE2()
{
    /* SIG handlers in place */

    // Executes an SSE2 instruction (POR leaves XMM0 unchanged; PXOR would zero it)
    por xmm0, xmm0;

    #if ... Microsoft and intrinsics available ...
      // Exercises MOVD instruction
      word32 x = _mm_cvtsi128_si32(xmm0);
      return x == 0;
    #endif

    return true;
}

Calling CPUID and testing bit 26 of EDX is correct per the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2, Figure 3-8, page 3-192. So I'm not sure about the TrySSE2 part...
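
For reference, here is a minimal sketch of the CPUID half of the check on its own. It assumes GCC or Clang on x86, where `<cpuid.h>` provides `__get_cpuid`; the function name `CpuReportsSSE2` is made up for illustration and is not part of the library in question. On other compilers or architectures the sketch simply returns false.

```cpp
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>  // GCC/Clang helper header providing __get_cpuid
#endif

// Hypothetical helper: returns true if CPUID leaf 1 reports SSE2 (EDX bit 26).
// This only says the CPU implements SSE2, not that the OS has enabled it.
bool CpuReportsSSE2()
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    // __get_cpuid returns 0 when the requested leaf is not supported.
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0)
        return false;

    return (edx & (1u << 26)) != 0;
#else
    return false;  // not covered by this sketch
#endif
}
```

On any x86-64 machine this should return true, since SSE2 is part of the baseline x86-64 instruction set.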

I have looked at other similar questions, like "Determine processor support for SSE2?". None of them say that testing EDX:26 is unreliable.

Why would the code call TrySSE2 rather than using CPUID/EDX:26? Is the test unreliable on some non-Intel processors?

jww

1 Answer

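When the SSE instructions were added, they introduced new registers (XMM0 and up) that need to be saved and restored during context switches. Since OSes at the time didn't have the code to do this, the SSE instructions were disabled by default: until the OS sets the relevant control register bits (notably CR4.OSFXSR), executing an SSE instruction raises an invalid-opcode exception.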

Once OSes were updated to support saving/restoring these new registers, the OS would then enable the SSE instructions. Nowadays all OSes have SSE support, but I suspect this code is checking that:

  • the CPU supports SSE2
  • the OS has enabled SSE2

See here for a bit more info: http://wiki.osdev.org/SSE#Checking_for_SSE
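
A rough sketch of how such a userland probe can look on a POSIX system, assuming GCC or Clang on x86: install a temporary SIGILL handler, execute one SSE2 instruction, and treat a trap as "not usable". The names `ProbeSSE2`, `ProbeSigIllHandler` and `s_probeJmp` are invented for this example and are not the library's actual code. On Windows, structured exception handling would typically take the place of the signal handler.

```cpp
#include <setjmp.h>  // sigsetjmp / siglongjmp (POSIX)
#include <signal.h>  // sigaction, SIGILL (POSIX)

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
# define SSE2_PROBE_AVAILABLE 1
#endif

#ifdef SSE2_PROBE_AVAILABLE
static sigjmp_buf s_probeJmp;  // jump target used by the SIGILL handler

// Runs if the probe instruction raises SIGILL; unwinds back into ProbeSSE2.
extern "C" void ProbeSigIllHandler(int)
{
    siglongjmp(s_probeJmp, 1);
}
#endif

// Hypothetical helper: returns true if an SSE2 instruction executes
// without trapping, i.e. the CPU has SSE2 and the OS has enabled it.
// Not thread-safe: it uses a global jump buffer and swaps the
// process-wide SIGILL handler for the duration of the probe.
bool ProbeSSE2()
{
#ifdef SSE2_PROBE_AVAILABLE
    struct sigaction probeAct = {};
    struct sigaction oldAct = {};
    probeAct.sa_handler = ProbeSigIllHandler;
    sigemptyset(&probeAct.sa_mask);
    if (sigaction(SIGILL, &probeAct, &oldAct) != 0)
        return false;

    volatile bool ok = true;
    if (sigsetjmp(s_probeJmp, 1) == 0)
        __asm__ __volatile__("por %%xmm0, %%xmm0" : : : "xmm0");  // SSE2 form of POR
    else
        ok = false;  // reached via siglongjmp from the handler

    sigaction(SIGILL, &oldAct, nullptr);  // restore the previous handler
    return ok;
#else
    return false;  // not covered by this sketch
#endif
}
```

Code like the library's would normally call such a probe only after CPUID has reported SSE2, which is the order used in the question.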

Buddy
  • Thanks Buddy. A quick question: the OSDev page lists what an OS should do. Is there a standard set of tests for userland to test OS support? Or is it sufficient to perform a `por xmm, xmm`? – jww Aug 20 '15 at 22:00
  • I think that's fine.... it'll throw an "Invalid Opcode" error if SSE isn't enabled, so I would think any SSE instruction would be a sufficient test. – Buddy Aug 20 '15 at 22:11
  • One last question (sorry about being a [help vampire](https://meta.stackexchange.com/questions/19665/the-help-vampire-problem)). You referenced a page that discusses SSE, and then you said you suspect the code is testing for CPU and OS support for SSE2. Based on the code in `TrySSE2`, which is it? Is `TrySSE2` an SSE test, or is `TrySSE2` an SSE2 test? (Sorry to split hairs, the person who wrote the code is usually not available to answer questions). – jww Aug 20 '15 at 23:40
  • TrySSE2 is an SSE2 test, as [_mm_cvtsi128_si32](https://msdn.microsoft.com/en-us/library/5z7a9642(v=vs.90).aspx) is an SSE2 intrinsic. – Buddy Aug 21 '15 at 03:50
  • `TrySSE2` is definitely an SSE2-specific test, since the intrinsic should produce a `MOVD` from an XMM register, which is SSE2-specific and will fault if the OS has not set up CR4 and CR0 appropriately (CR4.OSFXSR set, CR0.EM clear). Unfortunately, Intel didn't really provide a way for protected-mode code at privilege levels 1 to 3 to read the CR4 and CR0 registers, so people resort to trapping SIGILL or other exceptions to determine whether the OS and CPU support a specific instruction. – Michael Petch Aug 23 '15 at 19:17
  • A similar problem exists with SSE (but not MMX). You should check bit 25 of EDX and then issue an SSE intrinsic like `_mm_setzero_ps` or the equivalent instruction `xorps xmm0, xmm0`. MMX didn't have this issue because it reuses the x87 FPU registers, whose state the OS already saved and restored. – Michael Petch Aug 23 '15 at 19:20
  • Intel eventually became wiser with the newer instruction sets that depend on XSAVE-managed state, like AVX/AVX2/XOP/FMA3/FMA4/AVX512. For these you can check at ring 3 (unprivileged userland code), via the CPUID OSXSAVE bit and the `XGETBV` instruction, whether the OS and CPU support the functionality needed, without resorting to methods like trapping SIGILL. – Michael Petch Aug 23 '15 at 19:23
  • Oops, the comments above were directed at @jww – Michael Petch Aug 23 '15 at 19:28
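
To illustrate the last point about AVX, here is a minimal sketch of a ring 3 check that needs no signal handler. It assumes GCC or Clang on x86 with an assembler that knows the `xgetbv` mnemonic; the names `ReadXcr0` and `CpuAndOsSupportAVX` are invented for this example. The CPUID bits used are leaf 1 ECX bit 27 (OSXSAVE) and bit 28 (AVX), and the XCR0 bits are bit 1 (XMM state) and bit 2 (YMM state).

```cpp
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>  // GCC/Clang helper header providing __get_cpuid

// Reads XCR0 with XGETBV (ECX = 0). Only call this after CPUID has
// reported OSXSAVE, otherwise XGETBV itself raises an invalid-opcode fault.
static unsigned long long ReadXcr0()
{
    unsigned int lo = 0, hi = 0;
    __asm__ __volatile__("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return (static_cast<unsigned long long>(hi) << 32) | lo;
}
#endif

// Hypothetical helper: true if the CPU implements AVX and the OS has
// enabled saving and restoring of the XMM and YMM register state.
bool CpuAndOsSupportAVX()
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0)
        return false;

    const unsigned int kOsxsave = 1u << 27;  // OS has enabled XSAVE/XGETBV
    const unsigned int kAvx     = 1u << 28;  // CPU implements AVX
    if ((ecx & (kOsxsave | kAvx)) != (kOsxsave | kAvx))
        return false;

    // XCR0 bit 1 = XMM state, bit 2 = YMM state; both must be enabled.
    return (ReadXcr0() & 0x6) == 0x6;
#else
    return false;  // not covered by this sketch
#endif
}
```

The order matters: `XGETBV` is only executed once CPUID has confirmed OSXSAVE, so the check itself cannot trap.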