
We want to use the Intel C compiler with runtime CPU dispatching enabled (on the Windows platform). We use the options /arch:IA32 and /QaxSSE2, but no /QxFoo option. To our understanding, this should produce a binary that runs on any IA32 (x86) processor but still uses an SSE2-optimized code path on processors that actually support the SSE2 instruction set.
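To make that expectation concrete, here is a minimal hand-written sketch of what runtime dispatching amounts to conceptually. It is not what the Intel compiler actually generates. The names `cpu_has_sse2`, `sum_generic`, `sum_sse2` and `select_sum` are hypothetical, and `sum_sse2` is only a placeholder that forwards to the generic code (a real SSE2 variant would be compiled with SSE2 enabled). The only hardware fact assumed is that CPUID leaf 1 reports SSE2 in EDX bit 26.

```c
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>   /* __cpuid (MSVC-compatible compilers on x86) */
#elif defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>    /* __get_cpuid (GCC/Clang on x86) */
#endif

/* Returns 1 if CPUID leaf 1 reports SSE2 (EDX bit 26), otherwise 0. */
int cpu_has_sse2(void)
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4];                      /* EAX, EBX, ECX, EDX */
    __cpuid(regs, 1);
    return (regs[3] >> 26) & 1;
#elif defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                     /* CPUID leaf 1 not available */
    return (int)((edx >> 26) & 1u);
#else
    return 0;                         /* not an x86 target */
#endif
}

/* Baseline path: must contain nothing beyond plain IA32 instructions. */
long sum_generic(const int *v, int n)
{
    long s = 0;
    int i;
    for (i = 0; i < n; ++i)
        s += v[i];
    return s;
}

/* Placeholder for the SSE2 path (hypothetical). It forwards to the
   generic code here so that the sketch stays portable. */
long sum_sse2(const int *v, int n)
{
    return sum_generic(v, n);
}

typedef long (*sum_fn)(const int *, int);

/* Picks the code path from the result of the CPU check. */
sum_fn select_sum(int has_sse2)
{
    return has_sse2 ? sum_sse2 : sum_generic;
}
```

A caller would evaluate `select_sum(cpu_has_sse2())` once at startup and call through the returned pointer. The point of the sketch is that the check itself and everything reachable when it fails must stay free of SSE2 instructions, which is what we expect /arch:IA32 to guarantee for the baseline path.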

However, testing reveals that on a processor without SSE2 support (e.g. Pentium III) the binary crashes with an "illegal instruction" exception! Interestingly, removing only /QaxSSE2 and leaving everything else as-is produces a binary that works perfectly fine on that processor.

Another interesting observation: using /arch:IA32 plus /QaxSSE2 together with /Ob0 (which disables inlining!) also produces a binary that works perfectly fine on a processor without SSE2 support.

At this point it would seem that either runtime CPU dispatching raises the CPU requirement of the "base" code path to SSE2 regardless of the /arch:IA32 option, or function inlining and runtime CPU dispatching don't go together. But we cannot find any mention of this in the Intel documentation. This is very important information, so we think it needs to be mentioned there!

Can anybody confirm the observation or clarify what's going on?

Thank you!

  • What you're describing *should* work just fine. Can you post sample code that reproduces the problem? – Cody Gray - on strike Nov 27 '16 at 12:31
  • Looks like a bug report to me, but one without any way to verify it or propose an alternative. Even with code, who still has a functional Pentium 3 laying around? Really rather best to tell Intel about it. – Hans Passant Nov 27 '16 at 12:42
  • Unfortunately I can't. It's a rather "big" project and cannot be made public. However, a small "hello world" program did **not** reproduce the problem. Guess the program needs a certain minimum "complexity" for _runtime CPU dispatching_ to actually have an effect. – Niklas Förster Nov 27 '16 at 12:44
  • "Even with code, who still has a functional Pentium 3 laying around?" - software developers have to, because some customers (more than you'd expect) definitely _do_ as well ;-) – Niklas Förster Nov 27 '16 at 12:46
  • So start with your big proprietary project, and start ripping code out until you get a minimal sample that reproduces the problem. It's tedious work, but that's how debugging goes. You should be able to get it down to something reasonably small, and certainly you can remove all sensitive information. I have several functional Pentium 3s laying around, as well as Pentium 4s and machines much older than that. Then again, I sort of collect these things! – Cody Gray - on strike Nov 27 '16 at 12:56
  • Cody Gray, you are right, of course. But I had hoped this "tedious work" could be avoided if someone could confirm that _runtime CPU dispatching_ is **not** working with non-SSE2 processors, or that it requires _function inlining_ to be turned off. It seems now that this isn't the case, so the "tedious work" will be required, I guess... – Niklas Förster Nov 27 '16 at 13:01
  • I really think the Intel compiler's dispatching mechanism is broken. At least it doesn't perform as documented. These issues only get worse when you try to be AMD compatible. – Royi Apr 16 '19 at 06:09

0 Answers