
I understand that all x86 (i386) binaries compiled today with the latest compilers (GCC, Intel CC, VC, etc.) will run on an Intel 386 machine (allowing for external dependencies like OS functions), which explains why there are videos on YouTube of people getting Windows 7 to install and run on a 20-year-old computer.

But I don't understand how ISA extensions such as MMX and SSE work. If a program's code contains an extension instruction that the processor doesn't support, then surely the program must crash (presumably via a processor exception into the OS). The only solution I can think of is that the program's binary checks whether an instruction is supported and, if not, executes lowest-common-denominator code that is guaranteed to be supported on every platform. But surely that can make binaries very large, and what about instructions that were extensions but are now practically standard, like floating-point operations? Does that mean that every i386 binary that does FP operations also includes software FP emulation?

If a program does instruction-support checks, where does it do them? If you've got a tight loop, you don't want it performing the check inside the loop. I don't believe compilers are smart enough yet to optimise for that in every situation, so the lack of visible compiler options is weird.

I can't find any settings or options in my VC project settings, for either the compiler or the linker, for controlling instruction emission. There are a few about optimization, but nothing that gives the control I expect.

And what about AMD's "3DNow!" extension? 3DNow! is meant to be an extension on top of MMX, with some unique instructions and aspects of its own, but I can't find any references to it in my VC compiler or linker settings. If I'm writing code, how can I get my compiler to use the 3DNow! instructions (rather than MMX)?

Also, the Wikipedia article on 3DNow! states that AMD removed a few 3DNow! instructions from their processors. If there exists code that assumes those instructions are present, will it no longer work?

Dai
  • Actually, some Linux distros seem to target i586 (Pentium) or i686 (Pentium Pro) now. There's a mention of that in e.g. https://www.phoronix.com/scan.php?page=news_item&px=Fedora-31-Kill-i686-Kernels – ecm Aug 03 '19 at 17:10
  • Does this answer your question? [New instruction sets in CPU](https://stackoverflow.com/questions/2530103/new-instruction-sets-in-cpu) – Johan Oct 13 '21 at 09:28

2 Answers


I understand that all x86 (i386) binaries compiled today with the latest compilers (GCC, Intel CC, VC, etc.) will run on an Intel 386 machine (allowing for external dependencies like OS functions), which explains why there are videos on YouTube of people getting Windows 7 to install and run on a 20-year-old computer.

Not really. The Intel 386 did not have cmpxchg8b, which is required for Windows Vista and Windows 7. And after a certain update, Windows 7 requires SSE2.

But I don't understand how ISA extensions such as MMX and SSE work. If a program's code contains an extension instruction that the processor doesn't support, then surely the program must crash (presumably via a processor exception into the OS). The only solution I can think of is that the program's binary checks whether an instruction is supported and, if not, executes lowest-common-denominator code that is guaranteed to be supported on every platform. But surely that can make binaries very large, and what about instructions that were extensions but are now practically standard, like floating-point operations?

Runtime feature dispatch is a thing, and it is used in some programs and libraries, but often the code is compiled to require certain instructions.

Even where runtime feature dispatch is used, it usually only detects relatively new extensions; older extensions are treated as mandatory.
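For illustration, here is a minimal sketch of that dispatch pattern in C++, using MSVC's `__cpuid` intrinsic. The `sum_scalar`/`sum_sse2` functions are hypothetical examples of mine, not from any real library:

```cpp
#include <intrin.h>     // __cpuid (MSVC)
#include <emmintrin.h>  // SSE2 intrinsics

// Baseline path: plain scalar code, safe on any x86 CPU.
static int sum_scalar(const int* p, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i) s += p[i];
    return s;
}

// SSE2 path: adds four ints per iteration with PADDD.
static int sum_sse2(const int* p, int n) {
    __m128i acc = _mm_setzero_si128();
    int i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_epi32(acc, _mm_loadu_si128((const __m128i*)(p + i)));
    int lanes[4];
    _mm_storeu_si128((__m128i*)lanes, acc);
    int s = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; ++i) s += p[i];  // handle the leftover elements
    return s;
}

static bool cpu_has_sse2() {
    int regs[4];       // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);  // leaf 1: feature flags
    return (regs[3] & (1 << 26)) != 0;  // CPUID.1:EDX bit 26 = SSE2
}

// Dispatch is decided once, at startup, so no check runs inside hot loops.
static int (*sum_impl)(const int*, int) = cpu_has_sse2() ? sum_sse2 : sum_scalar;

int sum(const int* p, int n) { return sum_impl(p, n); }
```

Because the function pointer is initialized once, the check never executes inside the tight loop, which is exactly how the concern in the question is avoided in practice.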

/arch:SSE2 is the default in Visual Studio 2012 and later, so programs normally will not use legacy x87 floating-point instructions.

An example of runtime dispatch is vector_algorithms.cpp from the MSVC STL, where AVX2 and SSE2 availability is checked at runtime. In x86 mode the functions fall back to plain C++ code compiled not to emit vector instructions; in x64 mode the use of SSE2 is unconditional, as SSE2 is the baseline for x64.

I can't find any settings or options in my VC project settings, for either the compiler or the linker, for controlling instruction emission. There are a few about optimization, but nothing that gives the control I expect.

The /arch switch. From the IDE, it's "Enable Enhanced Instruction Set" in the C/C++ compiler options.
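You can also observe what a given /arch setting implies at compile time through MSVC's documented predefined macros. A small sketch (the macro values follow Microsoft's documentation, but verify them against your toolset):

```cpp
// Compile-time view of the /arch setting via MSVC predefined macros.
// _M_IX86_FP (x86 builds only): 0 = /arch:IA32 (x87 only), 1 = /arch:SSE,
//                               2 = /arch:SSE2 or higher.
// __AVX__ / __AVX2__ are defined for /arch:AVX and /arch:AVX2 respectively.
#include <cstdio>

int main() {
#if defined(__AVX2__)
    std::puts("compiled with /arch:AVX2 or higher");
#elif defined(__AVX__)
    std::puts("compiled with /arch:AVX");
#elif defined(_M_IX86_FP) && _M_IX86_FP == 2
    std::puts("compiled with /arch:SSE2 (the default for x86 since VS2012)");
#elif defined(_M_IX86_FP) && _M_IX86_FP == 1
    std::puts("compiled with /arch:SSE");
#else
    std::puts("x87 floating point (/arch:IA32), or the x64 SSE2 baseline");
#endif
}
```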

And what about AMD's "3DNow!" extension? 3DNow! is meant to be an extension on top of MMX, with some unique instructions and aspects of its own, but I can't find any references to it in my VC compiler or linker settings. If I'm writing code, how can I get my compiler to use the 3DNow! instructions (rather than MMX)?

Visual Studio cannot use them during auto-vectorization (it never auto-vectorized with them). As for the intrinsic form, surprisingly, you can still use them in x86 mode (but not in x64 mode).
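As a sketch of what that intrinsic form looks like, assuming a 32-bit MSVC build whose <intrin.h> still declares the 3DNow! intrinsics (GCC puts them in <mm3dnow.h> behind -m3dnow), which is worth verifying on your toolset:

```cpp
#include <intrin.h>  // MSVC, x86 (32-bit) builds only

// Adds two pairs of packed single-precision floats with the 3DNow!
// PFADD instruction, then leaves MMX state with FEMMS. Only call this
// after confirming 3DNow! support via CPUID (see the check further down).
void add_pairs(const __m64* a, const __m64* b, __m64* out) {
    *out = _m_pfadd(*a, *b);  // 3DNow! packed float add
    _m_femms();               // FEMMS: clear MMX/3DNow! state before x87/FP code runs
}
```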

Also, the Wikipedia article on 3DNow! states that AMD removed a few 3DNow! instructions from their processors. If there exists code that assumes those instructions are present, will it no longer work?

To clarify, the instructions were removed in the sense that new processors don't have them; they were not removed from old processors by a microcode update.

For instructions that were removed entirely, attempts to use them may raise a #UD exception, or their encoding may be repurposed for other instructions.
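So such code should verify support before executing 3DNow! instructions. A minimal check via CPUID extended leaf 0x80000001, where EDX bit 31 is the documented 3DNow! flag (bit 30 is extended 3DNow!), sketched with MSVC's `__cpuid`:

```cpp
#include <intrin.h>

// Returns true if the CPU reports 3DNow! support
// (AMD extended leaf 0x80000001, EDX bit 31).
static bool cpu_has_3dnow() {
    int regs[4];                // EAX, EBX, ECX, EDX
    __cpuid(regs, 0x80000000);  // EAX returns the highest extended leaf
    if (static_cast<unsigned>(regs[0]) < 0x80000001u)
        return false;           // extended leaves not available at all
    __cpuid(regs, 0x80000001);
    return (static_cast<unsigned>(regs[3]) & (1u << 31)) != 0;  // EDX bit 31
}
```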

Alex Guteniev

From my understanding, these extensions were not used that often when they were new. Nowadays they are either a given (on AMD64) or assumed (on IA-32). Back in the day, you would only benefit from these features when you really needed them, for example when developing a math library. In that case you were doing very specific work and could spend the time deciding how to handle backwards compatibility. You basically had two options: either provide backwards compatibility through a software implementation, or don't, and drop support for older platforms (or for specific functionality on those platforms). According to Wikipedia there is a third option:

In other cases, an operating system may mimic the new features for older processors, though often with reduced performance.

This seems to mean that the OS somehow catches the unsupported instructions and replaces them with a software implementation. The reference link from Wikipedia was dead and not archived, but I found another clue:

An IA-32 CPU will generate an exception, namely the #UD (undefined instruction) exception, when it encounters an unknown instruction while running a binary program. This exception boils down to a SIGILL signal in UNIX-like operating systems. The idea of emulating an instruction is to hook this exception, do the right thing in the exception handler, and return to the main program. So it is essential that the CPU generates an exception on every instruction we want to emulate.
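To make the quoted idea concrete, here is a minimal sketch of the plumbing on Linux/x86-64 with glibc. The fixed 4-byte instruction length is purely an illustrative assumption; a real emulator would decode the faulting instruction and reproduce its effect:

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1  // for REG_RIP in <ucontext.h> (g++ usually defines this already)
#endif
#include <csignal>
#include <ucontext.h>

// The handler runs on #UD, which the kernel delivers as SIGILL.
static void on_sigill(int, siginfo_t* info, void* ctx) {
    (void)info;  // the faulting instruction's address is in info->si_addr
    ucontext_t* uc = static_cast<ucontext_t*>(ctx);
    // ... emulate the unsupported instruction here ...
    uc->uc_mcontext.gregs[REG_RIP] += 4;  // skip it; 4-byte length assumed for illustration
}

int main() {
    struct sigaction sa = {};
    sa.sa_sigaction = on_sigill;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGILL, &sa, nullptr);
    // the program continues; any unsupported instruction now enters on_sigill first
}
```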

However, judging by the comments on this answer, these OS implementations are not a thing anymore.

Johan
  • Current mainstream OSes don't emulate unsupported instructions in their #UD (undefined instruction) exception handlers. That used to be a thing for software floating-point on CPUs without hardware x87 support, but it's too slow to be worth it for instructions that only get used for performance reasons in the first place. OSes just deliver e.g. a SIGILL. (Plus, there's now a mechanism, CPUID, for software to check what extensions are available and set function pointers accordingly or whatever. e.g. that's how glibc picks AVX2 or AVX-512VL strcmp / memcmp / etc. on CPUs that support it.) – Peter Cordes Oct 13 '21 at 09:24
    You're mostly right about not using extensions except when specifically worth it for performance. A lot of code is just compiled for baseline x86-64, which includes SSE2 which was an extension for 32-bit mode, but is baseline for 64-bit. So SIMD is available without CPUID checks on 64-bit code. (And these days it's normal to build 32-bit programs to assume an SSE2 baseline to use without checking, and that still counts as an extension to i386, even though you can also look at it as just the baseline moving forwards.) – Peter Cordes Oct 13 '21 at 09:27
  • If you want software emulation of SIMD extensions your CPU doesn't have, you run your program under SDE or any similar emulation layer. SDE does binary translation, so supported instructions are fast. [How to test AVX-512 instructions w/o supported hardware?](https://stackoverflow.com/q/51805127) – Peter Cordes Oct 13 '21 at 09:30
    Apparently Windows uses/used emulation in an exception handler to fix alignment failures on certain CPUs -- see `SetErrorMode(SEM_NOALIGNMENTFAULTEXCEPT)`, but not to emulate missing instructions. – Alex Guteniev Oct 13 '21 at 11:30