You're basing your Knight's Landing worries on this quote about Knight's Corner:
The Knight's Corner processor is based on an x86-64 foundation, yes, but it in fact has its own floating-point instruction set—no x87, no AVX, no SSE, no MMX... Oh, and then you can throw all that away when Knight's Landing (KNL) comes out.
By "throw all that away", they mean all the worries and incompatibilities. KNL is based on Silvermont and is fully x86-64 compatible (including x87, SSE, and SSE2 for both standard ways of doing FP math). It also supports AVX-512F, AVX-512ER, and a few other AVX-512 extensions, along with AVX and AVX2 and SSE up to SSE4.2. A lot like a Skylake-server CPU, except a different set of AVX-512 extensions.
The point of this is exactly to solve the problem you're worried about: so any legacy binary can run on KNL. To get good performance out of it, you want to be running code vectorized with AVX-512 vectors in the loops that do the heavy lifting, but all the surrounding code and other programs in the rest of the Linux distro or whatever can be running ordinary bog-standard code that uses whatever x87 and/or SSE.
Knight's Corner (first-gen commercial Xeon Phi) has its own variant / precursor of AVX-512 in a core based on P5-Pentium, and no other FP hardware.
Knight's Landing (second-gen commercial Xeon Phi) is based on Silvermont, with AVX-512, and is the first that can act as a "host" processor (bootable) instead of just a coprocessor.
This "host" mode is another reason for including enough hardware to decode and execute x87 and SSE: if you're running a whole system on KNL, you're much more likely to want to execute some legacy binaries for non-perf-sensitive tasks, not only binaries compiled specifically for it.
Its x87 performance is not great, though: like one scalar fmul
per 2 clocks (https://agner.org/optimize). vs. 2-per-clock SSE mulsd
(0.5c recip throughput). Same 0.5c throughput for other SSE/AVX math, including AVX-512 vfma132ps zmm
to do 16x single-precision Fused-Multiply-Add operations in one instruction.
So hopefully Go's compiler doesn't use x87 much. The normal way to do scalar math in 64-bit mode (that C compilers and their math libraries use) is SSE, in XMM registers. x86-64 C compilers only use x87 for types like long double
.