Are there SIMD (SSE / AVX) instructions in the x86-compatible Intel Xeon Phi (MIC) accelerators?
-
Compatible with which version of Xeon Phi? Earlier ones had issues; future ones are claimed to support even AVX-512. – Leeor Mar 26 '14 at 18:57
-
I don't believe the current Xeon Phi is able to run SSE/AVX(2). Even its AVX-512 is a little bit different from the one coming in Skylake and future Xeon Phi chips. – Mysticial Mar 26 '14 at 20:36
-
@Mysticial I.e. the current Xeon Phi is able to run only AVX-512, not SSE/AVX2, is that right? – Alex Mar 26 '14 at 21:23
-
@Alex I believe that's correct. – Mysticial Mar 26 '14 at 21:25
1 Answer
Yes, the current generation of Intel Xeon Phi coprocessors (codename "Knights Corner", abbreviated KNC) supports a 512-bit SIMD instruction set called "Intel® Initial Many Core Instructions" (abbreviated Intel® IMCI).
Intel IMCI is not compatible with, and is not equivalent to, the SSE, AVX, AVX2 or AVX-512 ISAs. However, it has been officially announced that the next planned generation of Xeon Phi (codename "Knights Landing", abbreviated KNL) will support the AVX-512 ISA.
Both Intel IMCI (supported by KNC) and AVX-512 (to be supported by KNL) are 512-bit SIMD instruction sets. Both support FMA, and a single register packs 8 double-precision or 16 single-precision floating-point numbers, or 16 32-bit integers (i.e. twice as many as AVX or AVX2).
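The element counts above are just the register width divided by the element width. A minimal sketch of that arithmetic follows; `simd_lanes` is a hypothetical helper name used only for illustration, not part of any Intel API.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements of a given width that fit in one SIMD register.
   Both arguments are in bits. simd_lanes is a hypothetical helper name. */
size_t simd_lanes(size_t register_bits, size_t element_bits)
{
    /* The register width must be a whole multiple of the element width. */
    assert(element_bits > 0 && register_bits % element_bits == 0);
    return register_bits / element_bits;
}
```

For a 512-bit register (IMCI or AVX-512) this gives 8 lanes for 64-bit doubles and 16 lanes for 32-bit floats or integers; for a 256-bit AVX/AVX2 register it gives 4 and 8 respectively.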
KNC cannot run SSE or AVX binaries, but this often doesn't matter: to produce a binary that runs on KNC you have to recompile your code with the Intel C/C++/Fortran Compiler anyway. That compiler is known to generate relatively efficient vector code automatically or semi-automatically (for SSE, AVX, IMCI, etc.), and it also lets you use IMCI intrinsics if needed.
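As a rough sketch of what "recompile and let the compiler vectorize" can look like, here is a plain C loop with no intrinsics or inline assembly. The function name `saxpy` and its signature are illustrative assumptions; whether a given compiler actually vectorizes the loop depends on the compiler, its options and the target.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = a * x[i] + y[i]: one multiply-add per element, which a compiler
   may map onto FMA where the target ISA provides it.
   `restrict` promises that x and y do not overlap, which removes a common
   obstacle to auto-vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Because the source names no particular instruction set, the same function can be rebuilt for SSE, AVX, IMCI or AVX-512 targets without modification.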
Side note: for Knights Landing (with AVX-512 support) the Intel toolchain will no longer be the sole option, but it will likely continue to offer many advantages, including a solid explicit and automatic vectorizer and a good level of integration with Intel profiling tools (for example, the AVX-512 analysis in Intel (Vectorization) Advisor).
The AVX-512 ISA is compatible with SSE, AVX and AVX2. Therefore applications compiled for AVX on Xeon will run on KNL, while applications compiled for AVX-512 on Xeon Phi KNL will normally run on the future generations of Xeon that add AVX-512 support.
The differences between the AVX, IMCI and future AVX-512 instruction sets can easily be explored using the following online guide: http://software.intel.com/sites/landingpage/IntrinsicsGuide/

-
Thanks! "because in order to generate your application binary to be able to run on KNC - you need to recompile your code using Intel C/C++/Fortran Compiler, which is known to automatically or semi-automatically generate relatively efficient vector codes (for SSE, AVX, IMCI, etc)" - yes, but if I don't use built-in or inline assembly SSE/AVX[1/2]-instructions, and if I use only automatic vectorization. – Alex Mar 28 '14 at 20:55
-
Yes, if you only use inline assembly, you have to update your assembly implementation every time a wider or newer ISA appears, and your code is not always portable. This is one reason why so many higher-level, more portable SIMD abstractions are available for x86 CPUs as well as for all Xeon Phis: 1) intrinsics, 2) vector classes and various SIMD libraries, 3) "explicit vectorization" constructs in the OpenMP 4.0 and Cilk Plus standards, 4) compiler auto-vectorization. – zam Mar 29 '14 at 17:52
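To illustrate option 3 from the comment above, here is a minimal sketch of OpenMP 4.0 explicit vectorization. The function name `dot` is an assumption for illustration. The pragma takes effect only when the compiler's OpenMP (or OpenMP SIMD) support is enabled, for example `-fopenmp` or `-fopenmp-simd` with GCC and Clang; otherwise it is ignored and the loop stays correct scalar code.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product with an explicit SIMD reduction.
   `#pragma omp simd` asks the compiler to vectorize the loop;
   `reduction(+:sum)` tells it that the per-lane partial sums may be
   combined at the end, which plain auto-vectorization may refuse to do
   for floating-point data without relaxed math options. */
double dot(size_t n, const double *x, const double *y)
{
    assert(x != NULL && y != NULL);
    double sum = 0.0;
    #pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

The source does not mention SSE, AVX, IMCI or AVX-512 anywhere, so the vector width is chosen by the compiler for whatever target it is building for.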
-
Yes, and maybe Intel TBB uses both SIMD and multithreading, doesn't it? I like that SIMD is present in OpenMP 4 because it is both cross-platform and cross-hardware. – Alex Mar 29 '14 at 19:07
-
I don't know of any plans for the Intel TBB library to introduce SIMD programming support, and I'm not sure about its internal implementation; but I do know that you can combine TBB threading with any other SIMD framework or pragmas, such as the OpenMP 4 pragma simd, Cilk Plus, or compiler-specific pragmas. BTW, I said "SIMD framework", i.e. pragma simd, not pragma parallel for (combining different parallel-fors also works well pretty often, but requires you to additionally check the compatibility claims for the specific pair of runtimes). – zam Apr 01 '14 at 17:55