
So, no matter how much I read about SIMD instructions, there is something basic I still can't understand properly and would, therefore, love to have some (conceptual) explanation or suggestions about.

I understand that the many SIMD implementations vary from one CPU architecture to another (MMX, SSE, SSE2, etc.). However, since the mid-2000s there seems to have been greater convergence between SIMD instruction sets across Intel and AMD (and Apple has started using Intel), so I don't get the following.

Simply put, if an application has specific SIMD code (e.g. for a vectorized math library), would it run equally on both Intel and AMD CPUs (and therefore on Windows and Linux computers), and also on iOS, without any modification?

Or would specific code have to be implemented for each CPU architecture/operating system targeted by the application, so that a different build of the application is shipped for each type of user?

user123443563
  • For x86 and ARM, see http://www.yeppp.info/; for x86 only, see http://www.agner.org/optimize/#vectorclass – Z boson Jan 20 '16 at 07:59
  • @Zboson thanks! I already knew yeppp, but it's lacking some basic functionality and I lost it from my radar. Now, I did not know the other one and I really liked what I've read in their docs so far. I wish there was something similar for ARM, then it would be just the case of maintaining two libraries - at least for the basic operations. – user123443563 Jan 20 '16 at 08:19
  • Agner's vector class library (VCL) is awesome. Look at the `dispatch_example.cpp` file. Read through some of the source code (it's clear except for some template meta-programming). Read through the manual. – Z boson Jan 20 '16 at 08:30
  • @Zboson I have to thank you again. I've been playing with VCL for the past hour and it's magic. It is so easy to use and already improves performance quite a lot. I don't know if I am the only one late to the bandwagon, but I'm surprised not to have found references to that library before (btw Agner's blog is also great). Even if I still have to implement some details manually for multiple instruction sets, your suggestion has already made my programming life easier and happier. – user123443563 Jan 20 '16 at 08:51
  • I have learned a lot from reading Agner's code. I mostly learned SIMD from the VCL. You can answer most of the x86 SIMD questions on SO by reading the VCL source code. I have only found one case where the VCL did not optimize ideally and Agner already fixed that. – Z boson Jan 20 '16 at 08:55

1 Answer


For Intel/AMD there can be some convergence, depending on how hard you want to push the performance envelope. iOS devices are ARM-based though, and use Neon SIMD rather than Intel/AMD's SSE/AVX, so there is no binary compatibility and only minimal compatibility at the source level (e.g. via macros or template libraries). See this question for some cross-platform solutions.
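
To illustrate what "compatibility at the source level via macros" can look like, here is a minimal sketch in C. The names `vec4f`, `VEC4F_LOAD`, `VEC4F_STORE`, `VEC4F_ADD`, `HAVE_VEC4F` and `add_floats` are made up for this example and do not come from any library; only the intrinsics they wrap (`_mm_loadu_ps`, `_mm_add_ps`, `vld1q_f32`, `vaddq_f32`, ...) are real. It assumes the compiler defines the usual feature macros (`__SSE__`, `__ARM_NEON`), and it falls back to plain C when neither is available.

```c
#include <assert.h>
#include <stddef.h>

/* Pick a 4-float vector type and the matching intrinsics at compile time.
   All VEC4F_* names are hypothetical wrappers made up for this sketch. */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
  #include <xmmintrin.h>                 /* SSE (x86: Intel and AMD) */
  typedef __m128 vec4f;
  #define VEC4F_LOAD(p)     _mm_loadu_ps(p)
  #define VEC4F_STORE(p, v) _mm_storeu_ps((p), (v))
  #define VEC4F_ADD(a, b)   _mm_add_ps((a), (b))
  #define HAVE_VEC4F 1
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
  #include <arm_neon.h>                  /* NEON (ARM, e.g. iOS devices) */
  typedef float32x4_t vec4f;
  #define VEC4F_LOAD(p)     vld1q_f32(p)
  #define VEC4F_STORE(p, v) vst1q_f32((p), (v))
  #define VEC4F_ADD(a, b)   vaddq_f32((a), (b))
  #define HAVE_VEC4F 1
#else
  #define HAVE_VEC4F 0                   /* no known SIMD: scalar code only */
#endif

/* dst[i] = a[i] + b[i] for i in [0, n). Same source, different binaries. */
void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    size_t i = 0;
#if HAVE_VEC4F
    /* Process four floats per iteration using unaligned loads/stores. */
    for (; i + 4 <= n; i += 4) {
        vec4f va = VEC4F_LOAD(a + i);
        vec4f vb = VEC4F_LOAD(b + i);
        VEC4F_STORE(dst + i, VEC4F_ADD(va, vb));
    }
#endif
    /* Scalar tail (or the whole array when no SIMD is available). */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

The source is shared, but each target still needs its own build: the x86 binary contains SSE instructions and the ARM binary contains NEON instructions. Template or macro libraries do essentially this wrapping for a much larger set of operations.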

Paul R
  • Thanks for your reply. I see, so considering that SSE is widely implemented in the x86 world, at least an SSE implementation plus an ARM-based one would be necessary for cross-platform compatibility if no "translating" libraries such as libsimdpp are used? – user123443563 Jan 19 '16 at 23:46
  • Yes, that's right - for x86 you may also want different optimised implementations for different generations of CPU family, e.g. Intel's performance libraries (IPP, MKL) do this. – Paul R Jan 20 '16 at 06:15
  • Makes sense, since while SSE is present in pretty much all x86, newer methods vary - so the idea would be to make SSE the baseline and implement others for the cases when users can benefit from them. By the way, if I understood it correctly, Intel's MKL assures compatibility across several SIMD methods (SSE, SSE2, etc), as well as AMD's ACML or SSEPlus. ARM, on the other hand, seems to lack official equivalents that ease the cross-architecture compatibility. Anyway, many thanks for your support. It was really helpful. – user123443563 Jan 20 '16 at 06:50
  • There are other cross-platform libraries, e.g. if you're doing OS X and/or iOS development then Apple's "Accelerate" framework provides many SIMD-optimised routines for DSP, image processing, linear algebra, etc, which work on both OS X Intel platforms and iOS ARM-based devices. Also many compilers can do a reasonable, if somewhat limited job of SIMD-optimising normal scalar code, e.g. gcc, clang, ICC, even MSVC these days, can all do this with a varying degree of success, potentially sparing the programmer from the task of writing SIMD code via intrinsics or assembler. – Paul R Jan 20 '16 at 06:54
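
The idea from the comments above (a baseline implementation plus optimised variants chosen per CPU generation) can be sketched roughly as follows. This is only an illustration, not how IPP or MKL actually do it. It assumes GCC or clang on x86-64, where `__builtin_cpu_supports` and `__attribute__((target("avx")))` are available; on any other compiler or architecture it compiles to the scalar path only. The names `scale`, `scale_scalar`, `scale_avx`, `pick_scale` and `HAVE_X86_DISPATCH` are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: runtime dispatch is only attempted with GCC/clang on x86-64. */
#if defined(__GNUC__) && defined(__x86_64__)
  #include <immintrin.h>
  #define HAVE_X86_DISPATCH 1
#else
  #define HAVE_X86_DISPATCH 0
#endif

/* Baseline: plain C, runs everywhere. */
static void scale_scalar(float *x, float k, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        x[i] *= k;
}

#if HAVE_X86_DISPATCH
/* AVX variant: compiled with AVX enabled for this function only, so the
   rest of the program still runs on CPUs without AVX. */
__attribute__((target("avx")))
static void scale_avx(float *x, float k, size_t n)
{
    const __m256 vk = _mm256_set1_ps(k);
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps(x + i, _mm256_mul_ps(_mm256_loadu_ps(x + i), vk));
    for (; i < n; ++i)          /* scalar tail */
        x[i] *= k;
}
#endif

typedef void (*scale_fn)(float *, float, size_t);

/* Ask the CPU at run time which variant it can execute. */
static scale_fn pick_scale(void)
{
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        return scale_avx;
#endif
    return scale_scalar;
}

/* Public entry point: multiplies x[0..n) by k in place.
   The lazy initialisation below is not synchronised across threads. */
void scale(float *x, float k, size_t n)
{
    static scale_fn impl = NULL;
    assert(x != NULL);
    if (impl == NULL)
        impl = pick_scale();
    impl(x, k, n);
}
```

A single x86 binary built this way can use AVX where it exists and fall back to the baseline elsewhere. An ARM build still needs its own compilation, with a NEON variant if you want SIMD there.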