How to use SSE instructions in the x64 architecture in C++?

Question

Currently I am using Visual C++ inline assembly to embed some core functions that use SSE; however, I just realised that inline assembly is not supported in x64 mode.

How can I use SSE when I build my software in x64 architecture?

[intrinsics](https://msdn.microsoft.com/en-us/library/hh977022.aspx) — Z boson, Apr 15 '15 at 13:49
Also, [this page from Intel](https://software.intel.com/sites/landingpage/IntrinsicsGuide/) looks nice (from [this answer](http://stackoverflow.com/questions/7156908/sse-intrinsic-functions-reference)). — Daniel Dinu, Apr 15 '15 at 13:52
Take a look at [DirectXMath](http://blogs.msdn.com/b/chuckw/archive/2012/03/27/introducing-directxmath.aspx) which comes with VS 2012, VS 2013, and VS 2015. If nothing else, you can look at the all header implementation for ideas on implementing common operations with SSE intrinsics that are portable for x86 and x64. — Chuck Walbourn, May 19 '15 at 06:55
Also note that Windows x64 already uses SSE for all floating-point math instead of legacy x87. It uses 'scalar' SSE (``addss`` et al). x64 requires support for SSE and SSE2. — Chuck Walbourn, May 19 '15 at 06:59

score 9 · Accepted Answer · edited May 23 '17 at 10:29

The modern method to use assembly instructions in C/C++ is to use intrinsics. Intrinsics have several advantages over inline assembly such as:

You don't have to worry about 32-bit and 64-bit mode.
You don't need to worry about registers and register spilling.
No need to worry about AT&T versus Intel syntax.
No need to worry about calling conventions.
The compiler can optimize intrinsics further which it won't do with inline assembly.
Intrinsics are compatible (for most intrinsics) with GCC, MSVC, ICC, and Clang.
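
To make the points above concrete, here is a minimal sketch of an SSE routine written with intrinsics. The function name `add_floats` is hypothetical and only for illustration; the intrinsics (`_mm_loadu_ps`, `_mm_add_ps`, `_mm_storeu_ps`) come from `<xmmintrin.h>`. The same source should build unchanged for x86 and x64.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_add_ps, ...

// Hypothetical helper: element-wise out[i] = a[i] + b[i] for n floats.
void add_floats(const float* a, const float* b, float* out, std::size_t n) {
    assert(a != nullptr && b != nullptr && out != nullptr);
    std::size_t i = 0;
    // Process four floats per iteration. Unaligned loads/stores are used,
    // so the pointers do not need 16-byte alignment.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    // Scalar tail for the remaining 0 to 3 elements.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Note that there is no register allocation, no syntax choice and no calling-convention handling in the code; the compiler takes care of all of that.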

I also like intrinsics because it's easy to emulate hardware with them, for example to prepare for AVX512.

You can find the list of Intrinsics MSVC supports here. Intel has better information on intrinsics as well which agrees mostly with MSVC's intrinsics.

But sometimes you still need or want inline assembly. In my opinion it's really stupid that Microsoft does not allow inline assembly in 64-bit mode. This means they have to define intrinsics for several things that other compilers can still do with inline assembly. One example is CPUID. Visual Studio has an intrinsic for CPUID but GCC still uses inline assembly. Another example is adc. For a long time MSVC had no intrinsic for adc but now it appears they do.
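
As a rough sketch of the CPUID difference, the wrapper below uses MSVC's `__cpuid` intrinsic from `<intrin.h>` and, on GCC/Clang, the `__get_cpuid` helper from `<cpuid.h>` (which is itself built on inline assembly). The names `cpuid_leaf` and `has_sse2` are hypothetical. The SSE2 feature flag is bit 26 of EDX for leaf 1.

```cpp
#include <cassert>
#if defined(_MSC_VER)
#include <intrin.h>  // __cpuid intrinsic
#else
#include <cpuid.h>   // __get_cpuid (GCC/Clang)
#endif

// Hypothetical wrapper: fills regs with EAX, EBX, ECX, EDX for the given leaf.
// Returns false if the leaf is not supported (GCC/Clang path only).
bool cpuid_leaf(unsigned leaf, unsigned regs[4]) {
    assert(regs != nullptr);
#if defined(_MSC_VER)
    int r[4];
    __cpuid(r, static_cast<int>(leaf));
    for (int i = 0; i < 4; ++i) {
        regs[i] = static_cast<unsigned>(r[i]);
    }
    return true;
#else
    return __get_cpuid(leaf, &regs[0], &regs[1], &regs[2], &regs[3]) != 0;
#endif
}

// SSE2 support is reported in CPUID leaf 1, EDX bit 26.
bool has_sse2() {
    unsigned regs[4] = {};
    return cpuid_leaf(1, regs) && (regs[3] & (1u << 26)) != 0;
}
```

On any x64 processor `has_sse2()` should return true, since x64 requires SSE2.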

Additionally, because they have to create intrinsics for everything, it causes confusion. They have to create an intrinsic for mulx, but Intel's documentation for this is wrong. They also have to create intrinsics for adcx and adox, but their documentation disagrees with Intel's, and the generated assembly shows that no intrinsic produces adox. So once again the programmer is left waiting for an intrinsic for adox. If they had just allowed inline assembly then there would be no problem.

But back to SSE. With few exceptions, e.g. _mm_set_epi64x in 32-bit mode on MSVC (I don't know if that's been fixed), the SSE/AVX/AVX2 intrinsics work as expected with MSVC, GCC, ICC, and Clang.
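
If you do hit a compiler where `_mm_set_epi64x` is unavailable, one possible workaround (a sketch, not something MSVC documents) is to build the same value from four 32-bit lanes with `_mm_set_epi32`. The names `set_epi64x_compat` and `to_lanes` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics: __m128i, _mm_set_epi32, ...

// Hypothetical fallback with the same argument order as _mm_set_epi64x:
// 'hi' goes into the upper 64 bits, 'lo' into the lower 64 bits.
inline __m128i set_epi64x_compat(std::int64_t hi, std::int64_t lo) {
    const std::uint64_t h = static_cast<std::uint64_t>(hi);
    const std::uint64_t l = static_cast<std::uint64_t>(lo);
    // _mm_set_epi32 takes its lanes from most significant to least significant.
    return _mm_set_epi32(
        static_cast<int>(static_cast<std::uint32_t>(h >> 32)),
        static_cast<int>(static_cast<std::uint32_t>(h)),
        static_cast<int>(static_cast<std::uint32_t>(l >> 32)),
        static_cast<int>(static_cast<std::uint32_t>(l)));
}

// Hypothetical helper: copies the two 64-bit lanes out, low lane first.
inline void to_lanes(__m128i v, std::int64_t out[2]) {
    assert(out != nullptr);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), v);
}
```

Here `to_lanes` writes the low 64-bit lane to `out[0]` and the high lane to `out[1]`, which makes it easy to compare the result against `_mm_set_epi64x` where that intrinsic is available.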

The key thing to remember is that writing properly formed x64 assembly is a real pain with all the required rules for stack unwinding. Intrinsics are a lot easier, and portable between x86 and x64 if you avoid use of ``__m64``. FWIW, you can't use inline assembly in Visual C++ ARM either for the same reason. (i.e. table-based exception handling and mandatory stack unwinding) — Chuck Walbourn, May 19 '15 at 06:54
