I want to initialize a variable of type __m128
... where x is of type uint64_t
The intrinsic which takes the uint64_t is _mm_set_epi64x (as opposed to _mm_set_epi64, which takes a __m64).
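As a minimal sketch of that call, assuming the goal is to place x in the low 64 bits and zero the high 64 bits (the question does not spell this out), something like the following should work. The helper name load_u64_low is hypothetical and only there for illustration:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics, including _mm_set_epi64x
#include <stdint.h>

// Hypothetical helper: x ends up in the low 64 bits, the high 64 bits are zero.
inline __m128i load_u64_low(const uint64_t x)
{
    // _mm_set_epi64x takes its arguments as (high, low) signed 64-bit integers,
    // so the unsigned value is cast before being passed in.
    return _mm_set_epi64x(0, static_cast<long long>(x));
}
```

Note the argument order: the first argument is the high half, the second is the low half.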
I recently ran into the issue on Solaris. Sun Studio 12.3 and below lack _mm_set_epi64x. They also lack the work-arounds, like _mm_cvtsi64_si128 and _m_from_int64.
Here's the hack I used, if interested. The other option was to disable SSE2, which was not too appealing (and it was 3x slower in benchmarks):
// Sun Studio 12.3 and earlier lack SSE2's _mm_set_epi64 and _mm_set_epi64x.
#if defined(__SUNPRO_CC) && (__SUNPRO_CC < 0x5130)
inline __m128i _mm_set_epi64x(const uint64_t a, const uint64_t b)
{
    union INT_128_64 {
        __m128i v128;
        uint64_t v64[2];
    };

    INT_128_64 v;
    v.v64[0] = b;  // low 64 bits
    v.v64[1] = a;  // high 64 bits
    return v.v128;
}
#endif
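I believe C++11 could do additional things to help the compiler and performance, like initializing a constant union directly. Brace initialization of a union initializes its first member, so this requires v64 to be declared before v128 in INT_128_64, and the values go in {b, a} order to match the assignments above:
const INT_128_64 v = {b, a};
return v.v128;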
There's a big caveat: I believe this is undefined behavior, because the write goes through the v64 member of the union and the read then goes through the v128 member. Testing under SunCC shows the compiler is doing the expected (but technically incorrect) thing.
I believe you can sidestep the undefined behavior using a memcpy, but that could crush performance. Also see Peter Cordes' answer and discussion at How to swap two __m128i variables in C++03 given its an opaque type and an array?.
The following may also be a good choice to avoid the undefined behavior from using the inactive union member. But I'm not sure about the punning.
INT_128_64 v;
v.v64[0] = b; v.v64[1] = a;
return *(reinterpret_cast<__m128i*>(v.v64));
EDIT (three months later): Solaris and SunCC did not like the punning. It produced bad code for us, and we had to memcpy the value into __m128i. Unix, Linux, Windows, GCC, Clang, ICC, MSC were all OK. Only SunCC gave us trouble.
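For reference, the memcpy route might look roughly like the sketch below. This is not the exact code we shipped. The function name set_epi64x_memcpy is made up for illustration, and it assumes the same (high, low) argument order as _mm_set_epi64x:

```cpp
#include <emmintrin.h>  // SSE2: __m128i
#include <stdint.h>
#include <string.h>     // memcpy

// Hypothetical helper name; not part of any intrinsic header.
// Mirrors _mm_set_epi64x(a, b): b lands in the low 64 bits, a in the high 64 bits.
inline __m128i set_epi64x_memcpy(const uint64_t a, const uint64_t b)
{
    const uint64_t v64[2] = { b, a };

    // Copy the bytes instead of reading an inactive union member or
    // dereferencing a cast pointer.
    __m128i v128;
    memcpy(&v128, v64, sizeof(v128));
    return v128;
}
```

Whether the copy costs anything depends on the compiler. Many optimizers turn a fixed-size 16-byte memcpy into a plain register move, but I have not measured that under SunCC.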