When comparing the assembly generated by MSVC, Clang and GCC for the code below, the MSVC output appears to be far worse than the Clang output.
Question
Is there a flag that is necessary in GCC and MSVC to produce equivalent assembly, or is Clang just better in this specific case? I have tried various MSVC flags (different /O flags), but no substantial change occurs.
Or is there a variation of my code which allows the compilers to achieve the better optimisation? I've tried varying the code without losing the fundamental structure, also with no change.
Code
The code I'm compiling is only 26 lines so here it is:
#include <cstdint>
#include <type_traits>
template <typename A, typename B>
struct BitCast
{
    static_assert(std::is_pod<A>(), "BitCast<A, B> : A must be plain old data type.");
    static_assert(std::is_pod<B>(), "BitCast<A, B> : B must be plain old data type.");
    static_assert(sizeof(A) == sizeof(B), "BitCast<A, B> : A and B must be the same size.");
    static_assert(alignof(A) == alignof(B), "BitCast<A, B> : A and B must have the same alignment.");
    //
    union
    {
        A a;
        B b;
    };
    //
    constexpr BitCast(A const & value) noexcept : a{ value } {}
    constexpr BitCast(B const & value) noexcept : b{ value } {}
    //
    operator B const & () const noexcept { return b; }
};
float XOR(float a, float b) noexcept
{
    return BitCast<uint32_t, float>{ BitCast<float, uint32_t>{a} ^ BitCast<float, uint32_t>{b} };
}
I've been using godbolt to identify the cause of the difference: https://godbolt.org/z/-VXqOT
Clang 9.0.0 with "-std=c++1z -O3" produced a beautiful:
XOR(float, float):
    xorps xmm0, xmm1
    ret
That is basically optimal in my opinion.
GCC 9.2 with "-std=c++1z -O3" produced a slightly worse:
XOR(float, float):
    movd eax, xmm1
    movd edx, xmm0
    xor edx, eax
    movd xmm0, edx
    ret
Then MSVC with "/std:c++latest /O2" produced a much worse:
float XOR(float,float)
    movss DWORD PTR $T1[rsp], xmm1
    mov eax, DWORD PTR $T1[rsp]
    movss DWORD PTR $T3[rsp], xmm0
    xor eax, DWORD PTR $T3[rsp]
    mov DWORD PTR $T2[rsp], eax
    movss xmm0, DWORD PTR $T2[rsp]
    ret 0
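For reference, one variation of the code that could be compared against the union version is a std::memcpy-based bit cast. This is only a minimal sketch and not something measured above: the names bit_cast_memcpy and xor_memcpy are hypothetical, it assumes float is a 32-bit IEEE 754 type, and whether it changes the output of any of the three compilers would have to be checked on godbolt.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper (not part of the code above): copies the object
// representation of `from` into a value of type To using std::memcpy.
template <typename To, typename From>
To bit_cast_memcpy(From const & from) noexcept
{
    static_assert(sizeof(To) == sizeof(From), "To and From must be the same size.");
    static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable.");
    static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable.");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

// Same bitwise operation as XOR above, written with the memcpy-based helper.
float xor_memcpy(float a, float b) noexcept
{
    return bit_cast_memcpy<float>(
        bit_cast_memcpy<std::uint32_t>(a) ^ bit_cast_memcpy<std::uint32_t>(b));
}
```

The structure is deliberately close to the original: two casts from float to uint32_t, an integer XOR, and one cast back. The only change is that the bytes are copied with std::memcpy instead of being read through the other member of a union.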