When comparing the assembly generated by MSVC, Clang and GCC, the MSVC assembly appears to be far worse than the Clang code.


Question

Is there a flag that is necessary in GCC and MSVC to produce equivalent assembly, or is Clang just better in this specific case? I have tried various MSVC flags (different /O flags), but no substantial change occurs.

Or is there a variation of my code which allows the compilers to achieve the better optimisations? I've tried varying the code without losing the fundamental structure, also with no change.


Code

The code I'm compiling is only 26 lines so here it is:

#include <cstdint>
#include <type_traits>

template <typename A, typename B>
struct BitCast
{
    static_assert(std::is_pod<A>(), "BitCast<A, B> : A must be plain old data type.");
    static_assert(std::is_pod<B>(), "BitCast<A, B> : B must be plain old data type.");
    static_assert(sizeof(A) == sizeof(B), "BitCast<A, B> : A and B must be the same size.");
    static_assert(alignof(A) == alignof(B), "BitCast<A, B> : A and B must have the same alignment.");
    //
    union
    {
        A a;
        B b;
    };
    //
    constexpr BitCast(A const & value) noexcept : a{ value } {}
    constexpr BitCast(B const & value) noexcept : b{ value } {}
    //
    operator B const & () const noexcept { return b; }
};

float XOR(float a, float b) noexcept
{
    return BitCast<uint32_t, float>{ BitCast<float, uint32_t>{a} ^ BitCast<float, uint32_t>{b} };
}

I've been working in godbolt to identify the cause of the difference: https://godbolt.org/z/-VXqOT


Clang 9.0.0 with "-std=c++1z -O3" produced a beautiful:

XOR(float, float):
        xorps   xmm0, xmm1
        ret

Which is basically optimal in my opinion.


GCC 9.2 with "-std=c++1z -O3" produced a slightly worse:

XOR(float, float):
        movd    eax, xmm1
        movd    edx, xmm0
        xor     edx, eax
        movd    xmm0, edx
        ret

Then MSVC with "/std:c++latest /O2" produced a much worse:

float XOR(float,float)
        movss   DWORD PTR $T1[rsp], xmm1
        mov     eax, DWORD PTR $T1[rsp]
        movss   DWORD PTR $T3[rsp], xmm0
        xor     eax, DWORD PTR $T3[rsp]
        mov     DWORD PTR $T2[rsp], eax
        movss   xmm0, DWORD PTR $T2[rsp]
        ret     0
David Ledger
  • If this is a project, try doing a "release" build (+ assembly output) to see what it produces. – rcgldr Sep 20 '19 at 06:43
  • Unrelated: Why `c++1z` instead of `c++17` or `c++2a`? – Ted Lyngmo Sep 20 '19 at 06:43
  • First of all, this "bit cast" produces Undefined Behavior because you are accessing an inactive member of the union. Related (most of the answers suffer from UB as well): https://stackoverflow.com/questions/1723575/how-to-perform-a-bitwise-operation-on-floating-point-numbers – user7860670 Sep 20 '19 at 07:43
  • I tried [various increasingly desperate hacks](https://gcc.godbolt.org/z/9iMEwM) but got nowhere. – harold Sep 20 '19 at 08:50
  • @VTT: union type-punning is well-defined in GNU C++ (like in ISO C99). I'm pretty sure MSVC++ also guarantees it's safe, the same way it guarantees that pointer-casting is well-defined (unlike GNU C/C++). But anyway, I don't expect MSVC would do any better with memcpy for type-punning; its optimizer isn't very good compared to gcc or clang. https://www.agner.org/optimize/blog/read.php?i=1015 – Peter Cordes Sep 20 '19 at 09:50
  • @PeterCordes But OP is using ISO C++ mode. I don't think that VC++ guarantees anything, it just tries to match expectations. There is a lot of broken code using things like overlay unions, string comparison by pointer or volatile variable synchronization, and it must keep that working, even though recently they seem to have a tendency to be more standard conformant. – user7860670 Sep 20 '19 at 10:23
  • @harold, yeah I like the memcpy approach, reminds me of the cpp weekly this week :) Shame it didn't work. – David Ledger Sep 20 '19 at 14:02
  • @harold I'm quite surprised the intrinsic didn't work. – David Ledger Sep 20 '19 at 14:20
  • @VTT Yeah, you're right about ISO. It's a shame because type punning like this is a great way to handle bitfields inside registers for embedded systems. – David Ledger Sep 20 '19 at 15:31
  • Indeed, same result even with `std::bit_cast`: https://gcc.godbolt.org/z/P6q7Tr4h6 – Fedor Jan 16 '22 at 18:27
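
For reference, the memcpy approach mentioned in the comments could look roughly like the sketch below. The names `bit_cast_memcpy` and `XOR_memcpy` are made up for illustration and do not come from the question or the linked godbolt pages. This only sidesteps the inactive-union-member read; going by the comments above, it is not expected to change the assembly that GCC or MSVC produce.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper (not from the original post): copies the object
// representation of `from` into a freshly created To using std::memcpy,
// so no inactive union member is ever read.
template <typename To, typename From>
To bit_cast_memcpy(From const & from) noexcept
{
    static_assert(sizeof(To) == sizeof(From), "To and From must be the same size.");
    static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable.");
    static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable.");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

// Same operation as XOR in the question, written with the helper above.
float XOR_memcpy(float a, float b) noexcept
{
    return bit_cast_memcpy<float>(
        bit_cast_memcpy<std::uint32_t>(a) ^ bit_cast_memcpy<std::uint32_t>(b));
}
```

With C++20 available, `std::bit_cast` from `<bit>` can replace the helper; the last comment reports the same result with it.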
