A colleague showed me code that I thought wouldn't be necessary, but sure enough, it was. I would expect most compilers would see all three of these attempts at equality tests as equivalent:
#include <cstdint>
#include <cstring>
struct Point {
std::int32_t x, y;
};
[[nodiscard]]
bool naiveEqual(const Point &a, const Point &b) {
return a.x == b.x && a.y == b.y;
}
[[nodiscard]]
bool optimizedEqual(const Point &a, const Point &b) {
// Why can't the compiler produce the same assembly in naiveEqual as it does here?
std::uint64_t ai, bi;
static_assert(sizeof(Point) == sizeof(ai));
std::memcpy(&ai, &a, sizeof(Point));
std::memcpy(&bi, &b, sizeof(Point));
return ai == bi;
}
[[nodiscard]]
bool optimizedEqual2(const Point &a, const Point &b) {
return std::memcmp(&a, &b, sizeof(a)) == 0;
}
[[nodiscard]]
bool naiveEqual1(const Point &a, const Point &b) {
// Let's try avoiding any jumps by using bitwise and:
return (a.x == b.x) & (a.y == b.y);
}
But to my surprise, only the ones with memcpy
or memcmp
get turned into a single 64-bit compare by GCC. Why? (https://godbolt.org/z/aP1ocs)
Isn't it obvious to the optimizer that if I check equality on contiguous pairs of four bytes that that's the same as comparing on all eight bytes?
An attempt to avoid separately booleanizing the two parts compiles somewhat more efficiently (one fewer instruction and no false dependency on EDX), but still two separate 32-bit operations.
bool bithackEqual(const Point &a, const Point &b) {
// a^b == 0 only if they're equal
return ((a.x ^ b.x) | (a.y ^ b.y)) == 0;
}
GCC and Clang both have the same missed optimizations when passing the structs by value (so a
is in RDI and b
is in RSI because that's how x86-64 System V's calling convention packs structs into registers): https://godbolt.org/z/v88a6s. The memcpy / memcmp versions both compile to cmp rdi, rsi
/ sete al
, but the others do separate 32-bit operations.
struct alignas(uint64_t) Point
surprisingly still helps in the by-value case where arguments are in registers, optimizing both naiveEqual versions for GCC, but not the bithack XOR/OR. (https://godbolt.org/z/ofGa1f). Does this give us any hints about GCC's internals? Clang isn't helped by alignment.