
I need to write a std::hash specialization for my own type, and it has to be fast. Consider this example:

#include <unordered_set>
#include <cstdint>

struct Vector
{
    float x, y, z;
};

inline bool operator ==( const Vector & a, const Vector & b )
{
    return a.x == b.x && a.y == b.y && a.z == b.z;
}

namespace std
{

template<> 
struct hash<Vector> 
{
    size_t operator()( Vector const& p ) const noexcept
    {
        return 
            ( (size_t)*reinterpret_cast<const std::uint64_t*>(&p.x) ) ^
            ( (size_t)*reinterpret_cast<const std::uint32_t*>(&p.z) << 16 );
    }
};

}

int main()
{
    std::unordered_set<Vector> s;
    s.insert( Vector{ 0, 0, 0 } );
    s.insert( Vector{ 0, 0, 1 } );
    return 0;
}

It works well in practice (when float is 32 bits wide), but gcc emits a couple of warnings about it:

main.cpp: In member function 'std::size_t std::hash<Vector>::operator()(const Vector&) const':
main.cpp:23:24: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   23 |             ( (size_t)*reinterpret_cast<const std::uint64_t*>(&p.x) ) ^
      |                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
main.cpp:24:24: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   24 |             ( (size_t)*reinterpret_cast<const std::uint32_t*>(&p.z) << 16 );
      |                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Is it safe to just suppress the warnings? Or is there a better way to write similar code without sacrificing performance?

Fedor
  • _Is it safe to just suppress the warnings?_ Violating strict aliasing is undefined behavior, I'm pretty sure. Consider using `memcpy` as a substitute for type punning instead; the optimizer may even be able to tell what you're trying to do and optimize it into a single `mov`. You could also add `-fno-strict-aliasing` to your compiler arguments. Here's a nice page on strict aliasing: [What is the strict aliasing rule?](https://stackoverflow.com/q/98650) It's for C, but I _think_ the main idea is still the same. – mediocrevegetable1 May 03 '21 at 12:47
  • [Tangent] `return a.x == b.x && a.y == b.y && a.z == b.z;` can be rewritten as `return std::tie(a.x, a.y, a.z) == std::tie(b.x, b.y, b.z);` – NathanOliver May 03 '21 at 12:48
  • The `float` type is typically a 32-bit type, which means `*reinterpret_cast<const std::uint64_t*>(&p.x)` will read beyond the bounds of the memory for the `p.x` variable, other problems aside. – Some programmer dude May 03 '21 at 12:50
  • The portable way of achieving this is with `memcpy`. Make a `std::uint64_t` and then `memcpy` the two floats to their respective positions. Make sure to use `static_assert` so the compiler will warn you if `float` isn't 32 bit on the platform. There is no way to use `reinterpret_cast` correctly for this purpose. – François Andrieux May 03 '21 at 13:03
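
The `memcpy` approach suggested in the comments might look roughly like the sketch below. It is not a tested or benchmarked answer, just one way to write it under these assumptions: `float` is 32 bits wide (checked with `static_assert`), and the combining formula is kept close to the one in the question. Each float is copied into its own `std::uint32_t`, so nothing is read past the bounds of a single member. The packing of `x` and `y` into one 64-bit value matches what the original cast reads on a little-endian machine.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <unordered_set>

struct Vector
{
    float x, y, z;
};

inline bool operator ==( const Vector & a, const Vector & b )
{
    return a.x == b.x && a.y == b.y && a.z == b.z;
}

// Assumption of this sketch: float has the same size as std::uint32_t.
static_assert( sizeof( float ) == sizeof( std::uint32_t ), "expected 32-bit float" );

namespace std
{

template<>
struct hash<Vector>
{
    size_t operator()( Vector const& p ) const noexcept
    {
        // Copy the object representation of each float; memcpy is the
        // well-defined way to inspect the bits of another type.
        std::uint32_t x, y, z;
        std::memcpy( &x, &p.x, sizeof( x ) );
        std::memcpy( &y, &p.y, sizeof( y ) );
        std::memcpy( &z, &p.z, sizeof( z ) );

        // Pack x into the low half and y into the high half, then mix in z
        // shifted by 16 bits, as in the question.
        const std::uint64_t xy = ( static_cast<std::uint64_t>( y ) << 32 ) | x;
        return static_cast<size_t>( xy ^ ( static_cast<std::uint64_t>( z ) << 16 ) );
    }
};

}
```

With this specialization in place, `std::unordered_set<Vector>` can be used exactly as in the question's `main`. One thing to keep in mind with any bitwise hash of floats: `0.0f` and `-0.0f` compare equal under `operator ==` but have different bit patterns, so they hash differently here unless they are normalized first.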
