1

I have the following code which I would like to input in an unordered_set for the fastest access possible based on its 3D coordinates.

struct MeshVertex {
  float x;
  float y;
  float z;

  std::vector<MeshVertex*> links;

  float& operator [](int i) {
    switch (i) {
    case 0:
      return x;
    case 1:
      return y;
    case 2:
      return z;
    default:
      throw std::out_of_range("");
    }
  }

  MeshVertex& operator=(const STLFileVertex& other)
  {
    x = other.x;
    y = other.y;
    z = other.z;
    return *this;
  }

  bool operator<(const MeshVertex& other) const
  {
    if (x == other.x) {
      if (y == other.y)
        return z < other.z;
      return y < other.y;
    }
    return x < other.x;
  }

  bool operator==(const MeshVertex& other) const {
    if (x == other.x && y == other.y && z == other.z)
      return true;
    return false;
  }

  bool operator!=(const MeshVertex& other) const {
    if (x == other.x && y == other.y && z == other.z)
      return false;
    return true;
  }

  double distance(const MeshVertex& other) const {
    return hypot(hypot(x - other.x, y - other.y), z - other.z);
  }
};

As expected, I get the following error:

The C++ Standard doesn't provide a hash for this type.

How do I implement a well performing hash for this? It only has to contain the members x, y and z and the combination of hash and comparison must be 100% free of collisions, meaning a vertex can only be replaced if the newly inserted one has exactly the same values. Please consider that the vertexes are represented by float type. Which means a normal comparison could be misleading.

Edit:

What I'm really doing is reading an STL (stereolithography) binary file. Those familiar with the STL format will know that triangles are written like this in the file:

float normals[3];
float vertex[3][3];
uint16_t attributes;

That means vertexes will be duplicated more than often when reading the file. I want to normalize vertexes (removing duplicates) and apply links to recreate the triangles.

My goal is mapping all the vertexes in a graph, through Breadth-first search. The links, as previously mentioned, are available in the triangles. After getting all the links, the triangles become disposable.

If I don't have a perfect mean to remove duplicated vertexes, links will become broken and my mapping will fail.

Alexandre Severino
  • 1,563
  • 1
  • 16
  • 38
  • 1
    100% free of collisions is not necessary nor, in your case, easily achievable for a sane hash value size. – pvg Oct 17 '17 at 19:31
  • 1
    related/maybe dupe?: https://stackoverflow.com/questions/17016175/c-unordered-map-using-a-custom-class-type-as-the-key – NathanOliver Oct 17 '17 at 19:35
  • Very similar, but not quite the same. They are talking about an unordered_map and I'm talking about an unordered_set. Besides, the answer is not very convincing to be as collision free as I need. (If not possible to avoid collisions, I'd expect that to be explained in the answer. – Alexandre Severino Oct 17 '17 at 19:37
  • @pvg are you sure I don't need 100% free of collisions? I might have up to hundreds of thousands of vertexes in this unordered_set. If one vertex collide with another one, that means my entire mesh has become invalid. – Alexandre Severino Oct 17 '17 at 19:39
  • @AlexandreSeverino the container deals with the collisions. – pvg Oct 17 '17 at 19:44
  • An unordered_set uses **two** functions - a hash function and an equality function. The hash function is **not** used to detect duplicate entries, the equality function is. It matters way less than you think to have a collision-less hash function. – rustyx Oct 17 '17 at 19:44
  • Those comments kind of answer my question. I can, therefore, use a hash of only part of my coordinates. Is that correct? – Alexandre Severino Oct 17 '17 at 19:47
  • @AlexandreSeverino Lets says your hashing function creates 10 collisions. Then once the object is hashed and the bucket is found you only have to check the equality on those up to 10 elements in the bucket. That's not so bad. – NathanOliver Oct 17 '17 at 19:47
  • My bad. I didn't know unordered_set went through two verifications. I don't think that this is justification for a down vote. – Alexandre Severino Oct 17 '17 at 19:48
  • As an aside .. [comparing floats sure might look equal](https://randomascii.wordpress.com/2012/02/11/they-sure-look-equal/) – txtechhelp Oct 17 '17 at 19:48
  • @txtechhelp does that apply for my == operator, or is that strictly related to outputs such as the one demonstrated in that post? – Alexandre Severino Oct 17 '17 at 19:50
  • [It's for any comparison of floats/doubles.](https://stackoverflow.com/questions/17333/what-is-the-most-effective-way-for-float-and-double-comparison) .. but it DOES depend on your situation .. not enough context in your question to determine that .. but I'd say you'd probably want to check some epsilon if you're trying to avoid collisions – txtechhelp Oct 17 '17 at 19:52
  • I guess it's not entirely clear _why_ you want to put these in a set? Is it for de-duplication? In that case the comparison issue might matter. – pvg Oct 17 '17 at 20:01
  • I don't really know anything about STL files, in your case, the hash function doesn't matter that much (other than for performance reasons), you can find lots of examples of how to generate a decent hash out of 3 32 bit values (including plenty on [SO]). But you better hope the vertices you plan to deduplicate are bit-for-bit identical - if not, you won't remove things you might consider dupes. But even if you just wrote a hash function that returns 5 for everything, you won't end up removing things that are not actually equal. – pvg Oct 17 '17 at 20:20
  • I have tested something before, and they are, indeed, bit-for-bit identical. What was happening was the opposite: My program was accusing very close vertexes - but not actually the same pos - to be duplicated. Maybe it was a problem in my code? Maybe I should find out if simply comparing floats could be my culprit? – Alexandre Severino Oct 17 '17 at 20:24
  • 1
    It could be. Of course, one way to make this trivial and if all you care about is the bit-ness is to hang on to the raw values you get out of the file (i.e. don't convert them to floats). Just compare them as unsigned ints, for the purposes of deduplication. The hash function would be identical, if you still want to use the set. – pvg Oct 17 '17 at 20:29
  • Using them as 32 bits integers. What a simple yet amazing idea! – Alexandre Severino Oct 17 '17 at 20:45

1 Answers1

1

There are 5 easy steps to end all your std::hash woes.

TL;DR - dump std::hash at the earliest opportunity and use boost. Like much of the standard library inflicted on us since c++11, it's a lacklustre shadow of the boost library that preceded (and outperforms) it.

#include <boost/functional/hash.hpp>
#include <unordered_map>
#include <tuple>
#include <type_traits>

struct MeshVertex {
  float x;
  float y;
  float z;

  // step one - make your life easier and provide an as_tuple
  // method
  auto as_tuple() const {
      return std::tie(x, y, z);
  }
};

// step 2 all operators in terms of tuple
bool operator == (MeshVertex const& l, MeshVertex const& r)
{
    return l.as_tuple() == r.as_tuple();
}

// step 2 all operators in terms of tuple
bool operator < (MeshVertex const& l, MeshVertex const& r)
{
    return l.as_tuple() < r.as_tuple();
}

// step 3 - use the boost::hash protocol of providing a free function
// called hash_value in the namespace of MeshVertex. std::hash sucks.

std::size_t hash_value(MeshVertex const& arg)
{
    std::size_t seed = 0;
    boost::hash_combine(seed, arg.as_tuple());
    return seed;
}

// step 4 - define your own hash algo to escape from the
// std::hash fiasco.

struct universal_hash
{
    template<class T> std::size_t operator()(T&& arg) const
    {
        // note - boost::hash. A thousand times better.

        using hasher_type = boost::hash<std::decay_t<T>>;
        auto hasher = hasher_type();
        return hasher(arg);
    }
};


int main()
{
    // step 5 - never under any circumstances allow your containers
    // to inflict std::hash on you 

    std::unordered_map<int, MeshVertex, universal_hash> mymap;

    mymap[1] = { 1, 3, 4 };
}
Richard Hodges
  • 68,278
  • 7
  • 90
  • 142
  • I'm taking this as an accepted answer because this code looks beautiful, (and actually does answer my question), but what actually solved my collision problem was, believe it or not, hashing the coordinates only after having them casted to uint32! – Alexandre Severino Oct 18 '17 at 01:20