5

i really dont get it:

i am reading points, each holds 3 float values, out of a binary file. Saving this points in an unordered_map

therefore i try to create a key out of these 3 float values:

first intention: just use the exact bits as key:

unordered_map<string, vector<float>> points;
string vecToKey( float* a ) {
char bytes[12];
memcpy(bytes, a, 12);
return string(bytes);
}

the point is that i definitely want to eleminate same points this way but

in an example project reading about 21374 points the map result size = 10640 points

using following method as key creation results in the proper result of 10687 points

string vec3ToKey( float a[3] ) {
float a1[3];
a1[0] = a[0];
a1[1] = a[1];
a1[2] = a[2];
stringstream ss;
boost::archive::text_oarchive oa(ss);
oa << a1;
return ss.str();
}

the problem is the speed. second method needs about 16 seconds and first method just 1-2 seconds... i just cant explane myself why there even is a difference ...

i appreciate every idea :)

mskfisher
  • 3,291
  • 4
  • 35
  • 48
user1424359
  • 55
  • 1
  • 6
  • 2
    If you care about speed, don't use `std::string`s as key to begin with. – K-ballo May 29 '12 at 18:13
  • The first chunk of code is broken. How can `string(bytes)` know how many bytes the output string should be? The second method takes a long time because it has to convert the floats to text. – David Schwartz May 29 '12 at 18:15
  • @DavidSchwartz The first could work if `bytes` had a length of 13 and was zero initialized, though it would still be a bad idea. – Mooing Duck May 29 '12 at 18:20
  • How would `string(bytes)` know it was of size 13? – David Schwartz May 29 '12 at 18:20
  • 1
    The first example is also broken with certain float numbers which contain zero as a byte in its pattern. – MerickOWA May 29 '12 at 18:34
  • @MerickOWA: Actually, the break is *much* worse for numbers that don't. With numbers that do, it will cause excess collisions. With numbers that don't, lookups may *fail*. – David Schwartz May 29 '12 at 18:37
  • @DavidSchwartz I agree, its bad all around ;) – MerickOWA May 29 '12 at 18:38
  • i definitely abuse the string class but still ich works perfectly and extreme fast with return string(bytes, 12); the problem surely was that not the whole bit order was copied cause there might be a null bight in the floating point bit order ... – user1424359 May 29 '12 at 21:00

5 Answers5

5
string vecToKey( float* a ) {
  char bytes[12];
  memcpy(bytes, a, 12);
  return string(bytes);
}

The string constructor you're using stops at the first null byte. Floating point values can contain null bytes. So the string is probably not accurately representing the three floats. You can see by sticking an assert in there:

  string s(bytes);
  assert(s.size() == sizeof bytes);
  return s;
}

Another problem is that bytes might not contain a null byte, and the program may copy random garbage into the string or otherwise exhibit undefined behavior.

I would recommend that you just not try to abuse string this way. You want a key that's three floats, so use a key that represents exactly that: std::array<float,3>. Or better yet use a 'Point' class since that's what the three floats represent.

Since there's no built in hash function for arrays you can use something like this:

// taken from http://stackoverflow.com/questions/6899392/generic-hash-function-for-all-stl-containers
template <class T>
inline void hash_combine(std::size_t & seed, const T & v)
{
  std::hash<T> hasher;
  seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

struct Hasher {
    size_t operator() (std::array<float,3> const &v) {
        size_t h = std::hash<float>()(v[0]);
        hash_combine(h,v[1]);
        hash_combine(h,v[2]);
        return h;
    }
};


std::unordered_map<std::array<float,3>,vector<float>,Hasher> map;
bames53
  • 86,085
  • 15
  • 179
  • 244
1

Change the index to an integer type, say unsigned int. Try code more like this:

unsigned int vec3toKey( float a[3] )
{
   unsigned char *in = reinterpret_cast<unsigned char>(a);
   unsigned int ret = 2654435761u;
   for(int i = 0; i < (3 * sizeof(float)); ++i)
     ret = (ret * 2654435761u) ^ *in++;
   return ret;
}
David Schwartz
  • 179,497
  • 17
  • 214
  • 278
0

A generic container can use any type as the key. Surely you have some kind of point class in this application. That should be your key.

Aparently you'll need to overload the hashing function in order to use your own type (thanks DavidSchwartz). This question addresses that:

C++ unordered_map user defined type

PS: If you don't have a point struct or class, you probably want one :p

Community
  • 1
  • 1
John Humphreys
  • 37,047
  • 37
  • 155
  • 255
0

I was trying to use @bames53 solution but got some compilation issues. While considering this answer and what @bames53 is given here is the final version I have

  #include <iostream>
  #include <string>
  #include <vector>
  #include <unordered_map>


  template <typename C> struct Hasher{
                      typedef typename C::value_type value_type;
                      inline std::size_t operator() (const C &c) const{
                          std::size_t seed = 0;
                          for(typename C::const_iterator it = c.begin(); it != c.end(); ++it){
                              std::hash<value_type> hasher;
                              seed ^= hasher(*it) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
                          }
                          return seed;
                      }
  };

  int main()
  {

              std::unordered_map<std::array<float, 3>, std::vector<float>, Hasher<std::array<float, 3>>> E;
                  std::array<float, 3> t1 = {34, 56,78};
                  std::vector<float> result;
                  result.push_back(45);
                  E[t1] = result;

  }
GPrathap
  • 7,336
  • 7
  • 65
  • 83