10

I want to use a POD struct as a hash key in a map, e.g.

struct A { int x; int y; };
std::unordered_map<A, int> my_map;

but I can't do this, since no hash function is auto-generatable for such structs.

  • Why does the C++ standard not require a default hash for a POD struct?
  • Why do compilers (specifically, GCC 4.x / 5.x) offer such a hash, even if the standard doesn't mandate one?
  • How can I generate a hash function, using a template, in a portable way, for all of my POD structures (I'm willing to make semantic assumptions if necessary)?
einpoklum
  • 118,144
  • 57
  • 340
  • 684
  • 2
    The compiler doesn't generate one because the standard doesn't specify that it should. If it did, it would be an extension. – juanchopanza Mar 06 '16 at 11:41
  • 10
    There is no universally appropriate algorithm for generating a hash key. If the compiler did generate one, it would just be a guess, with no guarantee that it would work well. It doesn't know anything about the *meaning* of your data or the values' distribution. – Cody Gray - on strike Mar 06 '16 at 11:43
  • @juanchopanza: See edit – einpoklum Mar 06 '16 at 11:47
  • 2
    @CodyGray: Why is my case different than an `std::pair`? Also, what should I do, then? – einpoklum Mar 06 '16 at 11:48
  • Are you asking what you should do beside provide a hash function for `A`? – StoryTeller - Unslander Monica Mar 06 '16 at 11:52
  • 2
    To write a general function for all PODs? No. C++ has no reflection to deal with all the members individually, and uninitialised padding between the members means you can't just treat the whole thing as a `char[]`. – BoBTFish Mar 06 '16 at 11:57
  • @BoBTFish: So, essentially what you're saying is that the foil here is the lack of reflection for structs / the ability to perceive a struct as a tuple of its members? – einpoklum Mar 06 '16 at 11:59
  • dublicate http://stackoverflow.com/questions/35408362 – oklas Mar 06 '16 at 12:04
  • 3
    Re *Why is my case different than an* `std::pair`? What makes you think it is? My compilers don't accept `std::unordered_set>` for the same reason they don't accept `std::unordered_set>`: A hash function is not defined. – David Hammen Mar 06 '16 at 12:05
  • 1
    @oklas I don't see how. – Baum mit Augen Mar 06 '16 at 12:05
  • @{Baum mit Augen} redefine hash generator for class and redefine unordered_map with that hash – oklas Mar 06 '16 at 12:08
  • @einpoklum "Why do compilers (specifically, GCC 4.x / 5.x) offer such a hash, even if the standard doesn't mandate one?" - I think that it is bug of gcc, becouse compilers must prevent unportable possibility without any special command line flag. – oklas Mar 06 '16 at 12:33

3 Answers3

6

As from the documentation, a possible implementation in your case would be:

#include<functional>
#include<unordered_map>

struct A { int x; int y; };

namespace std
{
    template<> struct hash<A>
    {
        using argument_type = A;
        using result_type = std::size_t;
        result_type operator()(argument_type const& a) const
        {
            result_type const h1 ( std::hash<int>()(a.x) );
            result_type const h2 ( std::hash<int>()(a.y) );
            return h1 ^ (h2 << 1);
        }
    };
}

int main() {
    std::unordered_map<A, int> my_map;
}

The compiler us not allowed to generate such a specialization because of the standard that does not define anything like that (as already mentioned in the comments).

skypjack
  • 49,335
  • 19
  • 95
  • 187
  • I actually want to have a hash functor for all POD structs like A (see question edit) - if that's possible. Also, you're only answering a sub-part of one part of thq question. – einpoklum Mar 06 '16 at 11:55
  • Also, why not use the hash structure for pairs? – einpoklum Mar 06 '16 at 12:01
  • 2
    @einpoklum Because there isn't one? I certainly don't remember reading about one, and can't find it on cppreference. Otherwise the pattern could indeed be like writing `operator<` or `operator==` using [`std::tie`](http://en.cppreference.com/w/cpp/utility/tuple/tie). – BoBTFish Mar 06 '16 at 12:04
  • 1
    This should probably be the accepted answer, however change h2 shift left to h2 rotate left. The shift left decreases entropy vs the rotate left. – ThomasMcLeod Oct 16 '19 at 14:57
2

There is a method to generate hash for POD, like good old c style. Only for real POD with no any linked data on the outside of struct. There is no checking of this requirements in code so use it only when you know and can guarantee this. All fields must be initialized (for example by default constructor like this A(), B() etc).

#pragma pack(push)  /* push current alignment to stack */
#pragma pack(1)     /* set alignment to 1 byte boundary */
    struct A { int x; int y; };
    struct B { int x; char ch[8] };
#pragma pack(pop)   /* restore original alignment from stack */

struct C { int x __attribute__((packed)); };


template<class T> class PodHash;

template<>
class PodHash<A> {
public:
    size_t operator()(const A &a) const
    {
        // it is possible to write hash func here char by char without using std::string
        const std::string str =
            std::string( reinterpret_cast<const std::string::value_type*>( &a ), sizeof(A) );
        return std::hash<std::string>()( str );
    }
};

std::unordered_map< A, int, PodHash<A> > m_mapMyMapA;
std::unordered_map< B, int, PodHash<B> > m_mapMyMapB;

UPD: Data structure must be defined in data packing section with value of one byte or with pack attribute for prevent padding bytes.

UPD: But I need to warn that replace deafult packing will make data loading/storing from/to memory for some fields little slowly, to prevent this need to arrange structure data fields with granularity that corresponding your (or most popular) architecture.

I suggest that you can add by yourself additional unused fields not for using but for arrange fields in your data structure for best prformance of memory loading/storing. Example:

struct A
{
    char x;           // 1 byte
    char padding1[3]; // 3 byte for the following 'int'
    int y;            // 4 bytes - largest structure member
    short z;          // 2 byte
    char padding2[2]; // 2 bytes to make total size of the structure 12 bytes
};

#pragma pack is supported by, at least:

oklas
  • 7,935
  • 2
  • 26
  • 42
  • 6
    A big problem with this approach is that the struct is likely to have padding bytes in it, and the values of those bytes will be undefined, and yet this hash function will nevertheless factor in those bytes' undefined values when computing the hash. The result is that if you use this function you are likely to get different hash values for structs that are logically the same (but have different data in their padding byte), and the symptom will be that your hash lookups don't work reliably. – Jeremy Friesner Mar 06 '16 at 14:49
  • My thanks for reminder. I have work with this many years ago when I write binary network protocol. The solution exists and portable. Answer updated. – oklas Mar 06 '16 at 15:37
  • 2
    There is no packing in C++. Compilers may offer packing as an extension. Packed structures are allowed to be, and are, slower than non-packed ones. – n. m. could be an AI Mar 06 '16 at 16:16
  • 1
    Thanks. There is more difficult to write network protocol like tcp or udp with no packing, so compilers support this feature. Alredy write remark about slowness, this problem easily solved by manual packing. – oklas Mar 06 '16 at 16:29
0

More flexible way is to declare comparision class and use it as template param of std::unordered_map.

struct A { int x; int y; };

emplate<class T> class MyHash;

template<>
class MyHash<A> {
public:
    size_t operator()(const A &a) const
    {
        result_type const h1 ( std::hash<int>()(a.x) );
        result_type const h2 ( std::hash<int>()(a.y) );
        return h1 ^ (h2 << 1);
    }
};

std::unordered_map<CString,CString,MyHash> m_mapMyMap;

You may want another Hash for same objects. Flexibility appear with code like this:

std::unordered_map<CString,CString, *MyAnotherHas* > m_mapMyMap;
oklas
  • 7,935
  • 2
  • 26
  • 42