How to hash an unordered_map?

Question

boost::hash has hashing functions for most builtin types including containers.

But as stated in the boost::hash_range function description, the hash algorithm for ranges

is sensitive to the order of the elements so it wouldn't be appropriate to use this with an unordered container

And thus there is no boost::hash specialization for std::unordered_map nor boost::unordered_map.

The question is:

Is there an "easy and efficient" way to hash an unordered_map without reimplementing a hash algorithm from scratch ?

Is it too much to ask what this is going to be used for? Using a tree as a key is rather odd. — BlamKiwi, Aug 28 '14 at 02:07
I echo that question. It is not clear to me what the use would be of a hash value for an unordered data structure. — user3344003, Sep 01 '14 at 04:00
The use case is basically a json-like object that is used as a key and thus need to be hashable, since that object is recursive (can be a tree) one of the possible "forme" (implemented as a variant) of that object is to be an unordered_map itself — Drax, Sep 26 '14 at 10:34

user541686 · Accepted Answer · 2014-08-29T20:33:05.993

8

The problem here is that there is no guarantee that the items even have an ordering among them.
So, sorting the items may very well not work for arbitrary unordered containers. You have 2 options:

Just XOR the hashes of all the individual elements. This is the fastest.
First sort the hashes of the containers, and then hash those. This may result in a better hash.

edited Aug 29 '14 at 20:33

answered Aug 29 '14 at 20:28

user541686

205,094
128
528
886

Interesting idea to use XOR - it's order independent naturally. Doesn't distribute the bits, but in this application that might not matter. – Mark Ransom Aug 29 '14 at 20:41
The XOR operation is smart but it could lower the quality of the job done by the source hash function. The fact requires an accurate analysis but the "guarantee" that the resulting hash is "quite" unique can be greatly reduced. – Stefano Buora Sep 01 '14 at 12:36
2

@StefanoBuora: Hence why I said the second one may result in a better hash. – user541686 Sep 01 '14 at 17:34

score 1 · Answer 2 · edited May 23 '17 at 10:28

You can of course convert the unordered_map to some other data structure that has a guaranteed order and use that to generate the hash.

A better idea might be to hash each individual element of the map, put those hashes into a vector, then sort and combine the hashes. See for example How do I combine hash values in C++0x? to combine the hashes.

template<typename Hash, typename Iterator>
size_t order_independent_hash(Iterator begin, Iterator end, Hash hasher)
{
    std::vector<size_t> hashes;
    for (Iterator it = begin; it != end; ++it)
        hashes.push_back(hasher(*it));
    std::sort(hashes.begin(), hashes.end());
    size_t result = 0;
    for (auto it2 = hashes.begin(); it2 != hashes.end(); ++it2)
        result ^= *it2 + 0x9e3779b9 + (result<<6) + (result>>2);
    return result;
}

Testing this on shuffled vectors shows that it always returns the same hash.

Now to adapt that basic concept to work specifically with unordered_map. Since the iterator of unordered_map returns a pair, we need a hash function for that too.

namespace std
{
    template<typename T1, typename T2>
    struct hash<std::pair<T1,T2> >
    {
        typedef std::pair<T1,T2> argument_type;
        typedef std::size_t result_type;

        result_type operator()(argument_type const& s) const
        {
            result_type const h1 ( std::hash<T1>()(s.first) );
            result_type const h2 ( std::hash<T2>()(s.second) );
            return h1 ^ (h2 + 0x9e3779b9 + (h1<<6) + (h1>>2));
        }
    };

    template<typename Key, typename T>
    struct hash<std::unordered_map<Key,T> >
    {
        typedef std::unordered_map<Key,T> argument_type;
        typedef std::size_t result_type;

        result_type operator()(argument_type const& s) const
        {
            return order_independent_hash(s.begin(), s.end(), std::hash<std::pair<Key,T> >());
        }
    };
}

See it in action: http://ideone.com/WOLFbc

score 0 · Answer 3 · answered Aug 11 '14 at 14:33

I think you may be confusing what hash is used for. It is for keys used to identify elements to determine where to store them. Two equivalent elements should has to the same value.

Are you trying to see if two unordered maps are equivalent and storing them in some kind of container?

The keys to an unordered map - well those are hashed. In fact the container would have been called hash_map except that such a container already existed.

But ok, suppose you really do want to store unordered-maps and compare to see if two are equivalent. Well you'd have to come up with a hashing algorithm that would return the same value regardless of the position of the elements it contains. A checksum of all its elements (keys and values) would be one possible way.

Note also that just because two elements have the same hash value, it doesn't mean they are equivalent. It just means that if the hash value differs they definitely are not equivalent. In fact checksums are often used to verify data for exactly this reason. A wrong checksum is proof the data is not valid, and given a good formula, a correct one makes it highly likely although not certain that it is.

No worries, i understand the general concept of hashing and its role as fast access to keys in containers. But yes i'm indeed trying to hash unoredered_map to use them as keys (in fact the real use case is slightly more complicated) but i want to see if something exists before going and crafting a home made algorithm which will take time and probably won't be super efficient. — Drax, Aug 11 '14 at 14:38
Keys are also immutable. That means once you have set your unordered_map as a key you cannot change it. — CashCow, Aug 11 '14 at 14:49

score 0 · Answer 4 · edited May 23 '17 at 12:10

I'm curious given that you are trying to hash the unordered_map to use it as a key, and given that once you've hashed the unordered_map you won't be changing it (unless you use it to create a new key), would the performance hit of converting the unordered_map to an ordered map be acceptable (and then, of course, hashing the ordered map and using that as a key)? Or is the problem with that approach that you need the faster lookup time provided by an unordered_map?

For what it's worth there may be a space advantage to using an ordered map (based on the accepted answer in the following post an unordered_map generally uses more memory):

Is there any advantage of using map over unordered_map in case of trivial keys?

score 0 · Answer 5 · answered Aug 28 '14 at 19:09

You haven't specified any performance requirements, but if you just want a "quick and dirty" solution that won't require much coding on your behalf and would take advantage of boost::hash, you can copy the range of items from unordered_map to a vector, std::sort the vector, and then pass it to boost::hash_range.

Hardly the most effective solution, though, and not one you'd want to use often or with many elements.

My preferred approach would be a specialization of unordered_map that keeps a running, up-to-date hash of the contents — you shouldn't have to pass over all elements and perform a calculation to get the current value. Instead, a member of the data structure should reflect the hash, and be modified in real-time as elements are inserted or removed, and read when needed.

How to hash an unordered_map?

5 Answers5