1

I'd like to have an unordered_map with a struct, that I'd like to use as the key, of multiple std::set< std::string >.

I see that a custom hash function is required and that a string can have std::hash applied; however, I cannot determine what should be returned to satisfy the purpose of the hash function of these sets for an unordered_map.

How should a custom hash function return?

  • 1
    The description of the data structure you want is a bit vague. To clearify, you want a `std::unordered_map` where `myStruct` is a custom *class* with multiple `std::set` members? And you need advice on how to create a *hash function* for `myStruct`? – Felix Glas Jul 17 '14 at 22:05
  • @Snps Yes, but I was creating `myStruct` as a `struct` rather than `class`, yet I'll take anything I can get. Thank you for looking! –  Jul 17 '14 at 22:08
  • 2
    In C++ `struct` means **class** with default visibility set to *public*. – Felix Glas Jul 17 '14 at 22:09
  • 1
    There is no *real* difference between `struct` and `class` ;) – leemes Jul 17 '14 at 22:09
  • Thanks guys! I was just following this until I found a need to do otherwise http://stackoverflow.com/a/54596/1382306 I'm guessing that's now? ;)) –  Jul 17 '14 at 22:10
  • Well, simply use `struct` if everything is going to be public anyway, i.e. you don't implement the principle of *data hiding*. (But you can change it if you decide to do so.) -- That doesn't change your question though :) – leemes Jul 17 '14 at 22:13
  • @Gracchus: A `std::unordered_map` holds key-value pairs. Is your struct the __key__ or the __value__? Your question isn't clear. – Blastfurnace Jul 17 '14 at 22:48
  • `struct` defines a class, simple as. – Lightness Races in Orbit Jul 17 '14 at 22:58

2 Answers2

2

I think this may be a better alternative to Snps' answer. This implements a specialization of std::hash for a user-defined type and it hashes the struct without creating temporary strings.

I've copied two functions from Boost, hash_combine and hash_range, to compute a single hash value from two containers.

#include <iostream>
#include <functional>
#include <set>
#include <unordered_map>

// user-defined type
struct myStruct {
    std::set<std::string> s1;
    std::set<std::string> s2;

    bool operator==(const myStruct &other) const {
        return (s1 == other.s1) && (s2 == other.s2);
    }
};

// hash helper functions plagiarized from Boost
template <typename T>
void hash_combine(size_t &seed, const T &v)
{
    using std::hash;
    seed ^= hash<T>()(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

template <typename It>
void hash_range(size_t &seed, It first, It last)
{
    for (; first != last; ++first) {
        hash_combine(seed, *first);
    }
}

// std::hash specialization
namespace std
{
    template<> struct hash<myStruct> {
        size_t operator()(const myStruct &key) const {
            size_t seed = 0;
            hash_range(seed, key.s1.begin(), key.s1.end());
            hash_range(seed, key.s2.begin(), key.s2.end());
            return seed;
        }
    };
}

int main()
{
    std::unordered_map<myStruct, int> myMap;

    myStruct ms1{ { "apple", "pear", "orange" }, { "red", "green", "blue" } };
    myStruct ms2{ { "pear", "apple", "orange" }, { "red", "green", "blue" } };
    myStruct ms3{ { "apple", "banana", "orange" }, { "red", "green", "blue" } };

    myMap[ms1] = 1;
    myMap[ms2] = 2;
    myMap[ms3] = 3;

    std::cout << myMap.size() << '\n'; // output: 2
}
Community
  • 1
  • 1
Blastfurnace
  • 18,411
  • 56
  • 55
  • 70
  • Thank you Blastfurnace! Two questions: could you detail the last line of `hash_combine`, and why should `myMap.size()` output `2`? Shouldn't it be '3'? It's not a criticism. It comes from inexperience. Thank you so much in advance! –  Jul 18 '14 at 15:39
  • 1
    @Gracchus: The `hash_combine` function is from the [Boost library](http://www.boost.org/doc/libs/1_55_0/doc/html/hash/combine.html) and is used to incrementally build a hash from different values. The method and numeric constants were derived by someone much smarter than I to produce a uniform result. The output is 2 because `ms1` and `ms2` contain identical `std::set` values and therefore produce the same hash. A `std::unordered_map` only holds unique keys so the second insertion fails because it would be a duplicate key. A `std::unordered_multimap` allows duplicate keys. – Blastfurnace Jul 18 '14 at 15:54
  • As for the first answer: wow! For the second: facepalm duh. Thank you so much! –  Jul 18 '14 at 16:01
  • @Gracchus "facepalm duh", I did put in an effort in order to help you. Is the final outcome/best answer all that matters? :) – Felix Glas Jul 18 '14 at 16:11
  • @Snps: It was your answer that motivated me to create this answer and I learned things in the process. I felt there had to be a better way without all the string concatenation. I thank you! – Blastfurnace Jul 18 '14 at 16:15
  • @Snps I upvoted your answer, and I think it's very good, but I think that stack wants/demands "the ultimate correct answer", so if one comes along with a "perfect" `hash_combine`, that will get the check. I absolutely do ***NOT*** agree with downvoter. Your answer may not be "most correct", but it is not bad by any means...in my inexperienced opinion. –  Jul 18 '14 at 16:18
  • @Gracchus I agree in that this answer deserves the accept more than mine, it's better, I just reacted on the "facepalm duh"-comment which I (mis?)interpreted as discredit to my answer. – Felix Glas Jul 18 '14 at 18:00
  • @Snps Oh jeez, no! I was talking about Blastfurnace's responses to my two questions in the comment above. I should've known that `ms1 == ms2`. :/ –  Jul 18 '14 at 18:07
  • @Gracchus Ah just a misunderstanding then! Happy coding! :) – Felix Glas Jul 18 '14 at 18:26
  • @Snps You too! And thank you for taking the time to produce such a thorough answer! –  Jul 18 '14 at 18:38
  • 1
    This is a good answer. It's true `boost::hash_combine` isn't ideal (or, even, good in many situations). Combining hashes is not an easy problem; you might find the paper 'Types don't know #' by @HowardHinnant to be helpful: http://isocpp.org/blog/2014/05/n3980 – Nik Bougalis Jul 18 '14 at 22:48
  • @NikBougalis: Thanks for the link, that was an interesting read. – Blastfurnace Jul 19 '14 at 00:23
1

The requirements of std::hash is as follows: (http://en.cppreference.com/w/cpp/utility/hash)

The hash template defines a function object that implements a hash function. Instances of this function object satisfy Hash. In particular, they define an operator() that:

  1. Accepts a single parameter of type Key.
  2. Returns a value of type size_t that represents the hash value of the parameter.
  3. Does not throw exceptions when called.
  4. For two parameters k1 and k2 that are equal, std::hash<Key>()(k1) == std::hash<Key>()(k2).
  5. For two different parameters k1 and k2 that are not equal, the probability that std::hash<Key>()(k1) == std::hash<Key>()(k2) should be very small, approaching 1.0 / std::numeric_limits<size_t>::max().

The hash template is both CopyConstructible and Destructible.

So what you need is basically a function that returns a std::size_t that is unique for every myStruct object and that returns the same value for objects that are considered equivalent.

Edit: The following may not be the most robust way to generate the hash, but it will serve as a basic example of how it can be done.

One way to do it is to use the standard specialization for std::hash<std::string> by concatenating all the strings in each std::set member using a separator sequence and then concatenating all the resulting merged strings into one and returning the hash value using the standard hash function.

The merged "super"-string will be unique for each myStruct object if the member std::sets differ, and still same when the members don't differ as std::set is an ordered container.

struct myStruct {
    std::set<std::string> s1;
    std::set<std::string> s2;
};

std::string mergeAllStrings(const myStruct& ms) {
    static const std::string SEPARATOR = "#¤%&"; // Some uncommon sequence.
    std::string super;
    for (const auto& s : ms.s1) {
        super += s + SEPARATOR; // Append separator.
    }
    for (const auto& s : ms.s2) {
        super += s + SEPARATOR; // Append separator.
    }
    return super;
}

int main() {
    myStruct ms1{{"apple", "pear", "orange"}, {"red", "green", "blue"}};
    myStruct ms2{{"pear", "apple", "orange"}, {"red", "green", "blue"}};
    myStruct ms3{{"apple", "banana", "orange"}, {"red", "green", "blue"}};

    std::cout << std::hash<std::string>()(mergeAllStrings(ms1)) << std::endl;
    std::cout << std::hash<std::string>()(mergeAllStrings(ms2)) << std::endl;
    std::cout << std::hash<std::string>()(mergeAllStrings(ms3)) << std::endl;
}

Output:

2681724430859750844 // Same
2681724430859750844 // Same
2942368903851914580 // Different

You can now create a hash functor e.g. as:

struct MyHash {
    std::size_t operator()(const myStruct& ms) const {
        return std::hash<std::string>()(mergeAllStrings(ms));
    }
};

and use it with std::unordered_map as:

std::unordered_map<myStruct, myValue, MyHash> m;

Note that you should provide a custom equal_to functor as well.

Felix Glas
  • 15,065
  • 7
  • 53
  • 82