0

It seems intuitive that one should be able to use sets of sets, and indeed std::set is designed to support such use by default (since lexicographical ordering is implemented by default.) Similarly, it seems reasonable to expect the same feature of boost::unordered_set. Is there a good reason why boost doesn't implement a generalized hash function for boost::unordered_set by default, something like:

// DEFINE A HASH FUNCTION FOR A HASH-SET THAT COMBINES THE HASH VALUES
// OF THE ELEMENTS OF THAT SET
namespace boost {
    template<typename T>
    size_t hash_value(const boost::unordered_set<T> & set) {
        typename boost::unordered_set<T>::const_iterator it, itend;
        size_t seed = 0;
        for ( it = set.begin(), itend = set.end(); it != itend; it++ ){
            boost::hash_combine(seed,boost::hash_value(*it));
        }
        return seed;
    }
}
ldog
  • 11,707
  • 10
  • 54
  • 70

1 Answers1

2

Generally you want hashes to be fast, ideally constant time.

On a string, one thing you can do is get the length and then only sample X times over its length (which, assuming non-pathological input, will be reasonably good at avoiding collisions).

unordered_set lacks a get_nth, which would allow a similar strategy to keep hashing time down to O(1). Alternatively, unordered_set could keep an xor of its content's hash and use it as its own hash (not hash_combine, as that is order-dependent, and it is hard to remove the 3rd element from the combined hash when there are 100 of them in O(1) time).

Lacking a good O(1) implementation, exposing hash_value by default might seem overly encouraging to the builder of unordered_set<unordered_set<foo>>, when in reality they should be writing unordered_set< my_unordered_set_wrapper<foo> >, where unordered_set_wrapper does the above "xor of contents hashs" to produce O(1) hash_value. Either that, or extend unordered_set.

As to why boost actually did not include it, you'd have to ask everyone who ever added stuff to boost collectively. They do not seem to be present.

Yakk - Adam Nevraumont
  • 262,606
  • 27
  • 330
  • 524
  • The thing with string hashing is, pathological input does happen, such as addresses which only have difference in one char. Cheating by default is asking for it. – hyde Nov 18 '13 at 04:21
  • Would the hash function I suggested above not work because hash combine is order dependent? – ldog Nov 18 '13 at 17:26
  • @ldog The idea of a cumulative hash on an `unordered_set` that is O(1) would be to update it whenever items are added or removed. The easiest way to make this O(1) is to make it a symmetric xor of the hashes of the contents. This is usually a bad idea, because you want "abc" to hash differently than "bad", but in a set or `unordered_set`, the order elements are added/removed has no significance on the state of the set. – Yakk - Adam Nevraumont Nov 18 '13 at 18:20
  • @Yakk Okay yea I get that there are efficiency concerns, but would a hash function that I suggest even work at all? – ldog Nov 18 '13 at 19:30
  • @ldog Now that I think about it, probably not. For your solution to work, you'd have to have a fixed order for a fixed set of elements that compare equal/have the same hash. I suspect that your `unordered_set` will have a different order depending on the size of the hash table and/or the order of insertion when two elements have a hash collision (as it has only `==` and the hash result to order them usually...). You'd have to test it. – Yakk - Adam Nevraumont Nov 18 '13 at 19:42
  • @Yakk I just tested it, it actually doesn't work! If I replace `hash_combine` with a hash function for which order doesn't matter it actually works. From the documentation of `hash_range` of boost it says " hash_range is sensitive to the order of the elements so it wouldn't be appropriate to use this with an unordered container." – ldog Nov 18 '13 at 20:43