Why doesn't boost::hash_value support boost::unordered_set by default?

Question

It seems intuitive that one should be able to use sets of sets, and indeed std::set is designed to support such use by default (since lexicographical ordering is implemented by default.) Similarly, it seems reasonable to expect the same feature of boost::unordered_set. Is there a good reason why boost doesn't implement a generalized hash function for boost::unordered_set by default, something like:

// DEFINE A HASH FUNCTION FOR A HASH-SET THAT COMBINES THE HASH VALUES
// OF THE ELEMENTS OF THAT SET
namespace boost {
    template<typename T>
    size_t hash_value(const boost::unordered_set<T> & set) {
        typename boost::unordered_set<T>::const_iterator it, itend;
        size_t seed = 0;
        for ( it = set.begin(), itend = set.end(); it != itend; it++ ){
            boost::hash_combine(seed,boost::hash_value(*it));
        }
        return seed;
    }
}

score 2 · Answer 1 · answered Nov 18 '13 at 03:56

2

Generally you want hashes to be fast, ideally constant time.

On a string, one thing you can do is get the length and then only sample X times over its length (which, assuming non-pathological input, will be reasonably good at avoiding collisions).

unordered_set lacks a get_nth, which would allow a similar strategy to keep hashing time down to O(1). Alternatively, unordered_set could keep an xor of its content's hash and use it as its own hash (not hash_combine, as that is order-dependent, and it is hard to remove the 3rd element from the combined hash when there are 100 of them in O(1) time).

Lacking a good O(1) implementation, exposing hash_value by default might seem overly encouraging to the builder of unordered_set<unordered_set<foo>>, when in reality they should be writing unordered_set< my_unordered_set_wrapper<foo> >, where unordered_set_wrapper does the above "xor of contents hashs" to produce O(1) hash_value. Either that, or extend unordered_set.

As to why boost actually did not include it, you'd have to ask everyone who ever added stuff to boost collectively. They do not seem to be present.

answered Nov 18 '13 at 03:56

Yakk - Adam Nevraumont

262,606
27
330
524

The thing with string hashing is, pathological input does happen, such as addresses which only have difference in one char. Cheating by default is asking for it. – hyde Nov 18 '13 at 04:21
Would the hash function I suggested above not work because hash combine is order dependent? – ldog Nov 18 '13 at 17:26
@ldog The idea of a cumulative hash on an `unordered_set` that is O(1) would be to update it whenever items are added or removed. The easiest way to make this O(1) is to make it a symmetric xor of the hashes of the contents. This is usually a bad idea, because you want "abc" to hash differently than "bad", but in a set or `unordered_set`, the order elements are added/removed has no significance on the state of the set. – Yakk - Adam Nevraumont Nov 18 '13 at 18:20
@Yakk Okay yea I get that there are efficiency concerns, but would a hash function that I suggest even work at all? – ldog Nov 18 '13 at 19:30
@ldog Now that I think about it, probably not. For your solution to work, you'd have to have a fixed order for a fixed set of elements that compare equal/have the same hash. I suspect that your `unordered_set` will have a different order depending on the size of the hash table and/or the order of insertion when two elements have a hash collision (as it has only `==` and the hash result to order them usually...). You'd have to test it. – Yakk - Adam Nevraumont Nov 18 '13 at 19:42
@Yakk I just tested it, it actually doesn't work! If I replace `hash_combine` with a hash function for which order doesn't matter it actually works. From the documentation of `hash_range` of boost it says " hash_range is sensitive to the order of the elements so it wouldn't be appropriate to use this with an unordered container." – ldog Nov 18 '13 at 20:43

Why doesn't boost::hash_value support boost::unordered_set by default?

1 Answers1

Linked