30

I wish to use HashSet<T> as the key to a Dictionary:

Dictionary<HashSet<T>, TValue> myDictionary = new Dictionary<HashSet<T>, TValue>();

I want to look up values from the dictionary such that two different instances of HashSet<T> that contain the same items will return the same value.

HashSet<T>'s implementations of Equals() and GetHashCode() don't seem to do this (I think they're just the defaults). I can override Equals() to use SetEquals() but what about GetHashCode()? I feel like I am missing something here...

GraemeF
  • 11,327
  • 5
  • 52
  • 76
  • 1
    Keep in mind that if you modify a HashSet after you use it as a key, your dictionary could stop working as expected. `As long as an object is used as a key in the Dictionary, it must not change in any way that affects its hash value. Every key in a Dictionary must be unique according to the dictionary's equality comparer.` [link](https://msdn.microsoft.com/en-us/library/xfhwa508.aspx) – hypehuman Dec 16 '15 at 20:58
  • Yep, these days an `ImmutableHashSet` would be a better choice! – GraemeF Dec 17 '15 at 21:45

3 Answers3

50

You could use the set comparer provided by HashSet<T>:

var myDictionary = new Dictionary<HashSet<T>, TValue>(HashSet<T>.CreateSetComparer());
digEmAll
  • 56,430
  • 9
  • 115
  • 140
  • 4
    didn't know that one existed! looks interesting (and now I feel foolish for writing an answer with a custom one ;p) – Marc Gravell May 06 '11 at 10:43
  • 1
    @Marc: eh eh, me neither until Jon Skeet once showed it to me :) – digEmAll May 06 '11 at 10:51
  • Of course, I completely forgot that the dictionary uses a comparer too - d'oh! Also I didn't know about that one :) – GraemeF May 06 '11 at 11:13
  • Unfortunately, `HashSet.CreateSetComparer()` does not seem to exist on .NET Core (yet) – Pac0 Sep 14 '17 at 09:37
  • 1
    Warning!! `HashSet.CreateSetComparer` will use `EqualityComparer.Default`!! http://referencesource.microsoft.com/#System.Core/System/Collections/Generic/HashSet.cs I needed a culture sensitive version of this, but unfortunately `HashSetEqualityComparer` which has such a constructor, is internal. – David S. Feb 17 '18 at 14:25
9

digEmAll's answer is clearly the better choice in practice, since it uses built in code instead of reinventing the wheel. But I'll leave this as a sample implementation.


You can use implement an IEqualityComparer<HashSet<T>> that uses SetEquals. Then pass it to the constructor of the Dictionary. Something like the following(Didn't test it):

class HashSetEqualityComparer<T>: IEqualityComparer<HashSet<T>>
{
    public int GetHashCode(HashSet<T> hashSet)
    {
        if(hashSet == null)
           return 0;
        int h = 0x14345843; //some arbitrary number
        foreach(T elem in hashSet)
        {
            h = unchecked(h + hashSet.Comparer.GetHashCode(elem));
        }
        return h;
    }

    public bool Equals(HashSet<T> set1, HashSet<T> set2)
    {
        if(set1 == set2)
            return true;
        if(set1 == null || set2 == null)
            return false;
        return set1.SetEquals(set2);
    }
}

Note that the hash function here is commutative, that's important because the enumeration order of the elements in the set is undefined.

One other interesting point is that you can't just use elem.GetHashCode since that will give wrong results when a custom equality comparer was supplied to the set.

CodesInChaos
  • 106,488
  • 23
  • 218
  • 262
  • Note that `^` is *generally* a bad idea in hashes, but in this case since we know the values are unique it probably works. – Marc Gravell May 06 '11 at 10:47
  • 2
    I chose ^ because it commutes and because the values are unique and thus the x^x=0 problem isn't that relevant. But probably addition is still a better idea. – CodesInChaos May 06 '11 at 10:50
  • I don't know if it is appropriate to ask questions in comments but can you explain or post a link that elaborates on why xor is a bad idea in hashes? I thought that it was a good idea as long as the values participating in the hashing was also part of the equality comparison. Thanks! – faester May 06 '11 at 10:54
  • 2
    One reason why xor is bad is because duplicates cancel out. x^x=0 for all x. So for example you have a 2d vector, and use hashfunction `X^Y` every vector with `X==Y` will collide. And the fact that xor is commutative means that swapping the value of properties will give collisions too. e.g. `X=1,Y=5` collides with `X=5,Y=1` – CodesInChaos May 06 '11 at 10:56
  • @faester @CodeInChaos: Meanwhile using `+` doesn't guarantee the hash code to be unique either, all `X == -Y` instances get the same hash code. But anyway `+` is better than `^`, both are correct. The one who has less probability to get a same code from two different objects is better. – Cheng Chen May 06 '11 at 11:03
  • 4
    A unique hash can only be guaranteed in a few cases. But yes both + and ^ are usually bad choices to combine hashes. But I chose them in this case because they are commutative. The common choices to combine hashes(like `hash=hash*primeConstant+elemHash`) are non commutative by design. And here commutative combining is required. Btw the MS implementation uses `^` – CodesInChaos May 06 '11 at 11:06
  • @CodeInChaos: Thanks! -- I have seen the `^` at MS implementation why I got confused. But thanks for a the explanation. – faester May 06 '11 at 11:09
  • Onj .NET Core, you apparently have to create your own comparer, so thank you for this example implementation – Pac0 Sep 14 '17 at 09:39
2

You can provide a IEqualityComparer<HashSet<T>> to the Dictionary constructor and make the desired implementation in that comparer.

faester
  • 14,886
  • 5
  • 45
  • 56