1

I have a HashSet<T> and I want to get the hashcode of the HashSet<T> based on the items it contains. I am trying to identify a HashSet<T> based on it's values. I know I can use set operations to check if two HashSet<T>'s are equal. I need a hashcode because I want to use memoization in a function which takes a HashSet<T> as an argument.

HashSet<string> one = new HashSet<string>();
one.Add("java");
Console.WriteLine(one.GetHashCode());

HashSet<string> two = new HashSet<string>();
two.Add("java");
Console.WriteLine(two.GetHashCode() == one.GetHashCode()); 
// I want the hash code to be same for one and two,
// since they both contain the same strings.

Is there a way to identify a HashSet<T> based on it's values without overriding the GetHashCode method? If there isn't one could you please provide the override implementation for GetHashCode of a HashSet<T> of strings?

James Z
  • 12,209
  • 10
  • 24
  • 44

2 Answers2

4

The HashSet<T> has a static CreateSetComparer method that could solve your problem if it worked correctly, but it doesn't.

IEqualityComparer<HashSet<string>> comparer = HashSet<string>.CreateSetComparer();
Console.WriteLine(comparer.GetHashCode(two) == comparer.GetHashCode(one));

This will print true in your example, but if the two hashsets are configured with a non-default comparer, for example StringComparer.OrdinalIgnoreCase, and contain items that are different according to the default comparer, for example "java" and "Java", it will print false.

The HashSetEqualityComparer<T> class below is aware of the Comparer of each hashset, and produces identical hashcodes for hashsets that contain the same elements, regardless of the order that they were inserted in the hashsets.

public class HashSetEqualityComparer<T> : IEqualityComparer<HashSet<T>>
{
    public bool Equals(HashSet<T> x, HashSet<T> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        if (!ReferenceEquals(x.Comparer, y.Comparer)) return false;
        return x.SetEquals(y);
    }

    public int GetHashCode(HashSet<T> hashSet)
    {
        if (hashSet is null) return 0;
        HashCode hashCode = new();
        foreach (int value in hashSet
            .Select(x => hashSet.Comparer.GetHashCode(x)).Order())
        {
            hashCode.Add(value);
        }
        return hashCode.ToHashCode();
    }
}

Usage example:

HashSetEqualityComparer<string> comparer = new();
Console.WriteLine(comparer.GetHashCode(two) == comparer.GetHashCode(one));

The HashCode is a built-in struct, that is used for combining multiple values into a single hashcode.

Sorting the hashcodes of the contained elements is a relatively expensive O(n log n) operation. In case paying this performance cost is not desirable, you can get a valid hashcode faster using the XOR technique:

public int GetHashCode(HashSet<T> hashSet)
{
    if (hashSet is null) return 0;
    int hashCode = 0;
    foreach (T item in hashSet)
    {
        hashCode = hashCode ^ HashCode.Combine(hashSet.Comparer.GetHashCode(item));
    }
    return hashCode;
}

The quality of the resulting hashcode might not be as good though. The HashCode.Combine method with a single argument can be used in order to diffuse the hashcode of each element across a large range. Which improves the quality of the resulting hashcode in case, for example, that the elements are small integer numbers.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
  • Your assertion that there is a bug in `CreateSetComparer` is not entirely true. It does work if the comparers are non-default https://dotnetfiddle.net/2qz085 it only doesn't work if you create a *new* set using that comparer, or if the two sets use different comparers. Looking at the source code there is specific handling if the comparers of the two sets are the same https://referencesource.microsoft.com/#System.Core/System/Collections/Generic/HashSet.cs,1476 – Charlieface Jul 16 '23 at 20:57
  • @Charlieface the `Equals` works correctly. It's the `GetHashCode` that doesn't work ([demo](https://dotnetfiddle.net/GPNjjS)). – Theodor Zoulias Jul 16 '23 at 21:18
  • 1
    Fair enough that's just an out-and-out bug. FTR it's the same in .NET 7 and it seems they *want* the bug to be there https://github.com/dotnet/runtime/blob/44c46e181a2a6f23a66f10924d8b7840150f18b7/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/HashSetEqualityComparer.cs#L66 – Charlieface Jul 16 '23 at 21:35
  • @Charlieface yep, Microsoft love their bugs. They call them "by design", so they are here to stay. :-) – Theodor Zoulias Jul 16 '23 at 21:46
-1

You can use the HashCode struct to combine hashes. You need to sort it first in order to get a deterministic result, which is an expensive operation, so another hashcode combining method may be better.

var hs = new HashCode();
foreach (var item in yourHashSet.Order())
    hs.Add(item, yourHashSet.Comparer);

var hashCode = hs.ToHashCode();

You can also do this using an extension method, see this answer.

Charlieface
  • 52,284
  • 6
  • 19
  • 43
  • 1
    @TheodorZoulias I have just run a test and yes it does produce different hashcodes for two HashSets that contain the same elements, that were added in different order – karthik suryadevara Jul 16 '23 at 16:45
  • 1
    @TheodorZoulias True, I didn't realize it. Need to sort first, I have added this. But I like your answer better, I'm going to leave this one here for completeness – Charlieface Jul 16 '23 at 20:10
  • I don't think it compiles. The `Enumerable.Order` takes a `IComparer`, and the `HashSet.Comparer` is `IEqualityComparer`. – Theodor Zoulias Jul 16 '23 at 20:45