4

I'm working on implementing GetHashCode() based on the HashCode struct in this answer here. Since my Equals method will consider collections using Enumerable.SequenceEqual(), I need to include the collections in my GetHashCode() implementation.

As a starting point, I'm using Jon Skeet's embedded GetHashCode() implementation to test the output of the HashCode struct implementation. This works as expected using the following test below -

private class MyObjectEmbeddedGetHashCode
{
    public int x;
    public string y;
    public DateTimeOffset z;

    public List<string> collection;

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;

            hash = hash * 31 + x.GetHashCode();
            hash = hash * 31 + y.GetHashCode();
            hash = hash * 31 + z.GetHashCode();

            return hash;
        }
    }
}

private class MyObjectUsingHashCodeStruct
{
    public int x;
    public string y;
    public DateTimeOffset z;

    public List<string> collection;

    public override int GetHashCode()
    {
        return HashCode.Start
            .Hash(x)
            .Hash(y)
            .Hash(z);
    }
}

[Test]
public void GetHashCode_CollectionExcluded()
{
    DateTimeOffset now = DateTimeOffset.Now;

    MyObjectEmbeddedGetHashCode a = new MyObjectEmbeddedGetHashCode() 
    { 
        x = 1, 
        y = "Fizz",
        z = now,
        collection = new List<string>() 
        { 
            "Foo", 
            "Bar", 
            "Baz" 
        } 
    };

    MyObjectUsingHashCodeStruct b = new MyObjectUsingHashCodeStruct()
    {
        x = 1,
        y = "Fizz",
        z = now,
        collection = new List<string>() 
        { 
            "Foo", 
            "Bar", 
            "Baz" 
        }
    };

    Console.WriteLine("MyObject::GetHashCode(): {0}", a.GetHashCode());
    Console.WriteLine("MyObjectEx::GetHashCode(): {0}", b.GetHashCode());

    Assert.AreEqual(a.GetHashCode(), b.GetHashCode());
}

The next step is to consider the collection in the GetHashCode() calculation. This requires a small addition to the GetHashCode() implementation in MyObjectEmbeddedGetHashCode.

public override int GetHashCode()
{
    unchecked
    {
        int hash = 17;

        hash = hash * 31 + x.GetHashCode();
        hash = hash * 31 + y.GetHashCode();
        hash = hash * 31 + z.GetHashCode();

        int collectionHash = 17;

        foreach (var item in collection)
        {
            collectionHash = collectionHash * 31 + item.GetHashCode();
        }

        hash = hash * 31 + collectionHash;

        return hash;
    }
}

However, this is a little bit more difficult in the HashCode struct. In this example, when a collection of type List<string> is passed into the Hash<T> method, T is List<string> so trying to cast obj to ICollection<T> or IEnumerable<T> doesn't work.

I can successfully cast to IEnumerable, but it causes boxing and I found I have to worry about excluding types like string that implement IEnumerable.

Is there a way to reliably cast obj to ICollection<T> or IEnumerable<T> in this scenario?

public struct HashCode
{
    private readonly int hashCode;

    public HashCode(int hashCode)
    {
        this.hashCode = hashCode;
    }

    public static HashCode Start
    {
        get { return new HashCode(17); }
    }

    public static implicit operator int(HashCode hashCode)
    {
        return hashCode.GetHashCode();
    }

    public HashCode Hash<T>(T obj)
    {
        // I am able to detect if obj implements one of the lower level
        // collection interfaces. However, I am not able to cast obj to
        // one of them since T in this case is defined as List<string>,
        // so using as to cast obj to ICollection<T> or IEnumerable<T>
        // doesn't work.
        var isGenericICollection = obj.GetType().GetInterfaces().Any(
            x => x.IsGenericType && 
            x.GetGenericTypeDefinition() == typeof(ICollection<>));

        var c = EqualityComparer<T>.Default;

        // This works but using IEnumerable causes boxing.
        // var h = c.Equals(obj, default(T)) ? 0 : ( !(obj is string) && (obj is IEnumerable) ? GetCollectionHashCode(obj as IEnumerable) : obj.GetHashCode());

        var h = c.Equals(obj, default(T)) ? 0 : obj.GetHashCode();
        unchecked { h += this.hashCode * 31; }
        return new HashCode(h);
    }

    public override int GetHashCode()
    {
        return this.hashCode;
    }
}
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
brdmllr
  • 43
  • 1
  • 6
  • 1
    Side note: `GetHashCode` is expected to be fast, reflection and iteration over all items are not the best examples of "fast" code. – Alexei Levenkov Feb 16 '15 at 16:49
  • What problem are you trying to solve? You don't have to include the collection in the hash. If XYZ give you a good hash distribution then stop there. – paparazzo Feb 16 '15 at 16:59
  • @AlexeiLevenkov I put a stopwatch around the calling of GetHashCode with and without the LINQ query here and it runs 5 ms with it and 3 ms without, so there is a performance impact that needs to be considered. – brdmllr Feb 17 '15 at 14:12

1 Answers1

4

You can address the collection issue in a couple of ways:

  1. Use a non-generic interface, e.g. ICollection or IEnumerable.
  2. Add an overload for the Hash() method, e.g. Hash<T>(IEnumerable<T> list) { ... }

That said, IMHO it would be better to just leave the struct HashCode alone and put the collection-specific code in your actual GetHashCode() method. E.g.:

public override int GetHashCode()
{
    HashCode hash = HashCode.Start
        .Hash(x)
        .Hash(y)
        .Hash(z);

    foreach (var item in collection)
    {
        hash = hash.Hash(item);
    }

    return hash;
}

If you do want a full-featured version of the struct HashCode type, it looks to me as though that same page you referenced has one: https://stackoverflow.com/a/2575444/3538012

The naming of the members is different, but it's basically the same idea as the struct HashCode type, but with overloads for other complex types (as in my suggestion #2 above). You could use that, or just apply the techniques there to your implementation of struct HashCode, preserving the naming conventions used in it.

Community
  • 1
  • 1
Peter Duniho
  • 68,759
  • 7
  • 102
  • 136
  • I marked this as the answer because adding an overload for the Hash method provided the desired behavior. However, I encountered an infinite recursion issue which led to a StackOverflowException in my Equals() override due to a bidirectional association between a couple of my classes. This caused me to abandon including collections and my reference types in my IEquatable implementation and I ended up using a solution similar to what @Blam mentioned in a comment on my question. – brdmllr Feb 20 '15 at 14:23