10

I'm looking at how build the best HashCode for a class and I see some algorithms. I saw this one : Hash Code implementation, seems to be that .NET classes HashCode methods are similar (see by reflecting the code).

So question is, why don't create the above static class in order to build a HashCode automatically, just by passing fields we consider as a "key".

// Old version, see edit
public static class HashCodeBuilder
{
    public static int Hash(params object[] keys)
    {
        if (object.ReferenceEquals(keys, null))
        {
            return 0;
        }

        int num = 42;

        checked
        {
            for (int i = 0, length = keys.Length; i < length; i++)
            {
                num += 37;
                if (object.ReferenceEquals(keys[i], null))
                { }
                else if (keys[i].GetType().IsArray)
                {
                    foreach (var item in (IEnumerable)keys[i])
                    {
                        num += Hash(item);
                    }
                }
                else
                {
                    num += keys[i].GetHashCode();
                }
            }
        }

        return num;
    }
}

And use it as like this :

// Old version, see edit
public sealed class A : IEquatable<A>
{
    public A()
    { }

    public string Key1 { get; set; }
    public string Key2 { get; set; }
    public string Value { get; set; }

    public override bool Equals(object obj)
    {
        return this.Equals(obj as A);
    }

    public bool Equals(A other)
    {
        if(object.ReferenceEquals(other, null)) 
            ? false 
            : Key1 == other.Key1 && Key2 == other.Key2;
    }

    public override int GetHashCode()
    {
        return HashCodeBuilder.Hash(Key1, Key2);
    }
}

Will be much simpler that always is own method, no? I'm missing something?


EDIT

According all remarks, I got the following code :

public static class HashCodeBuilder
{
    public static int Hash(params object[] args)
    {
        if (args == null)
        {
            return 0;
        }

        int num = 42;

        unchecked
        {
            foreach(var item in args)
            {
                if (ReferenceEquals(item, null))
                { }
                else if (item.GetType().IsArray)
                {
                    foreach (var subItem in (IEnumerable)item)
                    {
                        num = num * 37 + Hash(subItem);
                    }
                }
                else
                {
                    num = num * 37 + item.GetHashCode();
                }
            }
        }

        return num;
    }
}


public sealed class A : IEquatable<A>
{
    public A()
    { }

    public string Key1 { get; set; }
    public string Key2 { get; set; }
    public string Value { get; set; }

    public override bool Equals(object obj)
    {
        return this.Equals(obj as A);
    }

    public bool Equals(A other)
    {
        if(ReferenceEquals(other, null))
        {
            return false;
        }
        else if(ReferenceEquals(this, other))
        {
            return true;
        }

        return Key1 == other.Key1
            && Key2 == other.Key2;
    }

    public override int GetHashCode()
    {
        return HashCodeBuilder.Hash(Key1, Key2);
    }
}
Community
  • 1
  • 1
Arnaud F.
  • 8,252
  • 11
  • 53
  • 102

3 Answers3

12

Your Equals method is broken - it's assuming that two objects with the same hash code are necessarily equal. That's simply not the case.

Your hash code method looked okay at a quick glance, but could actually do some with some work - see below. It means boxing any value type values and creating an array any time you call it, but other than that it's okay (as SLaks pointed out, there are some issues around the collection handling). You might want to consider writing some generic overloads which would avoid those performance penalties for common cases (1, 2, 3 or 4 arguments, perhaps). You might also want to use a foreach loop instead of a plain for loop, just to be idiomatic.

You could do the same sort of thing for equality, but it would be slightly harder and messier.

EDIT: For the hash code itself, you're only ever adding values. I suspect you were trying to do this sort of thing:

int hash = 17;
hash = hash * 31 + firstValue.GetHashCode();
hash = hash * 31 + secondValue.GetHashCode();
hash = hash * 31 + thirdValue.GetHashCode();
return hash;

But that multiplies the hash by 31, it doesn't add 31. Currently your hash code will always return the same for the same values, whether or not they're in the same order, which isn't ideal.

EDIT: It seems there's some confusion over what hash codes are used for. I suggest that anyone who isn't sure reads the documentation for Object.GetHashCode and then Eric Lippert's blog post about hashing and equality.

Caveman74
  • 127
  • 9
Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • It's okay for the equality, you're right. I edited original post to fix it. What about the HashCode part? – Arnaud F. Mar 27 '11 at 16:59
  • on another note the use of the ternary if is pretty bad. – alternative Mar 27 '11 at 17:00
  • 1
    @Arnaud: No, it's *not* okay for equality. – Jon Skeet Mar 27 '11 at 17:01
  • @Jon: I always assumed the hash code was used to check object equality in structures like hash tables, so shouldn't checking for equality and checking for the hash code equality produce the same results in a proper implementation? Or should you always check for real object equality after retrieving an object by hash code? – Wiebe Tijsma Mar 27 '11 at 17:02
  • @mathepic: Agreed, I would negate the reference equality check and then use `&&`. – Jon Skeet Mar 27 '11 at 17:02
  • 3
    @Zidad: The hash code is used as a first pass, but not the *only* contention. You should indeed always check for real equality after hash equality. – Jon Skeet Mar 27 '11 at 17:02
  • Hash codes will collide. You _always_ need a true equality check. – SLaks Mar 27 '11 at 17:03
  • @mathepic: Yes, I wouldn't bother checking for hash equality when *implementing* Equals. – Jon Skeet Mar 27 '11 at 17:05
  • @Jon Skeet I just realized that, I misread your post and thought you said to use the hash code as a first pass in Equals(), then realized you were talking about the hash tables. Nevermind. – alternative Mar 27 '11 at 17:06
  • I edited first post, hope I understood all what you said. Thanks for explanation, very helpful ! – Arnaud F. Mar 27 '11 at 17:28
  • 2
    @Arnaud: Your GetHashCode method should use *unchecked* arithmetic, not checked. – Jon Skeet Mar 27 '11 at 19:04
  • After reading Eric Lippert's blog, I understand why unchecked, thanks :) – Arnaud F. Mar 28 '11 at 11:21
3

This is what I'm using:

public static class ObjectExtensions
{
    /// <summary>
    /// Simplifies correctly calculating hash codes based upon
    /// Jon Skeet's answer here
    /// http://stackoverflow.com/a/263416
    /// </summary>
    /// <param name="obj"></param>
    /// <param name="memberThunks">Thunks that return all the members upon which
    /// the hash code should depend.</param>
    /// <returns></returns>
    public static int CalculateHashCode(this object obj, params Func<object>[] memberThunks)
    {
        // Overflow is okay; just wrap around
        unchecked
        {
            int hash = 5;
            foreach (var member in memberThunks)
                hash = hash * 29 + member().GetHashCode();
            return hash;
        }
    }
}

Example usage:

public class Exhibit
{
    public virtual Document Document { get; set; }
    public virtual ExhibitType ExhibitType { get; set; }

    #region System.Object
    public override bool Equals(object obj)
    {
        return Equals(obj as Exhibit);
    }

    public bool Equals(Exhibit other)
    {
        return other != null &&
            Document.Equals(other.Document) &&
            ExhibitType.Equals(other.ExhibitType);
    }

    public override int GetHashCode()
    {
        return this.CalculateHashCode(
            () => Document, 
            () => ExhibitType);
    }
    #endregion
}
Carl G
  • 17,394
  • 14
  • 91
  • 115
1

Instead of calling keys[i].GetType().IsArray, you should try to cast it to IEnumerable (using the as keyword).

You can fix the Equals method without repeating the field list by registering a static list of fields, like I do here using a collection of delegates.
This also avoids the array allocation per-call.

Note, however, that my code doesn't handle collection properties.

SLaks
  • 868,454
  • 176
  • 1,908
  • 1,964