How can I identify a bad implementation of GetHashCode?

Question

I have an implementation of GetHashCode which I believe to be fairly robust, but, honestly, I dredged it up from the depths of the internet and, while I understand what is written, I don't feel qualified to describe it as a 'good' or a 'bad' implementation of GetHashCode.

I've done a lot of reading on StackOverflow about GetHashCode. Is there a sample why Equals/GetHashCode should be overwritten in NHibernate? I think this thread is probably the best source of information, but it still left me wondering.

Consider the following entity and its given implementation of Equals and GetHashCode:

public class Playlist : IAbstractDomainEntity
{
    public Guid Id { get; set; }
    public string Title { get; set; 
    public Stream Stream { get; set; }
    //  Use interfaces so NHibernate can inject with its own collection implementation.
    public IList<PlaylistItem> Items { get; set; }
    public PlaylistItem FirstItem { get; set; }
    public Playlist NextPlaylist { get; set; }
    public Playlist PreviousPlaylist { get; set; }

    private int? _oldHashCode;
    public override int GetHashCode()
    {
        // Once we have a hash code we'll never change it
        if (_oldHashCode.HasValue)
            return _oldHashCode.Value;

        bool thisIsTransient = Equals(Id, Guid.Empty);

        // When this instance is transient, we use the base GetHashCode()
        // and remember it, so an instance can NEVER change its hash code.
        if (thisIsTransient)
        {
            _oldHashCode = base.GetHashCode();
            return _oldHashCode.Value;
        }
        return Id.GetHashCode();
    }

    public override bool Equals(object obj)
    {
        Playlist other = obj as Playlist;
        if (other == null)
            return false;

        // handle the case of comparing two NEW objects
        bool otherIsTransient = Equals(other.Id, Guid.Empty);
        bool thisIsTransient = Equals(Id, Guid.Empty);
        if (otherIsTransient && thisIsTransient)
            return ReferenceEquals(other, this);

        return other.Id.Equals(Id);
    }
}

The amount of safety-checking touted in this implementation seems over the top. It inspires confidence in me -- assuming that the person who wrote this understood more corner cases than I -- but also makes me wonder why I see so many simplistic implementations.

Why is it important to override GetHashCode when Equals method is overridden? Look at all of these different implementations. Below is a simple, yet highly rated, implementation:

  public override int GetHashCode()
  {
        return string.Format("{0}_{1}_{2}", prop1, prop2, prop3).GetHashCode();
  }

Would this implementation be better or worse than the one I've provided? Why?

Are both equally valid? Is there a standard 'guideline' which should be followed when implementing GetHashCode? Are there any obvious flaws with the above implementation? How can one create test cases to validate ones implementation of GetHashCode?

I always use Jon Skeet's advice in the accepted answer here: http://stackoverflow.com/questions/263400/what-is-the-best-algorithm-for-an-overridden-system-object-gethashcode — Matt Davis, Jul 08 '13 at 19:10
Semi-ironically, Jon Skeet's implementation was my first implementation. While I am fully aboard the 'Jon Skeet is amazing' train -- I still don't understand why one implementation is better than another. His implementation relies on there being at least 3 properties of the entity to check on... but my implementation does not have such a restriction. Does that result in different effects? If not, why does his rely on so many properties? — Sean Anderson, Jul 08 '13 at 19:14
You're already in trouble because you're effectively using a mutable value for your hashing. This is bad if the item is modified while stored in any container that uses hashes. You'll end up with a corrupt container. — spender, Jul 08 '13 at 19:16
How is my value mutable? If oldHashCode is set, it is returned. Otherwise it is set and then returned.. yeah? — Sean Anderson, Jul 08 '13 at 19:17
@SeanAnderson - Jon's implementation is honoring a major design rule for GetHashCode: Equal items have equal hash codes. This might help: http://blogs.msdn.com/b/ericlippert/archive/2011/02/28/guidelines-and-rules-for-gethashcode.aspx — hatchet - done with SOverflow, Jul 08 '13 at 19:18
Your Id property is mutable. I could store it in a hashset, change its Id, then store a "duplicate" entry without complaint. This violates design guidelines. — spender, Jul 08 '13 at 19:19
@Spender Would Jon Skeet's implementation have the same pitfall if "field1" was the Id field? — Sean Anderson, Jul 08 '13 at 19:19
@SeanAnderson: Yes, it would. The real question is, why are you implementing `GetHashCode` and `Equals` on a mutable item? — StriplingWarrior, Jul 08 '13 at 19:21
I am attempting to satisfy NHibernate's need to differ a detached entity from one stored in NHibernate's cache. http://stackoverflow.com/questions/5851398/nhibernate-reasons-for-overriding-equals-and-gethashcode — Sean Anderson, Jul 08 '13 at 19:22
The implementation that Jon advocates is not predicated on a certain number of properties; it's just an example. Remember, GetHashCode() is used to allow instances of your class to be used as keys in collections such as dictionaries or hashtables. It follows that a robust implementation would take into account those fields that vary from instance to instance, which is typically all the fields in a class. In his example, that's 3 fields; in yours, it looks like 1. — Matt Davis, Jul 08 '13 at 19:24
@SeanAnderson: So you're expecting that someone might store two different objects in a single NHibernate collection that come from different NHibernate sessions, but where the objects represent the same object in persistent storage? — StriplingWarrior, Jul 08 '13 at 19:26
@StriplingWarrior I will have two references to a Playlist with the same ID when I de-serialize the Playlist's JSON representation (assuming NHibernate stored the Playlist in its cache). My understanding is that I need to override Equals and GetHashCode to provide NHibernate with the ability to decide if these two entities are equal. However, even if I am misunderstanding the scenario, I have heard it is best practice to implement them 'just in case.' — Sean Anderson, Jul 08 '13 at 19:30
@SeanAnderson: So what happens if you add one of these JSON-deserialized Playlist entities to HHibernate, and it's different than the one NHibernate had before? Would the last item added to the collection "win", so to speak? — StriplingWarrior, Jul 08 '13 at 19:48
@StriplingWarrior You know.. I am really not sure. I suppose it would probably store two copies in Session until a Commit was called and if both still existed an exception would be thrown if constraints were violated. — Sean Anderson, Jul 08 '13 at 22:02
@SeanAnderson: In that case, you could probably get your desired behavior by not overriding the GetHashCode and Equals methods at all. This is why it's so tricky to provide a general-purpose answer to questions about Hashcode and Equals: it largely depends on how the objects will be consumed. — StriplingWarrior, Jul 09 '13 at 17:27
That is exactly what I am finding/reading/learning. I appreciate SO working through the learning process with me, however! :) — Sean Anderson, Jul 09 '13 at 17:44

score 2 · Accepted Answer · answered Jul 08 '13 at 19:59

GetHashCode should match concept of "equal" for your classes/environment (in addition to be constant while in a container and fast).

In normal cases "equal" is comparing all fields of corresponding objects (value type comparison). In this case simple implementation that somehow merges hash codes of all fields will suffice.

My understanding that in NHibernate's case "equal" is significantly more tricky and as result you see complicated implementation.I believe it is mainly due to the fact that some object properties may not be yet available - in such case comparing "identity" of object is enough.

score 1 · Answer 2 · answered Jul 08 '13 at 22:07

It's unfortunate that while there two different questions that an equality-test method could meaningfully ask of any pair of object references X and Y, there's only one Equals method and one GetHashCode method.

Assuming X and Y are of the same type(*), will all members of X always behave the same as corresponding methods of Y? Two references to different arrays would be reported as unequal under this definition even if they contain matching elements, since even if their elements are the same at one moment in time, that may not always be true.
Assuming X and Y are of the same type(*), would simultaneously replacing all references to object X with references to object Y, and vice versa, affect any members of either other than an identity-based GetHashCode function? References to two distinct arrays whose elements match would be reported as equal under this definition.

(*) In general, objects of different types should report unequal. There might be some cases where one could argue that objects of different private classes which are inherited from the same public class should be considered equal, if all of the code which has access to the private classes only stores reference in the matching public type, but that would be at most a pretty narrow exception.

Some situations require asking the first question, and some require asking the second; the default Object implementation of Equals and GetHashCode answer the first, while the default ValueType implementations answer the second. Unfortunately, the selection of which comparison method is appropriate for a given reference is a function of how the reference is used, rather than a function of the referred-to instance's type. If two objects hold references to collections which they will neither mutate nor expose to code that might do so, for the intention of encapsulating the contents thereof, equality of the objects holding those references should depend upon the contents of the collections, rather than their identity.

It looks as though code is sometimes using instances of type PlayList in ways where the first question is more appropriate, and sometimes in ways where the second would be more appropriate. While that may be workable, I think it might be better to have a common data-holder object which can be wrapped if necessary by an object whose equality-check method would be suitable for one use or the other (e.g. have a PlaylistData object which can be wrapped by either a MutablePlaylist or an ImmutablePlaylist). The wrapper classes could have InvalidateAndMakeImmutable or InvalidateAndMakeMutable methods which would invalidate the wrapper and return a new wrapper around the object (using wrappers would ensure that the system would know whether a given Playlist reference could be exposed to code that might mutate it).

How can I identify a bad implementation of GetHashCode?

2 Answers2