12

I have a type which I am using as key in the IDictionary. The type is as following

public  class Employee
{
    public string Name { get; set; }
    public int ID { get; set; }

    public override bool Equals(object obj)
    {
        Employee emp = obj as Employee;
        if (emp != null)
            return emp.Name.Equals(this.Name);
        return false;
    }

    public override int GetHashCode()
    {
        return this.Name.GetHashCode();
    }
}

Now I have created a dictionary as following in my main as following

 IDictionary<Employee, int> empCollection = new Dictionary<Employee, int>();
        Employee emp1 = new Employee() { Name = "abhi", ID = 1 };
        Employee emp2 = new Employee() { Name = "vikram", ID = 2 };
        Employee emp3 = new Employee() { Name = "vikram", ID = 3 };

        empCollection.Add(emp1, 1);
        empCollection.Add(emp2, 2);
        empCollection.Add(emp3, 3);

Now while debugging I found out that when emp1 is added to the collection only GetHashCode method is called of the key type, after that when emp2 is added to the collection only GetHashCode method is called again but in the case of emp3 both GetHashCode and Equals methods are called.

May be it looks too naive being asking this question but why isn't Equals method not called when eqImp2 object is added to collection. What is happening inside. Please explain.

Vikram
  • 1,617
  • 5
  • 17
  • 37
  • `emp2` and `emp3` have the same hashcodes, base on your implementation of GetHashCode – Jehof Feb 20 '13 at 14:10
  • possible duplicate of [Why is it important to override GetHashCode when Equals method is overridden?](http://stackoverflow.com/questions/371328/why-is-it-important-to-override-gethashcode-when-equals-method-is-overridden) – Richard Feb 20 '13 at 14:12

5 Answers5

8

The dictionary and all other similar containers use the hashcode as a quick-and-dirty check: different hashcodes mean that two objects are definitely not equal; identical hashcodes do not mean anything. The documentation of GetHashCode specifies this behavior by saying

If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.

Your emp1 and emp2 generate different hashcodes, so the dictionary does not need to run Equals; it already knows they are not equal. On the other hand, emp2 and emp3 generate the same hashcode so the dictionary must call Equals to definitely determine if they are indeed equal, or if the identical hashcode was just the result of chance.

Jon
  • 428,835
  • 81
  • 738
  • 806
  • Excellent answer, the only thing I am still curious about is the dictionary's reaction to a duplicate being added. I imagine it throws an exception? – Khan Feb 20 '13 at 14:20
1

In your example the GetHashCode looks at the Name hash code. emp3 has the same name as emp2 , ("vikram"). They are equal given the hash code so it further looks using Equals.

misha
  • 2,760
  • 3
  • 28
  • 30
1

emp2 and emp3 have the same key. This will cause a "key collision" in the dictionary. It first called GetHashCode() and determined the hash codes were the same. It then ensures they're the same by calling Equals(). The code from Dictionary is:

int num = this.comparer.GetHashCode(key) & 2147483647;
...
if (this.entries[i].hashCode == num && this.comparer.Equals(this.entries[i].key, key))

Obviously, if the hashcodes don't match, it never has to call Equals.

You should get a tool like ILSpy and then you can look at the code and find the answer yourself.

Pete
  • 6,585
  • 5
  • 43
  • 69
1

If you continue this experiment, you'll observe some behavior which is specific to the Dictionary<TKey, TValue> implementation, and some behavior that is required due to the way you implemented GetHashCode.

First, it's important to understand the role of GetHashCode and Equals when comparing objects for equality. Additional information is available on this question, but I'll repeat the basic rules here:

  1. The Equals method establishes exactly which objects are equal and which objects are not. All necessary checks need to be performed in this method for a final determination before returning.
  2. A hash code is a value calculated from the value of your object. Typically it is much smaller than the original object (in our case the hash code is a 4 byte integer) and not necessarily unique. However it is much faster to compute and compare to each other than the original objects themselves.
    • When hash codes do not need to be unique, different hash codes indicate different objects (i.e. Equals will definitely return false), but equal hash codes do not mean anything (i.e. Equals could return true or false).

Collections which associate values with a key object (e.g. IDictionary<TKey, TValue> in .NET, or Map<K, V> in Java) take advantage of the hash codes to improve implementation efficiency. However, since the documentation for Object.GetHashCode specifically does not require the results to be unique, these collections cannot rely on the hash codes alone for proper functionality. When two objects have the same hash code, only a call to Equals can distinguish them. The case you describe for the insertion of emp3 falls into this case: the [IDictionary<TKey, TValue>.Add] method needs to throw an ArgumentException if you are trying to insert the same value, and only a call to Equals can determine if the new key emp3 is equal to the previously inserted emp3.

Additional implementation characteristics

The particular collection implementation may result in more calls to GetHashCode than you anticipate. For example, when the internal storage of a hash table is resized, an implementation might call GetHashCode for every object stored in the collection. Collections based on a binary- or B-tree might only call GetHashCode once (if the results are cached in the tree structure), or might need to call GetHashCode for multiple objects during every insertion or lookup operation (if the results are not cached).

Sometimes hash table implementations need to call GetHashCode for multiple objects, or perhaps even Equals for objects with different hash codes due to the way they use modulo arithmetic to place keys into "buckets". The specific characteristics of this vary from one implementation to the next.

Community
  • 1
  • 1
Sam Harwell
  • 97,721
  • 20
  • 209
  • 280
0

That is because GetHashCode is a shortcut. C# will first call GetHashCode which is supposed to be fast executing. If two objects have different HashCodes then there is no need to call the, assumingly, more expensive Equals method. Only if they have the same HashCode then it will call Equals. That is because HashCode is not guaranteed to be unique

Luis Filipe
  • 8,488
  • 7
  • 48
  • 76