4

I have a question about how the Dictionary and HashSet work in C#. According to my understanding, GetHashCode is used in hash tables to determine key uniqueness.

On the following MSDN page, it states:

A hash code is a numeric value that is used to insert and identify an object in a hash-based collection such as the Dictionary class, the Hashtable class, or a type derived from the DictionaryBase class.

Link: MSDN Object.GetHashCode

If that is the case, why does ContainsKey and Contains return false for car2 when it has the same hashcode as car1? If my understanding is correct and if what MSDN says is correct, shouldn't both of those return true?

class Program
{
    static void Main(string[] args)
    {            
        // Create a Dictionary and HashSet
        Dictionary<Car, int> carDictionary = new Dictionary<Car, int>();
        HashSet<Car> carSet = new HashSet<Car>();

        // Create 3 Cars (2 generic and 1 Civic)
        Car car1 = new Car();
        Car car2 = new Car();
        Car car3 = new Civic();

        // Test hash values
        int test1 = car1.GetHashCode(); // 22008501
        int test2 = car2.GetHashCode(); // 22008501
        int test3 = car3.GetHashCode(); // 12048305


        // Add 1 generic car and 1 Civic to both Dictionary and HashSet
        carDictionary.Add(car1, 1);
        carDictionary.Add(car3, 1);
        carSet.Add(car1);
        carSet.Add(car3);

        // Why are both of these false?
        bool dictTest1 = carDictionary.ContainsKey(car2);  // false
        bool setTest1 = carSet.Contains(car2); // false

        // Testing equality makes sense
        bool testA = car1.Equals(car2); // false
        bool testB = car1.Equals(car3); // false
    }
}

class Car
{
    public override int GetHashCode()
    {
        return 22008501;
    }
}

class Civic : Car
{
    public override int GetHashCode()
    {
        return 12048305;
    }
}
saegeoff
  • 551
  • 5
  • 12
  • 1
    Is this related to a hash collision? – saegeoff Feb 18 '15 at 20:58
  • 1
    They must have the same hash code and be equal. You should also override `object.Equals`. – Mephy Feb 18 '15 at 21:00
  • possible duplicate of [Why is it important to override GetHashCode when Equals method is overridden?](http://stackoverflow.com/questions/371328/why-is-it-important-to-override-gethashcode-when-equals-method-is-overridden) – Matt Smith Feb 18 '15 at 21:02
  • You might be interested to see [this](http://stackoverflow.com/a/8952026/1364007) answer by Jon Skeet. – Wai Ha Lee Feb 18 '15 at 21:03
  • This can answer your question: http://stackoverflow.com/questions/6493605/how-does-a-hashmap-work-in-java – Filipe Borges Feb 18 '15 at 21:06
  • The MSDN page that you yourself linked to covers all of this already. `Derived classes that override GetHashCode must also override Equals to guarantee that two objects considered equal have the same hash code; otherwise, the Hashtable type might not work correctly.` – Servy Feb 18 '15 at 21:12

3 Answers3

10

Because the logic of ContainsKey is similar to this.

//This is a simplified model for answering the OP's question, the real one is more complex.
private List<List<KeyValuePair<TKey,TValue>>> _buckets = //....

public bool ContainsKey(TKey key)
{
    List<KeyValuePair<TKey,TValue>> bucket = _buckets[key.GetHashCode() % _buckets.Length];
    foreach(var item in bucket)
    {
        if(key.Equals(item.Key))
            return true;
    }
    return false;
}

All GetHashCode does is get the bucket your key would go in, it still must go through each member of that bucket and find the exact match via the Equals method. That is why having good hash codes is important, the less items in a bucket the faster the foreach part will be. The best possible hashcode will have only one item per bucket.


Here is the actual code for Contains on a HashSet (Dictionary's ContainsKey is very similar but more complex)

private int[] m_buckets;
private Slot[] m_slots;

public bool Contains(T item) {
    if (m_buckets != null) {
        int hashCode = InternalGetHashCode(item);
        // see note at "HashSet" level describing why "- 1" appears in for loop
        for (int i = m_buckets[hashCode % m_buckets.Length] - 1; i >= 0; i = m_slots[i].next) {
            if (m_slots[i].hashCode == hashCode && m_comparer.Equals(m_slots[i].value, item)) {
                return true;
            }
        }
    }
    // either m_buckets is null or wasn't found
    return false;
}

private int InternalGetHashCode(T item) {
    if (item == null) {
        return 0;
    } 
    return m_comparer.GetHashCode(item) & Lower31BitMask;
}

internal struct Slot {
    internal int hashCode;      // Lower 31 bits of hash code, -1 if unused
    internal T value;
    internal int next;          // Index of next entry, -1 if last
}
Scott Chamberlain
  • 124,994
  • 33
  • 282
  • 431
  • Thanks, that clears it up for me. I see how. I understood the equality method but I didn't understand how the Dictionary and HashSet implemented the Contains/ContainsKey methods. – saegeoff Feb 18 '15 at 21:06
  • N.b. the proper implementation is available in the Reference Source page for [HashSet](http://referencesource.microsoft.com/#System.Core/System/Collections/Generic/HashSet.cs). – Wai Ha Lee Feb 18 '15 at 21:10
3

The hashcodes don't have to be guaranteed to be unique, they must be equal if the keys are equal.

Now what happens is that the items are stored in buckets. If you query whether a Dictionary<TK,TV> contains a given key or the HashSet<T> a given item, it will first calculate the hashcode to fetch the correct bucket.

Next it will iterate over all items in the bucket and perform .Equals tests on it. Only in case one of these matches, it will return true.

In other words, one is allowed to return the same hashcode for every instance although the instances are different. It only makes the hashing inefficient.

C# thus stores a Dictionary<TK,TV> like:

+----------+
| 22008501 |---<car1,1>----<car3,1>----|
+----------+
| 11155414 | (other bucket)
+----------+

With on the left side (possible bucket labels), although for small Dictionary's, the number of buckets will be very small, and an operations will be performed on the hash (for instance a modulo), to make the number of outcomes smaller.

Now if you query whether car2 is in the Dictionary, it will calculate the hash, and thus take the first bucket. Then it will iterate, and perform an equality check on car1 vs car2, next car3 vs car2 and it will reach the end of the bucket and return false. This is because the default Equals operation is reference equality. Only if you override that too, (for instance all cars are the same, you can return true).

Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555
0

As you noticed, car1.Equals(car2) isn't true. Dictionary and Hashset membership will only be true for objects that are equal. That means .Equals() returns true. This is only tested if their hashcodes are first found to be equal.

recursive
  • 83,943
  • 34
  • 151
  • 241