Trying to understand the GetHashCode()

Question

I found the following on Microsoft documentation:

Two objects that are equal return hash codes that are equal. However, the reverse is not true: equal hash codes do not imply object equality, because different (unequal) objects can have identical hash code

I made my own tests to understand the Method:

public static void HashMetod() 
{
    List<Cliente> listClientTest = new List<Cliente>
    {
        new Cliente { ID = 1, name = "Marcos", Phones = "2222"}
    };

    List<Empresa> CompanyList = new List<Empresa>
    {
        new Empresa { ID = 1, name = "NovaQuimica", Clients = listClientTest },
        new Empresa { ID = 1, name = "NovaQuimica", Clients = listClientTest }
    };

    CompanyList.Add(CompanyList[0]);

    foreach (var item in CompanyList)
    {
        Console.WriteLine("Hash code = {0}", item.GetHashCode());
    }

    Console.WriteLine("CompanyList[0].Equals(CompanyList[1]) = {0}", CompanyList[0].Equals(CompanyList[1]));
    Console.WriteLine("CompanyList[0].Equals(CompanyList[2]) = {0}", CompanyList[0].Equals(CompanyList[2]));
}

My Question is: How can two Differents objects returns the same HashCode? I believe that if two objects return the same, they are Equals(Thats what my method shows). Execute my method and check this out.

there are examples on this current site that also show as well as explain Equality and Inequality see that section that says `Related` try reading some of the related links.. here is one for you http://stackoverflow.com/questions/17008365/why-not-just-using-gethashcode-in-equality?rq=1 — MethodMan, Aug 16 '13 at 13:05
GetHashCode() returns an Int32, which has a maximum of 4,294,967,295 distinct values. You might have more objects around (if you have an impressive amount of memory and your program runs in 64bit). — Mr47, Aug 16 '13 at 13:07
Hash code is 32 bits. A long is 64 bits. Ask yourself this: Can you have a different 32 bit hash code for each possible 64-bit long? — Matthew Watson, Aug 16 '13 at 13:12

SWeko · Accepted Answer · 2013-10-03T09:37:38.707

A simple observation based on the pigeonhole principle:

GetHashCode returns an int - a 32 bit integer.
There are 4.294.967.296 32-bit integers;
Considering only uppercase English letters, there are 141.167.095.653.376 ten letter words. If we include upper- and lowercase, then we have 144.555.105.949.057.024 combinations.
Since there are more objects than available hash-codes, some (different) objects must have the same hash code.

Another, more real-world example, is that if you wanted to give each person on Earth a hashcode, you would have collisions, since we have more persons than 32-bit integers.

"Fun" fact: because of the birthday paradox, in a city of 100.000 people, you have more than 50% chance of a hash collision.

JNL · Answer 2 · 2013-08-16T13:52:51.600

Here is an Example;

String s1 = new String("AMY");
String s2 = new String("MAY");

Two different Objects, but if the hashCode is calculated with say, the ASCII Code of the characters, it will be the same for MAY and AMY.

You should basically understand the concept of hashing for this.

hashing an object means "finding a value (number) that can be reproduced by the very same instance again and again".

Because hash codes from Object.hashCode() are of type int, you can only have 2^32 different values. That's why you will have so-called "collisions" depending on the hashing algorithm, when two distinct Objects produce the same hashCode.

To understand them better, you can go through a series of good examples;

Hope this helps.

Good answer regardless of the platform, however it's about C# not Java. — BartoszKP, Aug 16 '13 at 13:49
@BartoszKP Thanks. However, the data types range for both the platforms are the same. I just gave a reference of Java here. — JNL, Aug 16 '13 at 13:52

Dmitry Bychenko · Answer 3 · 2013-08-16T13:23:59.373

Hash code is int, that has 2^32 diffent values. Now let's take String class - it can have infinitly many different values, so we can conclude that there must be the same hash codes for different String values.

To find out hash collisions you may exploit Birthday paradox. For instance, for Doubles it could be

  random gen = new Random();

  Dictionary<int, Double> dict = new Dictionary<int, Double>();

  // In general it'll take about 
  // 2 * sqrt(2^32) = 2 * 65536 = 131072 = 1e5 itterations
  // to find out a hash collision (two unequal values with the same hash)  
  while (true) {
    Double d = gen.NextDouble();
    int key = d.GetHashCode();

    if (dict.ContainsKey(key)) {
      Console.Write(d.ToString(Culture.InvariantCulture));
      Console.Write(".GetHashCode() == ");

      Console.Write(dict[key].ToString(Culture.InvariantCulture));
      Console.Write(".GetHashCode() == ");
      Console.Write(key.ToString(Culture.InvariantCulture));

      break;
    }

    dict.Add(key, d);
   }

In my case

  0.540086061479564.GetHashCode() == 0.0337553788133689.GetHashCode() == -1350313817

score 1 · Answer 4 · answered Aug 16 '13 at 13:07

You can read about hashing on the wiki page. But the whole point of hashing is to convert a value into an index, which is done with a hashing function. Hashing functions can vary, but pretty much all end with a mod to constrain the index value within a maximum so it can be put in an array. For each mod n there are an infinite amount of numbers that will yield the same index (I.E. 5 mod 2, 7 mod 2, etc).

score 1 · Answer 5 · answered Aug 16 '13 at 13:08

You probably just need to read up on Hash Functions in general to make sure you understand that. From Wikipedia:

Hash functions are primarily used to generate fixed-length output data that acts as a shortened reference to the original data

So essentially you know that you are taking a large (potentially infinite) set of possibilities and trying to fit them into a smaller, more manageable set of possibilities. Because of the two different sizes of the sets, you're guaranteed to have collisions between two different source objects and their Hashes. That said, a good Hash function minimizes those collisions as much as possible.

score 0 · Answer 6 · answered Aug 16 '13 at 20:44

The purpose of a hash code is to allow code which receives an object to quickly identify things that an object cannot possibly be equal to. If a collection class which has been asked to store many objects it knows nothing about other than how to test them for equality, were then given another object and were asked whether it matches any of the objects it has stored, the collection would have to call Equals on every object in the collection. On the other hand, if the collection can call GetHashCode on each item that's added to the collection, as well as the item it's looking for, and if 99% of the objects in the collection have reported a hashcode which doesn't match the hashcode of the item being sought, then only the 1% of objects whose hashcode does match need to be examined.

The fact that two items' hash codes match won't help compare the two items any faster than could have been done without checking their hash codes, but the fact that items' hash codes don't match will eliminate any need to examine them further. In scenarios were items are far more likely not to match than they are to match, hash codes make it possible to accelerate the non-match case, sometimes by many orders of magnitude.

Trying to understand the GetHashCode()

6 Answers6