Why not just use GetHashCode in Equals?

Question

Given the person class:

class person
{
    public string name;
    public int age;
}

Say I override the person class's GetHashCode method:

public override int GetHashCode()
{
    int hashCode; // must be declared outside unchecked, since it is returned after the block
    unchecked
    {
        hashCode = 17;
        // ...some code here...
    }
    return hashCode;
}

Based on MSDN's guidance, I also need to override Equals, so I did this:

public override bool Equals(object obj)
{
    // ...something like:
    var other = obj as person; // null if obj is null or not a person
    return other != null && this.name == other.name && this.age == other.age;
}

Hey, wait: since I can get the hash code of a person instance, why not just use the hash code in Equals? Like this:

public override bool Equals(object obj)
{
    // GetHashCode is virtual, so no cast is needed to call it on obj
    return obj is person && this.GetHashCode() == obj.GetHashCode();
}

I googled and found that most Equals() examples are similar to my first version of Equals(). Have I misunderstood something?

Any help will be appreciated, thanks.

You should read my article. Search for "Guidelines and rules for GetHashCode". — Eric Lippert, Jun 09 '13 at 14:03
Wow, I searched and found this [link](http://blogs.msdn.com/b/ericlippert/archive/2011/02/28/guidelines-and-rules-for-gethashcode.aspx). I hope it helps others with the same question. — pinopino, Jun 10 '13 at 03:03

score 3 · Accepted Answer · answered Jun 09 '13 at 09:48

Two unequal objects are not guaranteed to have unequal hash codes (when two unequal objects share a hash code, that's called a collision). This is what MSDN says:

If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two objects do not have to return different values.
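To make that concrete, here is a minimal sketch using a hypothetical `Point` type (not from the question) whose hash code is simply the XOR of its coordinates. That hash obeys the rule above, yet (1, 2) and (2, 1) are different points with the same hash code, so an Equals built only on GetHashCode (the hypothetical `EqualsByHashOnly` method) would wrongly call them equal.

```csharp
using System;

// Hypothetical type, used only to illustrate a hash collision.
public sealed class Point : IEquatable<Point>
{
    public int X { get; }
    public int Y { get; }

    public Point(int x, int y) { X = x; Y = y; }

    // Correct equality: compares the actual state of both objects.
    public bool Equals(Point other) =>
        !ReferenceEquals(other, null) && X == other.X && Y == other.Y;

    public override bool Equals(object obj) => Equals(obj as Point);

    // Deliberately simple but legal hash: equal points always get equal hashes,
    // yet unequal points such as (1, 2) and (2, 1) share one (1 ^ 2 == 2 ^ 1 == 3).
    public override int GetHashCode() => X ^ Y;

    // The shortcut proposed in the question: compares hash codes only (wrong).
    public bool EqualsByHashOnly(Point other) =>
        !ReferenceEquals(other, null) && GetHashCode() == other.GetHashCode();
}
```

With `a = new Point(1, 2)` and `b = new Point(2, 1)`, both hash codes are 3, `a.Equals(b)` is false, but `a.EqualsByHashOnly(b)` returns true.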

score 1 · Answer 2 · It'sNotALie. · 2013-06-09T14:09:25.167

It's because there are more possible objects than there are hash codes.

For example, let's take your class.

Already you have a problem: age on its own can take any int value, so it covers the entire range of hash codes before name is even considered. That could be eliminated, of course: just use a byte instead. Still, we've got a problem: strings. A .NET string is Unicode (UTF-16), so each character can be any of 65,536 code units. After that, it escalates quickly: a two-character string already has 65,536 ^ 2 = 4,294,967,296 (uint.MaxValue + 1) possible values, exactly as many as there are 32-bit hash codes. That's a whole lot, and that's only two characters.

tl;dr: you can't guarantee that two objects that are not equal will not have the same hash code. At all. (Unless the object's entire state fits in 32 bits, e.g. a byte, sbyte, short, or ushort, but that's a technicality.)
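The counting argument, with the correction from the comments below, can be written as a pigeonhole bound. This assumes, as the answer does, that a string of length $n$ is a sequence of $n$ UTF-16 code units and that a hash code is a 32-bit int:

```latex
\begin{aligned}
\#\{\text{strings of length } n\} &= 65536^{n} = 2^{16n}, \\
\#\{\text{hash codes}\} &= 2^{32}, \\
n \ge 3 \;\Rightarrow\; 2^{16n} \ge 2^{48} &> 2^{32}.
\end{aligned}
```

So among strings of three or more characters, some distinct strings must share a hash code. In fact, there are already $1 + 2^{16} + 2^{32}$ strings of length at most two, which is more than $2^{32}$.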

edited Jun 09 '13 at 14:09 · answered Jun 09 '13 at 09:51 by It'sNotALie.


Your string analysis is wrong. It's really 65536 to the power of string.Length. The better way to think about it is that there are four bytes in a length-two string and four bytes in the hash code, and therefore there must be collisions after length two. – Eric Lippert Jun 09 '13 at 13:59
My fault for getting mixed up between UTF-32 and UTF-16. You're right. – It'sNotALie. Jun 09 '13 at 14:07

score 0 · Answer 3 · answered Jun 09 '13 at 09:56

If you want a good example, look at what ReSharper generates:

public class Person : IEquatable<Person>
{
    public string Name { get; set; }
    public int Age { get; set; }

    public bool Equals(Person other)
    {
        if (ReferenceEquals(null, other)) return false;
        if (ReferenceEquals(this, other)) return true;
        return string.Equals(Name, other.Name) && Age == other.Age;
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        if (ReferenceEquals(this, obj)) return true;
        if (obj.GetType() != this.GetType()) return false;
        return Equals((Person) obj);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            return ((Name != null ? Name.GetHashCode() : 0) * 397) ^ Age;
        }
    }
}
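Even this well-distributed hash still collides, which is why the generated Equals compares Name and Age directly. As a sketch, the class is repeated below in its original form so the snippet compiles on its own, together with a hypothetical `CollisionDemo.WithHash` helper (not part of ReSharper's output). Because the hash is `(nameHash * 397) ^ Age`, choosing the (deliberately unrealistic) age `(nameHash * 397) ^ target` cancels the name's contribution and produces any target hash code.

```csharp
using System;

// Same ReSharper-style Person as above, repeated so this compiles on its own.
public class Person : IEquatable<Person>
{
    public string Name { get; set; }
    public int Age { get; set; }

    public bool Equals(Person other)
    {
        if (ReferenceEquals(null, other)) return false;
        if (ReferenceEquals(this, other)) return true;
        return string.Equals(Name, other.Name) && Age == other.Age;
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        if (ReferenceEquals(this, obj)) return true;
        if (obj.GetType() != this.GetType()) return false;
        return Equals((Person) obj);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            return ((Name != null ? Name.GetHashCode() : 0) * 397) ^ Age;
        }
    }
}

public static class CollisionDemo
{
    // Hypothetical helper: builds a Person whose hash code equals targetHash.
    // Since x ^ x ^ y == y, setting Age = (nameHash * 397) ^ targetHash makes
    // ((nameHash * 397) ^ Age) == targetHash.
    public static Person WithHash(string name, int targetHash)
    {
        int nameHash = name != null ? name.GetHashCode() : 0;
        int age = unchecked(nameHash * 397) ^ targetHash;
        return new Person { Name = name, Age = age };
    }
}
```

For example, `new Person { Name = null, Age = 5 }` has hash code 5, and `CollisionDemo.WithHash("Bob", 5)` is a different person with the same hash code; Equals correctly returns false. On .NET Core and later, string hash codes are randomized per process, so the computed age differs from run to run, but the collision always holds.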

score 0 · Answer 4 · answered Jun 10 '13 at 21:39

The idea behind GetHashCode is that if one knows that two objects have different hash codes, one can safely assume that they aren't equal without having to look at them. Only if two objects' hash codes match will it be necessary to examine them further. If one has a collection of objects, none of whose hash codes can possibly match a given object (e.g. because all the objects in the collection have hash codes ending in 4591 and the given object's hash code ends in 2011), one need not examine any of the objects in the collection to know that none of them can possibly match the given object.
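A minimal sketch of that idea, using a hypothetical `HashBuckets<T>` class (not a BCL type): objects are grouped by hash code, and a lookup only calls Equals on the group whose hash code matches. A `Dictionary<int, List<T>>` keyed by the hash code stands in for real bucket storage, and items are assumed to be non-null.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of hash-based filtering; items are assumed non-null.
public sealed class HashBuckets<T>
{
    private readonly Dictionary<int, List<T>> _buckets = new Dictionary<int, List<T>>();

    // Number of full Equals calls made by Contains, to show how few are needed.
    public int EqualsCalls { get; private set; }

    public void Add(T item)
    {
        int h = item.GetHashCode();
        if (!_buckets.TryGetValue(h, out var list))
        {
            list = new List<T>();
            _buckets[h] = list;
        }
        list.Add(item);
    }

    public bool Contains(T item)
    {
        // No stored object has this hash code, so none can be equal: no Equals calls.
        if (!_buckets.TryGetValue(item.GetHashCode(), out var candidates))
            return false;

        // Matching hash code only means "possibly equal": confirm with Equals.
        foreach (var candidate in candidates)
        {
            EqualsCalls++;
            if (candidate.Equals(item)) return true;
        }
        return false;
    }
}
```

After adding the ints 0 to 999 (an int's hash code is its own value), `Contains(500)` makes a single Equals call and `Contains(5000)` makes none.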

Properly written code which discovers an object whose hash code matches that of a given object should figure that the objects are likely to match, but might not, and should examine the objects in detail to find out whether they actually do. If the hash codes match but the objects don't, the only consequence of the hash-code match should be an increase in the amount of time required to discover that the objects were different. If one in a million comparisons generates a false match, pre-checking hash codes could reduce the number of detailed comparisons by a factor of a million. By contrast, if the hash function isn't nearly as good, and one in a thousand comparisons yields a false match, pre-checking the hash codes will "only" reduce the number of detailed comparisons by a factor of a thousand. Of course, even though a thousand-fold speedup isn't as good as a million-fold speedup, it may still be vastly better than no speedup.
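The claim that hash quality affects only speed, never correctness, can be sketched with a hypothetical `PreCheck.Find` helper. The two hash functions passed in are arbitrary choices for illustration: string length as a deliberately poor hash, and `string.GetHashCode` as a reasonable one. Both give the same answer; they differ only in how many detailed comparisons they trigger.

```csharp
using System;
using System.Collections.Generic;

public static class PreCheck
{
    // Hypothetical helper. The hash is only a cheap pre-check: a mismatch proves
    // the strings differ, a match only means "possibly equal", so a detailed
    // comparison follows. Scans every item so the count includes all false matches.
    public static (bool Found, int DetailedComparisons) Find(
        IEnumerable<string> items, string target, Func<string, int> hash)
    {
        int targetHash = hash(target);
        int detailed = 0;
        bool found = false;

        foreach (var item in items)
        {
            if (hash(item) != targetHash) continue; // definitely not equal

            detailed++;                              // possibly equal: compare fully
            if (string.Equals(item, target, StringComparison.Ordinal))
                found = true;
        }
        return (found, detailed);
    }
}
```

With the 1,000 seven-character strings `item000` to `item999` and target `item500`, the length-based hash sends all 1,000 items to the detailed comparison, while `string.GetHashCode` will typically send only the one real match; both calls find the item.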
