Is there any negative consequence in having Equals based on GetHashCode?

Question

Is the following code OK?

public override bool Equals(object obj)
{
  if (obj == null || !(obj is LicenseType))
    return false;
  return GetHashCode() == obj.GetHashCode();
}

public override int GetHashCode()
{
  return
    Vendor.GetHashCode() ^ 
    Version.GetHashCode() ^ 
    Modifiers.GetHashCode() ^ 
    Locale.GetHashCode();
}

All the properties are enums/numeric fields, and are the only properties that define the LicenseType objects.

Even if depending on the hashcode was OK, your implementation of GetHashCode is not good, you should look at http://stackoverflow.com/a/720282/267 — Lasse V. Karlsen, Mar 03 '15 at 11:01
That's not a particularly sound hash code implementation, either: http://ericlippert.com/2011/02/28/guidelines-and-rules-for-gethashcode/ — Preston Guillot, Mar 03 '15 at 11:04
GetHashCode is used as a quick check before doing a full member-by-member equality comparison. A good hashcode implementation is important because collections like Dictionary use it to create buckets. The dictionary will use a key's hashcode to locate the proper bucket then use equality to find the exact match. — Panagiotis Kanavos, Mar 03 '15 at 11:26
As Panagiotis wrote, but remember that it is a little more complex... Calculating two hash codes is normally slower than comparing two objects member by member (because in the end you have to read all the fields, and then normally you have to do some fancy math on them to calculate the hash, twice because you want two hash codes, vs. directly reading two objects and comparing their fields). The big advantage is that you can "cache" a hash code, so that you can reuse it. Comparing cached hash codes is nearly instantaneous, because they are `int` values — xanatos, Mar 03 '15 at 12:31
@xanatos I'd challenge the assertion that calculating two hashcodes is slower than a member-by-member comparison of two objects "normally", multiplication and XOR aren't fancy math, especially for a CPU. — Preston Guillot, Mar 03 '15 at 16:30
@PrestonGuillot Not too much faster, but I'd say that a comparison is faster than a sum plus a multiplication, ignoring everything else. Clearly, accessing two objects at the same time will create more problems for the CPU cache. For strings >= 1000 chars it's around 8x faster, for int[10000] (using Skeet's algorithm, hash = hash * 23 + value.GetHashCode()) it's around 2x (I'm testing while writing, Release + NoDebug)... In general, for two different objects Equals will be even faster, because it can stop at the first difference. **Clearly for a class of 10 small fields it's probably 0s vs 0s** — xanatos, Mar 03 '15 at 18:21

score 6 · Answer 1 · answered Mar 03 '15 at 10:58

What happens when two different objects return the same hash code?

It is, after all, just a hash, and so may not be distinct over the full range of values the objects can have.
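To make the collision concrete, here is a minimal sketch. It assumes a cut-down LicenseType with only two properties, both modeled as plain int (the question only says they are enums or numeric fields), and it keeps the question's hash-based Equals unchanged. CollisionDemo is a hypothetical name used only for this illustration.

```csharp
using System;

// Cut-down, hypothetical version of the question's class: two int properties
// stand in for the real enum/numeric members.
public sealed class LicenseType
{
    public int Vendor { get; set; }
    public int Version { get; set; }

    // Same approach as in the question: equality is decided by hash codes only.
    public override bool Equals(object obj)
    {
        if (!(obj is LicenseType))
            return false;
        return GetHashCode() == obj.GetHashCode();
    }

    // XOR is commutative, so swapping the two values gives the same result.
    public override int GetHashCode()
    {
        return Vendor.GetHashCode() ^ Version.GetHashCode();
    }
}

public static class CollisionDemo
{
    public static void Main()
    {
        var a = new LicenseType { Vendor = 1, Version = 2 };
        var b = new LicenseType { Vendor = 2, Version = 1 };

        // 1 ^ 2 == 2 ^ 1, so two different licenses share one hash code...
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True
        // ...and the hash-based Equals treats them as the same license.
        Console.WriteLine(a.Equals(b));                        // True
    }
}
```

With this Equals, a Dictionary keyed on LicenseType would treat a and b as the same key.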

Ceisc


score 6 · Accepted Answer · answered Mar 03 '15 at 10:59

No, the documentation states very clearly:

You should not assume that equal hash codes imply object equality.

Also:

Two objects that are equal return hash codes that are equal. However, the reverse is not true: equal hash codes do not imply object equality

And:

Caution:

Do not test for equality of hash codes to determine whether two objects are equal. (Unequal objects can have identical hash codes.) To test for equality, call the ReferenceEquals or Equals method.
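Following that advice, Equals would compare the members directly and GetHashCode would only need to be consistent with it. The following is a sketch, not the asker's actual class: the four properties are assumed to be int, the constructor is hypothetical, and the constants 17 and 23 are just the commonly used multiply-and-add pattern mentioned in the comments.

```csharp
using System;

public sealed class LicenseType : IEquatable<LicenseType>
{
    // Assumed to be simple numeric values; the real class may use enums.
    public int Vendor { get; private set; }
    public int Version { get; private set; }
    public int Modifiers { get; private set; }
    public int Locale { get; private set; }

    public LicenseType(int vendor, int version, int modifiers, int locale)
    {
        Vendor = vendor;
        Version = version;
        Modifiers = modifiers;
        Locale = locale;
    }

    // Equality is decided member by member, never by comparing hash codes.
    public bool Equals(LicenseType other)
    {
        if (ReferenceEquals(other, null))
            return false;
        return Vendor == other.Vendor
            && Version == other.Version
            && Modifiers == other.Modifiers
            && Locale == other.Locale;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as LicenseType);
    }

    // Equal objects produce equal hash codes; unequal objects may still collide.
    public override int GetHashCode()
    {
        unchecked // let the arithmetic overflow wrap instead of throwing
        {
            int hash = 17;
            hash = hash * 23 + Vendor.GetHashCode();
            hash = hash * 23 + Version.GetHashCode();
            hash = hash * 23 + Modifiers.GetHashCode();
            hash = hash * 23 + Locale.GetHashCode();
            return hash;
        }
    }
}

public static class EqualityDemo
{
    public static void Main()
    {
        var a = new LicenseType(1, 2, 0, 0);
        var b = new LicenseType(2, 1, 0, 0);
        var c = new LicenseType(1, 2, 0, 0);

        Console.WriteLine(a.Equals(b)); // False: Vendor and Version differ
        Console.WriteLine(a.Equals(c)); // True: all four members match
    }
}
```

Because the properties cannot change after construction, the hash code stays stable while an instance is used as a dictionary key.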

score 1 · Answer 3 · edited Jun 20 '20 at 09:12

It is OK (no negative consequences) only if GetHashCode is unique for each possible value. For example, the GetHashCode of a short (a 16-bit value) is always unique (let's hope so :-) ), so basing Equals on GetHashCode is OK there.

Another example: for int, GetHashCode() is the value of the integer, so ((int)value).GetHashCode() == ((int)value). Note that this isn't true, for example, for short (but the hash codes of a short are still unique; they simply use a more complex formula).

Note that what Patrick wrote is wrong, because that is true for the "user" of an object/class. You are the "writer" of the object/class, so you define the concept of equality and the concept of hash code. If you define that two objects are always equal, whatever their values are, then it's OK.

public override int GetHashCode() { return 1; }
public override bool Equals(object obj) { return true; }

The only important rules for Equals are:

Implementations are required to ensure that if the Equals method returns true for two objects x and y, then the value returned by the GetHashCode method for x must equal the value returned for y.

The Equals method is reflexive, symmetric, and transitive...

Clearly your Equals() and GetHashCode() comply with these rules, so they are OK.

Just out of curiosity, there is at least one exception for the equality operator (==) (normally you define the equality operator based on the Equals method):

bool v1 = double.NaN.Equals(double.NaN); // true
bool v2 = double.NaN == double.NaN; // false

This is because the NaN value is defined in the IEEE 754 standard as being different from all values, NaN included. For practical reasons, Equals returns true.

score 0 · Answer 4 · answered Mar 03 '15 at 11:07

It must be noted that it is NOT a rule that if two objects have the same hash code, then they must be equal.

There are only four billion or so possible hash codes, but obviously there are more than four billion possible objects. There are far more than four billion ten-character strings alone. Therefore there must be at least two unequal objects that share the same hash code, by the Pigeonhole Principle.
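The same argument applies to a built-in type: a long has 2^64 possible values but only 2^32 possible hash codes. The sketch below assumes that Int64.GetHashCode folds the upper and lower 32 bits together with XOR, which is what the .NET implementations I know of do but is not a documented guarantee. PigeonholeDemo and SameHash are hypothetical names for this illustration.

```csharp
using System;

public static class PigeonholeDemo
{
    // True when two longs produce the same hash code.
    public static bool SameHash(long x, long y)
    {
        return x.GetHashCode() == y.GetHashCode();
    }

    public static void Main()
    {
        long zero = 0L;
        long minusOne = -1L;

        // Assuming the XOR folding described above, both values hash to 0:
        // 0 ^ 0 == 0 and (-1) ^ (-1) == 0.
        Console.WriteLine(SameHash(zero, minusOne));

        // The values themselves are obviously different.
        Console.WriteLine(zero.Equals(minusOne)); // False
    }
}
```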

Suppose you have a Customer object that has a bunch of fields like Name, Address, and so on. If you make two such objects with exactly the same data in two different processes, they do not have to return the same hash code. If you make such an object on Tuesday in one process, shut it down, and run the program again on Wednesday, the hash codes can be different.

This has bitten people in the past. The documentation for System.String.GetHashCode notes specifically that two identical strings can have different hash codes in different versions of the CLR, and in fact they do. Don't store string hashes in databases and expect them to be the same forever, because they won't be.
