Using GetHashCode in == operator override

Question

Just wondering if what I mentioned in the title is a good practice. It makes sense to me, we're overriding GetHashCode to return a value based on two properties that if match, the two objects should be treated as equal. Logic seems fine and the code works, but I don't know if it can cause other problems.

This is using GetHashCode:

public static bool operator ==(CartesianCoordinates a, CartesianCoordinates b)
{
    return a.GetHashCode() == b.GetHashCode(); // Using GetHashCode here
}

public static bool operator !=(CartesianCoordinates a, CartesianCoordinates b)
{
    return !(a == b);
}

public override bool Equals(object obj)
{
    return this == (CartesianCoordinates)obj; // This just uses the == override
}

public override int GetHashCode()
{
    return (this.X + this.Y.ToLower()).GetHashCode(); // GetHashCode hashes the two properties we care about
}

And this is how I had it before:

public static bool operator ==(CartesianCoordinates a, CartesianCoordinates b)
{
    return a.X == b.X && string.Equals(a.Y, b.Y, StringComparison.CurrentCultureIgnoreCase); // The difference is here
}

public static bool operator !=(CartesianCoordinates a, CartesianCoordinates b)
{
    return !(a == b);
}

public override bool Equals(object obj)
{
    return this == (CartesianCoordinates)obj;
}

public override int GetHashCode()
{
    return (this.X + this.Y.ToLower()).GetHashCode();
}

Important Note:

In the CartesianCoordinates object, X is an int and Y is a string:

public int X { get; set; }
public string Y { get; set; }

Lmk, thanks in advance!

Sorry, I should've mentioned that X is a number and Y is a letter. Updating question. — Carlo, Jan 26 '12 at 00:15
Do you think that 4 bytes are enough to uniquely describe an object of arbitrary size? — vgru, Jan 26 '12 at 00:17
@SLaks you're right I didn't think of that because under the design of our project that's impossible. Y is always a letter. — Carlo, Jan 26 '12 at 00:18
@Groo No the string is not representing a number, it's representing a letter as in 2,"B" or 3,"C"; not 2,"2". — Carlo, Jan 26 '12 at 00:40

Thomas Levesque · Accepted Answer · 2012-01-26T00:20:18.827

8

Doing this is not only bad practice, it's just wrong! Two objects that are equal must have the same hashcode, but the opposite is not true: two different objects can have the same hashcode (and often will). So if you use the hashcode to decide whether or not the objects are equal, in some case you will consider them equal when they're actually different but just happen to have the same hashcode. A hashcode is not a unique identifier...

Based on your GetHashCode implementation, objects with coordinates (x, y) and (y, x) will be considered equal (since x + y == y + x)

edited Jan 26 '12 at 00:20

answered Jan 26 '12 at 00:12

Thomas Levesque

286,951
70
623
758

I just updated the question. X is an int, Y is a string (it's always a letter), so x + y = 1A and y + x = A1 – Carlo Jan 26 '12 at 00:21
@Carlo, OK, but it doesn't change the fact that "1A" and "A1" could have the same hashcode. There is an infinite number of different strings, and "only" 2^32 different hashcodes... – Thomas Levesque Jan 26 '12 at 00:24
I need to research into that, I always thought 2 different strings, such as "1A" and "A1" would always produce a different hash code. But anyway, it's clear now that it's not a good practice. Thanks! – Carlo Jan 26 '12 at 00:28
I have another question, might be a whole different topic, but based on what you said "two different objects can have the same hashcode"; how is a HashSet safe to use if two or more objects can have the same hash code? I'm sort of new to this stuff so I'm sure I'm missing a lot of stuff. But that's my reasoning behind the way I'm using GetHashCode. If I ever want to store coordinates in a HashSet, I want them to be uniquely stored by coordinates. Thanks in advance. – Carlo Jan 26 '12 at 00:37
2

@Carlo, a HashSet (and other hashtable-like data structures) have a different "bucket" for each different hashcode. All items with the same hashcode end up in the same bucket. When you test the presence of an item in a hashset, it finds the right bucket based on the hashcode, then it performs a linear search on the bucket (using Equals) to find the item you're looking for. That's why having a good distribution of hashcodes is important for performance, to avoid having too many items in the same bucket. – Thomas Levesque Jan 26 '12 at 00:44
So getting a hash code from a string and an int are not enough for a good distribution of hash codes? Since it's highly probable that they will be repeated? – Carlo Jan 26 '12 at 00:46
@Carlo, the `GetHashCode` implementation for `String` is designed to be very efficient; even very similar strings will have completely different hashcodes, so it shouldn't be an issue. See [this question](http://stackoverflow.com/questions/263400/what-is-the-best-algorithm-for-an-overridden-system-object-gethashcode) for an efficient implementation of GetHashCode based on several fields. – Thomas Levesque Jan 26 '12 at 00:53
Cool, thanks for all the info man, I sure learned a lot from this today =). – Carlo Jan 26 '12 at 00:58
+1. I'll add, that if we had a class with less than 4294967296 values and gave it a nice clever hash code that was unique for all possible values (quite possible, and indeed happens with real classes with limited possible states), then since the hash code is predicated upon the conditions for equality, rather than the other way around, it's not going to save us anything to use it in the equality test, even for such a case as that. Any hashcode good enough to give sufficient spread would be at least as expensive as the equality test anyway. – Jon Hanna Jan 26 '12 at 01:23
1

@Carlo: You probably should also read this: http://blogs.msdn.com/b/ericlippert/archive/2011/02/28/guidelines-and-rules-for-gethashcode.aspx – Eric Lippert Jan 26 '12 at 02:14
@EricLippert Will definitely do! Thanks! – Carlo Jan 26 '12 at 05:30

score 4 · Answer 2 · answered Jan 26 '12 at 00:11

4

That is very wrong.
GetHashCode() is not unique.

Your particular example is even worse, since your GetHashCode() is commutative.

answered Jan 26 '12 at 00:11

SLaks

868,454
176
1,908
1,964

Can you elaborate a bit more? – Carlo Jan 26 '12 at 00:12
2

`GetHashCode()` is not guaranteed to return different hashes for different objects (which makes sense, since there are 2^32 different hashes and an unlimited number of different objects) – SLaks Jan 26 '12 at 00:13

score 3 · Answer 3 · answered Jan 26 '12 at 00:13

GetHashCode() can return same value for two CartesianCoordinates that are not equal, e.g. for two objects c1 and c2 such that c1.x == c2.y and c1.y == c2.x.

For more complex objects and if the hashCode is pre-computed it can be used to short-circuit but eventually you will need to compare all fields. Here the comparison is trivial

Tetsujin no Oni · Answer 4 · 2012-01-26T00:31:01.103

0

Your GetHashCode will return the same value for {3,1} and {1,3}. I suspect this is not intended.

If your GetHashCode method returned a better hashcode, I'd still recommend against using GetHashCode as an equality comparison. The GetHashCode contract says that two objects with the same tracked properties should generate the same hashcode, not that two objects with equal hashcodes will be equal.

Edit to add:

See this question for details on what a better implementation looks like, particularly the Josh Bloch / Jon Skeet answer.

edited Jan 26 '12 at 00:31

answered Jan 26 '12 at 00:14

Tetsujin no Oni

7,300
2
29
46

I updated question, X is an int and Y is a string, so it should hash 3A or A3 instead of 3 + 1 or 1 + 3, which would always be 4. – Carlo Jan 26 '12 at 00:17

score 0 · Answer 5 · answered Jan 26 '12 at 00:16

No, it's not good practice (or even correct) to define equality using the hash code since different objects may have the same hash code (and in the case of String.GetHashCode that can definitely happen).

In your particular case it should also be noted that your GetHashCode method doesn't even match your Equals method because your GetHashCode will return different hash codes for strings that are equal except for case, while your Equals will return true in that case.

score 0 · Answer 6 · answered Jan 26 '12 at 00:17

Your GetHashCode() will give the same result for (2,4) and (4,2). Probably not what you want.

GetHashCode() is also generally not a good test for equality. Even a really good hash is only a 32-bit integer. To use that to test for equality in the cartesian plane will result in many (an infinite number) of collisions.

Using GetHashCode in == operator override

6 Answers6