1

Consider a class

class MyClass {
   int fieldA;
   int fieldB;

   public override bool Equals(object obj) {
       if (this == obj)
           return true;
       if (!(obj is MyClass))
           return false;

       MyClass other = obj as MyClass;
       return fieldA == other.fieldA || fieldB == other.fieldB;
  }
}

Two MyClass should be equal when fieldA is equal or fieldB is equal.

Now, I can't find a good GetHashCode() implementation that matches the Equals method I've written. Without reference to other MyClass object to compare to, getting the same hashcode looks like an impossible task, unless I set hashcode to same value and deal with hash collision.

I've written a test case that arbitrarily sets MyClass1 and MyClass2.


for (int a1 = 0; a1 \< 3; a1++) {
   for (int b1 = 0; b1 \< 3; b1++) {
      MyClass myClass1 = new MyClass { fieldA = a1, fieldB = b1 };

      for (int a2 = 0; a2 \< 3; a2++) {
         for (int b2 = 0; b2 \< 3; b2++) {
            MyClass myClass2 = new MyClass { fieldA = a2, fieldB = b2 };
            bool equals = myClass1.Equals(myClass2);
            bool hashEquals = myClass1.GetHashCode().Equals(myClass2.GetHashCode());

            if (equals != hashEquals)
               Console.WriteLine($@"Comparing ({a1}, {b1}) with ({a2}, {b2}). Equals: {equals}. HashCode: {hashEquals}");
         }
      }
   }
}

If GetHashCode is written properly, there shouldn't be any case where equals != hashEquals hence, nothing should be printed.

I've tried setting HashCode to one of the fields, but realized if another field matches, hashcode would be different.

Rand Random
  • 7,300
  • 10
  • 40
  • 88
Daniel
  • 23
  • 3
  • 2
    *there shouldn't be any case where equals != hashEquals* If you do the math you'll quickly realize this is impossible. There are only 4.29 billion possible hash codes but more than 18 quintillion possible values. Hash codes are not meant to be unique. See [this answer](https://stackoverflow.com/a/263416/2791540). – John Wu Jun 22 '23 at 19:22
  • 1
    This is impossible. It is only mathematically possible if you had a single static hash number. Consider a value of `(7, 3)` which must match `(7, 4)`. now change the `3` to a `2`. The hash must change yes? Then mathematically you cannot both change the hash which used to match `(7, 4)` but keep the equality to `(7, 4)` – maraaaaaaaa Jun 22 '23 at 19:22
  • @JohnWu When I said equals != hashEquals, I meant equals && !hashEquals. Now, this is problematic because when you implement HashSet or Dictionary with the key as MyClass, hashcode will be different, although they are equal to each other. – Daniel Jun 22 '23 at 19:28
  • 1
    If `equals` is true and `hashEquals` is false you have indeed botched your implementation of `GetHashCode()`. Are you asking us to help debug it? If so please share the code. – John Wu Jun 22 '23 at 19:32
  • 1
    So, what you are looking for is a hash code function _f(x, y)_, such that _f(x1, y1)_ is equal to _f(x2, y2)_ whenever either _x1_ == _x2_ OR _y1_ == _y2_. Did I get that right? I don't have a clue what that function would look like, but I'm hoping that this comment helps clear up some of the misunderstanding – Flydog57 Jun 22 '23 at 19:43
  • 3
    I'm almost convinced that it is indeed impossible as maraaa has said. Hashcode for (7, 3) should match (7, 4) and (5, 3), but not match (5, 4) because you only need to match one of the value to be considered equal. In other words (7, 3) == (7, 5) and (7, 5) == (3, 5), but (7, 3) != (3, 5). – Daniel Jun 22 '23 at 19:46
  • Knowing both `GetHashCode()` and `Equals()` should be linked, my `Equals()` methods are generally limited to calling and comparing `GetHashCode()` for both items. I know that's not helpful for the specific issue of the question, but I think it's a useful reminder because after you solve that problem you'll also be able to greatly simplify the existing `Equals()` code. – Joel Coehoorn Jun 22 '23 at 19:55
  • 2
    @JoelCoehoorn How do you make sure there is 1-to-1 correspondence between every possible state of your object and an int32? Do you e.g. limit the length and the allowed characters of all strings? Isn't that way harder than allowing some unequal objects to have equal hashes after all, which is how the entire thing is designed anyway? – GSerg Jun 22 '23 at 20:14
  • To meet the rules that all equal instances have an equal hashcode, then the hash must be a constant. `GetHashCode() => 7;` Is it a good hash? no. Should this type be used as a dictionary key? no. – Jeremy Lakeman Jun 23 '23 at 01:58
  • 2
    Your `Equals` method violates rule 3 here: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/statements-expressions-operators/how-to-define-value-equality-for-a-type – Klaus Gütter Jun 23 '23 at 07:16

1 Answers1

3

Your definition of "Equal" will not work for any hash-based implementation. If A==B and B==C, then A==C must be true, otherwise you get unexpected results in lookups, groupings, and other hash-based algorithms.

In your case, (1,2) would be equal to (3,2) and (3,2) would be equal to (3,4), but (3,4) is not equal to (1,2). If you tried to "group" objects based on equality, you'd get indeterminate results depending on the order in which the objects were compared.

Even if it did work, you could not use those values in a hash algorithm, since two objects that are "equal" must have the same hash, but objects with different values of fieldA or fieldB can still be equal, so there's no way to define a hash that includes either of those fields. You would (I think) just have to return a single constant value as a hash code since any two objects could be equal regardless of the individual values of fieldA or fieldB.

D Stanley
  • 149,601
  • 11
  • 178
  • 240
  • Does that mean for this implementation of Equals where transitive property is violated as you said, when I have a list of MyClass, and I want to call list.Contains() method to find an element that matches the 'Equal', I can't use HashSet or Dictionary which gives me O(1) time complexity? – Daniel Jun 23 '23 at 13:20
  • Your "Equals" would work for `Contains` but not hashing - By that definition (1,2) and (3,4) would have to have the same hash - and by extension, (5,6) would have to have the same hash (since (3,4)=(5,4)=(5,6). So _every_ object would have to have the same hash, and you'll end up with an O(N) search anyway. – D Stanley Jun 23 '23 at 13:49