26

I'm having some difficulty using Linq's .Except() method when comparing two collections of a custom object.

I've derived my class from Object and implemented overrides for Equals(), GetHashCode(), and the operators == and !=. I've also created a CompareTo() method.

In my two collections, as a debugging experiment, I took the first item from each list (which is a duplicate) and compared them as follows:

itemListA[0].Equals(itemListB[0]);     // true
itemListA[0] == itemListB[0];          // true
itemListA[0].CompareTo(itemListB[0]);  // 0

In all three cases, the result is as I wanted. However, when I use Linq's Except() method, the duplicate items are not removed:

List<myObject> newList = itemListA.Except(itemListB).ToList();

Learning about how Linq does comparisons, I've discovered various (conflicting?) methods that say I need to inherit from IEquatable<T> or IEqualityComparer<T> etc.

I'm confused because when I inherit from, for example, IEquatable<T>, I am required to provide a new Equals() method with a different signature from what I've already overridden. Do I need to have two such methods with different signatures, or should I no longer derive my class from Object?

My object definition (simplified) looks like this:

public class MyObject : Object
{
    public string Name {get; set;}
    public DateTime LastUpdate {get; set;}

    public int CompareTo(MyObject other)
    {
        // ...
    }

    public override bool Equals(object obj)
    {
        // allows some tolerance on LastUpdate
    }

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + Name.GetHashCode();
            hash = hash * 23 + LastUpdate.GetHashCode();
            return hash;
        }
    }

    // Overrides for operators
}

I noticed that when I inherit from IEquatable<T> I can do so using IEquatable<MyObject> or IEquatable<object>; the requirements for the Equals() signature change when I use one or the other. What is the recommended way?

What I am trying to accomplish:

I want to be able to use Linq (Distinct/Except) as well as the standard equality operators (== and !=) without duplicating code. The comparison should allow two objects to be considered equal if their name is identical and the LastUpdate property is within a number of seconds (user-specified) tolerance.

Edit:

Showing GetHashCode() code.

JYelton
  • 35,664
  • 27
  • 132
  • 191
  • 1
    Isn't a class automatically derived from `Object`? You also need `IComparable(T)` for the `CompareTo` method. – Dustin Kingen Apr 03 '13 at 20:50
  • 2
    I suspect the problem is a flaw in your `GetHashCode()` logic. I don't think you're required to implement anything specific in order for this to work. Can you post it? – Bobson Apr 03 '13 at 20:51
  • 2
    The "allow some tolerance" sounds like you're likely to break transitivity, too. – Jon Skeet Apr 03 '13 at 20:51
  • @Jon This is being used to compare items on two different computers (the class is serialized and transmitted). Items that are within a few seconds of each other should be considered the same, to allow for slight clock differences. Perhaps another method would be better altogether? – JYelton Apr 03 '13 at 20:53
  • @Bobson Sure, I will update the question. – JYelton Apr 03 '13 at 20:56
  • 3
    @JYelton: As I say, that breaks transitivity, where x.equals(y) and y.equals(z) implies that x.equals(z). x and y might be out by 3 seconds, and y and z might be out by 3 seconds... so x and z might be out by 6 seconds. If your tolerance is 5 seconds, bang goes transitivity. – Jon Skeet Apr 03 '13 at 21:01
  • @Jon Perhaps I will just need to move the time allowance from the equality comparisons, and do the comparison in a loop, looking at the properties individually. I may have been trying to simplify code in the wrong place, and got myself in a bind. – JYelton Apr 03 '13 at 21:03

3 Answers3

30

It doesn't matter whether you override object.Equals and object.GetHashCode, implement IEquatable, or provide an IEqualityComparer. All of them can work, just in slightly different ways.

1) Overriding Equals and GetHashCode from object:

This is the base case, in a sense. It will generally work, assuming you're in a position to edit the type to ensure that the implementation of the two methods are as desired. There's nothing wrong with doing just this in many cases.

2) Implementing IEquatable

The key point here is that you can (and should) implement IEquatable<YourTypeHere>. The key difference between this and #1 is that you have strong typing for the Equals method, rather than just having it use object. This is both better for convenience to the programmer (added type safety) and also means that any value types won't be boxed, so this can improve performance for custom structs. If you do this you should pretty much always do it in addition to #1, not instead of. Having the Equals method here differ in functionality from object.Equals would be...bad. Don't do that.

3) Implementing IEqualityComparer

This is entirely different from the first two. The idea here is that the object isn't getting it's own hash code, or seeing if it's equal to something else. The point of this approach is that an object doesn't know how to properly get it's hash or see if it's equal to something else. Perhaps it's because you don't control the code of the type (i.e. a 3rd party library) and they didn't bother to override the behavior, or perhaps they did override it but you just want your own unique definition of "equality" in this particular context.

In this case you create an entirely separate "comparer" object that takes in two different objects and informs you of whether they are equal or not, or what the hash code of one object is. When using this solution it doesn't matter what the Equals or GetHashCode methods do in the type itself, you won't use it.


Note that all of this is entirely unrelated from the == operator, which is its own beast.

Servy
  • 202,030
  • 26
  • 332
  • 449
  • This helps a *lot* in understanding the purpose behind these different interfaces. Thank you. – JYelton Apr 03 '13 at 21:15
  • 3
    It's very impotant to only do #2 in addition to #1. Implementing `IEquatable<>` without overriding `GetHashCode()` does not give a compiler warning (sadly), but it leads to a type which malfunctions in `Dictionary`, `HashSet`, and many other situations (like the `Except` method from the original question above, or `Distinct` method) where the `EqualityComparer.Default` comparer is implicitly (or explicitly of course) used. – Jeppe Stig Nielsen Apr 03 '13 at 21:33
  • @JeppeStigNielsen @Servy Why should I override `Equals` when I implement `IEquatable`? (I override `GetHashCode()` of course) – Alexander Derck Feb 17 '16 at 10:36
  • 1
    @AlexanderDerck What if someone uses a type, such as `System.Collections.Hashtable`, which uses `Equals(object)` and `GetHashCode()` directly, without knowing or caring about the `IEquatable<>` type? – Jeppe Stig Nielsen Feb 17 '16 at 13:51
  • @JeppeStigNielsen True, found [this good blog](http://blogs.msdn.com/b/jaredpar/archive/2009/01/15/if-you-implement-iequatable-t-you-still-must-override-object-s-equals-and-gethashcode.aspx) about it too. – Alexander Derck Feb 17 '16 at 14:11
6

The basic pattern I use for equality in an object is the following. Note that only 2 methods have actual logic specific to the object. The rest is just boiler plate code that feeds into these 2 methods

class MyObject : IEquatable<MyObject> { 
  public bool Equals(MyObject other) { 
    if (Object.ReferenceEquals(other, null)) {
      return false;
    }

    // Actual equality logic here
  }

  public override int GetHashCode() { 
    // Actual Hashcode logic here
  }

  public override bool Equals(Object obj) {
    return Equals(obj as MyObject);
  }

  public static bool operator==(MyObject left, MyObject right) { 
    if (Object.ReferenceEquals(left, null)) {
      return Object.ReferenceEquals(right, null);
    }
    return left.Equals(right);
  }

  public static bool operator!=(MyObject left, MyObject right) {
    return !(left == right);
  }
}

If you follow this pattern there is really no need to provide a custom IEqualityComparer<MyObject>. The EqualityComparer<MyObject>.Default will be enough as it will rely on IEquatable<MyObject> in order to perform equality checks

codersl
  • 2,222
  • 4
  • 30
  • 33
JaredPar
  • 733,204
  • 149
  • 1,241
  • 1,454
  • My GetHashCode method is considering every property whereas the Equals method is providing for a tolerance on the LastUpdate property. It sounds like the HashCode generation needs to take this into consideration as well, is that correct? – JYelton Apr 03 '13 at 20:55
  • @JYelton yes. The hash code implementation should only be based on the values which participate in equality. One way to debug if hash code is your problem is to return a constant (say 0). If your code works with a constant return then there is a bug in your hash code implementation – JaredPar Apr 03 '13 at 20:56
  • There are a couple of issues. The worst one is that `==` calls itself recursively! – Jeppe Stig Nielsen Apr 03 '13 at 20:58
  • @JeppeStigNielsen doh, i always make the `==` bug when I do equality on `class` types. – JaredPar Apr 03 '13 at 21:01
  • 1
    The "actual equality logic here" should start with `if (Object.ReferenceEquals(other, null) || GetType() != other.GetType()) { return false; }` or similar. That's because we have a class that is not `sealed`. Either `this` could be more derived than `other`, or `other` could be more derived than `this`. In that case we must return `false`. Otherwise it will break if someone derives from our non-sealed class. – Jeppe Stig Nielsen Apr 03 '13 at 21:07
  • @JeppeStigNielsen correct. It's been too long since I've done equality on `class` types, been doing it a lot for `struct` recently and I'm making the easy mistake with null. I do disagree on the actual runtime type check though. I agree it's correct in some scenarios but not in all. It's a matter of what equality means for that particular type. I think the OP would have to state if it's correct here. – JaredPar Apr 03 '13 at 21:11
  • In the `==` override, I currently also have `Object.ReferenceEquals(left, right)` after checking whether either or both objects are null, since the class is a reference type. – JYelton Apr 03 '13 at 21:18
  • @JYelton my code will handle the `null == null` case as well. You could short circuit with `Object.ReferenceEquals(left, right)` to avoid the Equals method but it's not strictly necessary – JaredPar Apr 03 '13 at 21:20
  • Nice! I'm gonna cut'n'paste this. Thaanks! – Peter pete Apr 29 '15 at 12:37
5

You cannot "allow some tolerance on LastUpdate" and then use a GetHashCode() implementation that uses the strict value of LastUpdate!

Suppose the this instance has LastUpdate at 23:13:13.933, and the obj instance has 23:13:13.932. Then these two might compare equal with your tolerance idea. But if so, their hash codes must be the same number. But that will not happen unless you're extremely extremely lucky, for the DateTime.GetHashCode() should not give the same hash for these two times.

Besides, your Equals method most be a transitive relation mathematically. And "approximately equal to" cannot be made transitive. Its transitive closure is the trivial relation that identifies everything.

Jeppe Stig Nielsen
  • 60,409
  • 11
  • 110
  • 181
  • The role of GetHashCode() with respect to equality isn't very clear to me, but is quickly becoming known thanks to some insight from answers here, thanks. – JYelton Apr 03 '13 at 21:13
  • @JYelton The important rule is: ***If*** `Equals` says two instances are "equal", then `GetHashCode` ***must*** return the same number for both. (Then if `Equals` says they're different, `GetHashCode` can (and should in as many cases as possible) give different numbers.) – Jeppe Stig Nielsen Apr 03 '13 at 21:22