I have an array of CustomObjects and a custom IEqualityComparer<CustomObject>. I have hard-coded the IEqualityComparer.GetHashCode() method to return a constant, 42.

When I run LINQ's Distinct method on the array, nothing is filtered out. Anyone know why?

Note: I know there are a number of questions on here about this issue; however, the ones I've seen (C# Distinct on IEnumerable<T> with custom IEqualityComparer, Distinct() with lambda?, etc.) only say to make sure to implement GetHashCode. None of them explains why it doesn't work.

Code:

public class CustomObject
{
    public string Name { get; set; }
}

public class CustomObjectEqualityComparer : IEqualityComparer<CustomObject>
{
    public bool Equals(CustomObject x, CustomObject y)
    {
        //Every CustomObject should now be 'equal'!
        return x.GetHashCode() == y.GetHashCode();
    }

    public int GetHashCode(CustomObject obj)
    {
        //Every CustomObject should now be 'equal'!
        return 42;
    }
}

Test:

[TestFixture]
public class TestRunner
{
    private CustomObject[] customObjects = 
    {
        new CustomObject {Name = "Please"},
        new CustomObject {Name = "Help"},
        new CustomObject {Name = "Me"},
        new CustomObject {Name = "Stack"},
        new CustomObject {Name = "Overflow"},
    };

    [Test]
    public void DistinctTest()
    {
        var distinctArray =
            customObjects.Distinct(new CustomObjectEqualityComparer()).ToArray();

        //Since every CustomObject is 'Equal' there should only be
        //1 element in the array.
        Assert.AreEqual(1, distinctArray.Length);
    }
}

This is the output I get from running the test:

 Expected: 1
 But was:  5

If I debug the test, I can see that GetHashCode is being called, so why isn't Distinct filtering out all the 'duplicates'?

Philip Pittle
  • If it is relevant, I'm using `.net 4.5.1` – Philip Pittle Jul 16 '14 at 13:45
  • Simple: you are calling `GetHashCode` on the `CustomObject` during the equality check, not your hard-coded version. The `Distinct` implementation is what uses the custom comparer. It first calls the comparer's `GetHashCode`. If there is a collision, it then calls the comparer's `Equals`, which in your code calls `CustomObject.GetHashCode`; you have not overridden that, so it returns a reference-based hash code. – Adam Houldsworth Jul 16 '14 at 13:47
  • But I am passing in my `CustomObjectEqualityComparer`. Comparison should be delegated to this class, right? – Philip Pittle Jul 16 '14 at 13:48
  • Damn it, after reading Jon Skeet's response I see it now, you are right @AdamHouldsworth. Thanks! – Philip Pittle Jul 16 '14 at 13:51

1 Answer

When I run LINQ's Distinct method on the array, nothing is filtered out. Anyone know why?

Yes - Equals is going to return false for any two distinct objects. Within your Equals implementation, you're calling the CustomObject.GetHashCode method (which isn't overridden), not your custom comparer's GetHashCode method. If you expected Equals to call your custom GetHashCode method, you'd need to change it to:

public bool Equals(CustomObject x, CustomObject y)
{
    return GetHashCode(x) == GetHashCode(y);
}

Or given the implementation of GetHashCode(CustomObject), you can just simplify it to:

public bool Equals(CustomObject x, CustomObject y)
{
    return true;
}

Note that your GetHashCode method was being called by Distinct(), but it was then calling your Equals method to check that the objects were really equal... and at that point you were saying that they weren't (barring a massive coincidence).
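To see that two-step behaviour for yourself, here is a minimal sketch rather than a definitive implementation. `CountingComparer` and `DistinctDemo` are hypothetical names, and the call counters are an addition for illustration; `Equals` uses the simplified `return true` form from above. The exact call counts depend on the runtime's internals, so the sketch prints them without assuming their values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class CustomObject
{
    public string Name { get; set; }
}

// Hypothetical comparer: same idea as the corrected comparer above,
// with counters added to show which methods Distinct invokes.
public class CountingComparer : IEqualityComparer<CustomObject>
{
    public int HashCalls { get; private set; }
    public int EqualsCalls { get; private set; }

    public bool Equals(CustomObject x, CustomObject y)
    {
        EqualsCalls++;
        // Every CustomObject is treated as equal.
        return true;
    }

    public int GetHashCode(CustomObject obj)
    {
        HashCalls++;
        // Constant hash: every object lands in the same bucket.
        return 42;
    }
}

public static class DistinctDemo
{
    public static void Main()
    {
        var items = new[]
        {
            new CustomObject { Name = "Please" },
            new CustomObject { Name = "Help" },
            new CustomObject { Name = "Me" },
        };

        var comparer = new CountingComparer();
        var distinct = items.Distinct(comparer).ToArray();

        Console.WriteLine("Distinct count: " + distinct.Length); // 1
        Console.WriteLine("GetHashCode calls: " + comparer.HashCalls);
        Console.WriteLine("Equals calls: " + comparer.EqualsCalls);
    }
}
```

Both counters end up non-zero: the hash code only selects candidates, and `Equals` has the final say.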

Jon Skeet
  • Yup that did it. Thanks Jon! – Philip Pittle Jul 16 '14 at 13:52
  • As a side question, is there a thread on here already about why `GetHashCode()` and `Equals()` both need to be called by `Distinct`? I can understand `GetHashCode()` being used to put the objects into a `HashTable`, but can't `Distinct` assume that if the hashes are equal, the objects are equal? – Philip Pittle Jul 16 '14 at 13:53
  • @ppittle No, it works the other way: two objects with different hashes are *not* equal (which is subtly different from stating that two objects with the same hash are equal). Collisions will happen because hashing narrows the range of available values, so `Equals` is then used afterwards to test for true equality. It is a performance / accuracy trade-off that works very well when you don't get too many collisions. – Adam Houldsworth Jul 16 '14 at 13:54