2

I have written the following code to implement Linq.Distinct(IEqualityComparer) in the most basic way possible, however simpleCollection returns 2 items instead if 1.

Oddly, ive noticed that breakpoints on Equals never get hit.

Could it be something to do with my implementation of GetHashCode()?

    public class testobjx
    {
        public int i { get; set; }
    }

    public  class mytest
    {
        public Main()
        {
            var simpleCollection = new[] { new testobjx() { i = 1 }, new testobjx() { i = 1 } }.Distinct(new DistinctCodeType());
            var itemCount = simpleCollection.Count();//this should return 1 not 2.
        }
    }

    public class DistinctCodeType : IEqualityComparer<testobjx>
    {
        public bool Equals(testobjx x, testobjx y)
        {
            return x.i == y.i;
        }

        public int GetHashCode(testobjx obj)
        {
            return obj.GetHashCode();
        }
    }
maxp
  • 24,209
  • 39
  • 123
  • 201
  • What if x, y, or obj are null? – n8wrl Oct 16 '12 at 16:49
  • If the results of GetHashCode do not match, Equals is not even checked. Work on your GetHashCode implementation. – Anthony Pegram Oct 16 '12 at 16:50
  • 1
    FYI: http://msmvps.com/blogs/jon_skeet/archive/2010/12/30/reimplementing-linq-to-objects-part-14-distinct.aspx – Austin Salonen Oct 16 '12 at 16:52
  • @AnthonyPegram - if I just `return 0`, is that a legitimate way to defer *everything* to the Equals method? – maxp Oct 17 '12 at 08:44
  • @maxp, no. That would be the completely non-ideal way to handle it. And you don't want to defer everything to Equals, that's too expensive. The implementation would have to check everything in the collection, and there's no need to do that. If you're not familiar, you should read up on how hash tables work, because under the hood, that's what's being used in Distinct. – Anthony Pegram Oct 17 '12 at 11:52

2 Answers2

5

Try:

public int GetHashCode(testobjx obj)
{
    if (obj == null) return 0;
    return obj.i.GetHashCode();
}
tukaef
  • 9,074
  • 4
  • 29
  • 45
  • 1
    What would be the approach if instead of the class just containing one integer `i`, it contained 10 separate integers, or any other number of complex properties? – maxp Oct 16 '12 at 16:53
  • @maxp, check out this answer http://stackoverflow.com/a/263416/1009661. Also VS add-ins like ReSharper are able to generate implementation of some common interfaces and methods, including GetHashCode. – tukaef Oct 16 '12 at 17:06
1

The default implementation of GetHashCode for an object is based on the object's instance, so two instances of testobjx with the same value have different hash codes. You need to modify your GetHashCode method to interrogate the object's property. If the object has multiple properties you need to figure out which ones are required to uniquely identify the object and compose a single hash code from those.

Doug Mitchell
  • 192
  • 1
  • 9
  • The default implementation of GetHashCode does not uniquely identify an object. It couldn't. It has 4 billion possible values, and there are most assuredly more than 4 billion possible objects. But even so, you would start getting [hash code collisions long before you made it close to 4 billion](http://blogs.msdn.com/b/ericlippert/archive/2010/03/22/socks-birthdays-and-hash-collisions.aspx). Be careful about relating uniqueness with GetHashCode. That's not its intent. It helps balance a hash table. It narrows things down. But a reasonable implementation never tries to guarantee uniqueness. – Anthony Pegram Oct 16 '12 at 17:32