-1

I am trying to compare (values of the properties) a instance of type in a List and eliminate duplicates. According to MSDN GetHashCode() is one of the way to compare two objects.

A hash code is intended for efficient insertion and lookup in collections that are based on a hash table. A hash code is not a permanent value

Considering that, I started writing my extension method as bellow

 public static class Linq 
    {
  public static IEnumerable<T> DistinctObjects<T>(this IEnumerable<T> source)
        {
            List<T> newList = new List<T>();
            foreach (var item in source)
            {
                if(newList.All(x => x.GetHashCode() != item.GetHashCode()))
                    newList.Add(item);
            }
            return newList;
        }
}

This condition always gives me false though the data of the object is same.

newList.All(x => x.GetHashCode() != item.GetHashCode())

Finally I would like to use it like

MyDuplicateList.DistinctObjects().ToList();

If comparing all fields of the object is too much, I am okay to use it like,

 MyDuplicateList.DistinctObjects(x=>x.Id, x.Name).ToList();

Here I am telling compare only these two fields of those objects.

HaBo
  • 13,999
  • 36
  • 114
  • 206
  • 3
    There is a built in LINQ function, `Distinct()` which takes an expression to determine uniqueness. Does this not do what you want? – GEEF Oct 19 '16 at 18:21
  • 3
    First of all - if GetHashCode is equal it doesn't mean objects are equal it's the other way round – MistyK Oct 19 '16 at 18:21
  • @GEEF .Distinct() is not doing the job. – HaBo Oct 19 '16 at 18:21
  • @YacoubMassad GetHashCode() is definitely implemented by whatever T is, considering its on the base C# `object`. – GEEF Oct 19 '16 at 18:22
  • 2
    It's not doing the job because you need to use comparer or implement interface IEquatable for object, override equals and gethashcode otherwise it's comparing references – MistyK Oct 19 '16 at 18:22
  • @HaBo see this: http://stackoverflow.com/questions/10719928/how-to-use-linq-distinct-with-multiple-fields – GEEF Oct 19 '16 at 18:22
  • @GEEF that link solution is suggesting to do anonymous selection, which I don't want to, I want to keep this function as generic as possible so that I can use it at all possible duplicate collections. – HaBo Oct 19 '16 at 18:24
  • @Zbigniew you are saying if I have a Customer class, I need to implement IEquatable on it, then if I do MyDuplicateCustomerList.Distinct(), then it will give me unique results? Am I right? – HaBo Oct 19 '16 at 18:26
  • @HaBO no, unless you override Equals and Hashcode function as well. Here is the link: https://blogs.msdn.microsoft.com/jaredpar/2009/01/15/if-you-implement-iequatablet-you-still-must-override-objects-equals-and-gethashcode/ – MistyK Oct 19 '16 at 18:31
  • @Zbigniew Technically it would work even if you didn't override `Equals`, it's just a horrible idea not to also override `Equals` as it would be *super* confusing to users of the type to have the two `Equals` methods on the object function differently. – Servy Oct 19 '16 at 18:32
  • @Servy it would work in this example but as stated in the link I provided it wouldn't work for non generic collections. And the second reason is the reason you provided – MistyK Oct 19 '16 at 18:34
  • @Zbigniew I'm not saying he shouldn't do it, I'm just saying that when you said he needed to do it for the sake of the given example, that isn't the case, it's just a good idea as it will make other situations function properly. – Servy Oct 19 '16 at 18:36
  • @Servy yes, you are right. I just read it again – MistyK Oct 19 '16 at 18:44

2 Answers2

4

After reading your comments I would propose this solution:

   public static IEnumerable<TSource> DistinctBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
        HashSet<TResult> set = new HashSet<TResult>();

        foreach(var item in source)
        {
            var selectedValue = selector(item);

            if (set.Add(selectedValue))
                yield return item;
        }
    }

Then you can use it like this:

var distinctedList = myList.DistinctBy(x => x.A);

or for multiple properties like that:

var distinctedList = myList.DistinctBy(x => new {x.A,x.B});

The advantage of this solution is you can exactly specify what properties should be used in distinction and you don't have to override Equals and GetHashCode for every object. You need to make sure that your properties can be compared.

MistyK
  • 6,055
  • 2
  • 42
  • 76
  • Wait a sec, is this returning only the object that i pass in Distinct(x=>x.Id) ? – HaBo Oct 19 '16 at 18:34
  • @HaBo no, return type is IEnumerable. Please read about it on MSDN. – MistyK Oct 19 '16 at 18:36
  • nice way to do this! Best thing would still be to implement in each object a proper equals / gethashcode since by default without these you can get a lot of issues – Fredou Oct 19 '16 at 18:38
0

You shouldn't need to create your own custom, generic method for this. Instead, provide a custom EqualityComparar for your data type:

var myDuplicates = myList.Distinct(new MyComparer());

Where you define a custom Comparer like this:

public class MyComparer : IEqualityComparer<Mine>
{
    public bool Equals(Mine x, Mine y)
    {
        if (x == null && y == null) return true;            
        if (x == null || y == null) return false;
        return x.Name == y.Name && x.Id == y.Id;
    }

    public int GetHashCode(Mine obj)
    {
        return obj.Name.GetHashCode() ^ obj.Id.GetHashCode();
    }
}

Edit: I initially had incorrect code here, this should do what you want without you having to override an Equals operator

GEEF
  • 1,175
  • 5
  • 17
  • if you are going to show a Equals / GetHashCode implementation, at least make sure to do it correctly, Equals can crash with null exception and your forgetting the prime number in the GetHashCode – Fredou Oct 19 '16 at 18:34
  • The prime number is not *required* to make this function correctly. But sure it could definitely crash from an NRE, he can fill in some details though. – GEEF Oct 19 '16 at 18:35