70

I have a L2E query that returns some data that contains duplicate objects. I need to remove those duplicate objects. Basically I should assume that if their IDs are the same then the objects are duplicate. I've tried q.Distinct(), but that still returned duplicate objects. Then I've tried implementing my own IEqualityComparer and passing it to the Distinct() method. The method failed with following text:

LINQ to Entities does not recognize the method 'System.Linq.IQueryable1[DAL.MyDOClass] Distinct[MyDOClass](System.Linq.IQueryable1[DAL.MyDOClass], System.Collections.Generic.IEqualityComparer`1[DAL.MyDOClass])' method, and this method cannot be translated into a store expression.

And here is the implementation of EqualityComparer:

  internal class MyDOClassComparer: EqualityComparer<MyDOClass>
    {
        public override bool Equals(MyDOClass x, MyDOClass y)
        {
            return x.Id == y.Id;
        }

        public override int GetHashCode(MyDOClass obj)
        {
            return obj == null ? 0 : obj.Id;
        }
    }

So how do I write my own IEqualityComparer properly?

Eranga
  • 32,181
  • 5
  • 97
  • 96
Bogdan Verbenets
  • 25,686
  • 13
  • 66
  • 119
  • `GroupBy()` may be better solution than `Distinct()` - like mentioned [in the top rated answer](https://stackoverflow.com/a/16983700/4298200) on [this question](https://stackoverflow.com/questions/16983618). – Martin Zaloga Jan 12 '15 at 09:00

4 Answers4

165

An EqualityComparer is not the way to go - it can only filter your result set in memory eg:

var objects = yourResults.ToEnumerable().Distinct(yourEqualityComparer);

You can use the GroupBy method to group by IDs and the First method to let your database only retrieve a unique entry per ID eg:

var objects = yourResults.GroupBy(o => o.Id).Select(g => g.First());
Eranga
  • 32,181
  • 5
  • 97
  • 96
Rich O'Kelly
  • 41,274
  • 9
  • 83
  • 114
  • 12
    +1 This is a life saver, however note that you cannot use .First() instead you will have to use .FirstOrDefault() – yoel halb Nov 18 '13 at 18:03
  • I owe you an education! One of those answers I wish I could up up up up-vote! – seebiscuit Jan 12 '17 at 15:25
  • 1
    @yoelhalb doesn't GroupBy guarantee none of the groupings returned are empty? There's no way for one of the returned groupings to be empty, since the groupings are formed by separating out the elements – vijrox Jul 25 '17 at 19:56
  • 3
    @vijrox I believe that the LINQ to SQL provider that @yoelhalb is referring to does not support the `IQueryable.First` method - but it does support the `IQueryable.FirstOrDefault` method. In this instance, as you say, both logically would return the same results (but only one of the two methods are implemented within the provider). – Rich O'Kelly Jul 26 '17 at 21:49
  • If you are trying to do this operation , after you have loaded your Data from the DB, for example if you want to to this on an ObservableCollection, you have to use the `asQueryable().GroupBy(o => o.Id).Select(c => c.FirstOrDefault())` – Vaggelis Stefanakis Dec 01 '17 at 09:00
  • Problem with this one is: If you use it as ItemsSource with live filtering, when 'one' of a certain value changes than the ItemsSource will have that item changed. So, for example you had {98, 98, ...) among others, if one of the underlying record is changed to 99, than the grouped collection will only contain 99 and not 98 although this value is still present. – Avrohom Nov 11 '18 at 21:20
  • Wow man, that sure is a single line magic code for Distinct by – Chandraprakash Sep 10 '21 at 22:58
22

rich.okelly and Ladislav Mrnka are both correct in different ways.

Both their answers deal with the fact that the IEqualityComparer<T>'s methods won't be translated to SQL.

I think it's worth looking at the pros and cons of each, which will take a bit more than a comment.

rich's approach re-writes the query to a different query with the same ultimate result. Their code should result in more or less how you would efficiently do this with hand-coded SQL.

Ladislav's pulls it out of the database at the point before the distinct, and then an in-memory approach will work.

Since the database is great at doing the sort of grouping and filtering rich's depends upon, it will likely be the most performant in this case. You could though find that the complexity of what's going on prior to this grouping is such that Linq-to-entities doesn't nicely generate a single query but rather produces a bunch of queries and then does some of the work in-memory, which could be pretty nasty.

Generally grouping is more expensive than distinct on in-memory cases (especially if you bring it into memory with AsList() rather than AsEnumerable()). So if either you were already going to bring it into memory at this stage due to some other requirement, it would be more performant.

It would also be the only choice if your equality definition was something that didn't relate well to what is available just in the database, and of course it allows you to switch equality definitions if you wanted to do so based on an IEqualityComparer<T> passed as a parameter.

In all, rich's is the answer I'd say would be most-likely to be the best choice here, but the different pros and cons to Ladislav's compared to rich's makes it also well worth studying and considering.

Jon Hanna
  • 110,372
  • 10
  • 146
  • 251
9

You will not. Distinct operator is called on the database so any code you write in your application cannot be used (you cannot move your equality comparator logic to SQL) unless you are happy with loading all non-distinct values and make distinct filtering in your application.

var query = (from x in context.EntitySet where ...).ToList()
                                                   .Distinct(yourComparer);
Ladislav Mrnka
  • 360,892
  • 59
  • 660
  • 670
6

Late answer but you can do better: if the DAL object is partial (usually is if it is a DB object), you can extend it like this:

public partial class MyDOClass :  IEquatable<MyDOClass>
    {

        public override int GetHashCode()
        {
            return Id == 0 ? 0 : Id;
        }

        public bool Equals(MyDOClass other)
        {
            return this.Id == other.Id;
        }
    }

And the distinct will work without any overload in it.

If not, you can create the IEqualityComparer class like this:

internal class MyDOClassComparer : MyDOClass,  IEquatable<MyDOClass>, IEqualityComparer<MyDOClass>
    {
        public override int GetHashCode()
        {
            return Id == 0 ? 0 : Id;
        }

        public bool Equals(MyDOClass other)
        {
            return this.Id == other.Id;
        }

        public bool Equals(MyDOClass x, MyDOClass y)
        {
            return x.Id == y.Id;
        }

        public int GetHashCode(MyDOClass obj)
        {
            return Id == 0 ? 0 : Id;
        }
    }

And again, use the Distinct without any overload

gil kr
  • 2,190
  • 1
  • 17
  • 23