6

I have a linq query that I am using Distinct() on. If I just call Distinct() without converting to a List then it does not return a distinct list - it still contains duplicates.

However if I convert to a List and then call Distinct() - it works as expected and I only get unique objects.

I'm using Telerik ORM and the objects being returned are the class representing one of the tables in the database.

var uniqueUsers = (from u in Database.Users 
                   select u).Distinct();

The code above does not produce distinct results, however when I convert to a list and call distinct - it does:

var uniqueUsers = (from u in Database.Users 
                   select u).ToList().Distinct();

I suspect this has to do with the collection before being converted to a list, comparing references to objects rather than the object data itself but I do not fully understand what is going on - why does the fist code example not produce unique results and what happens to the collection when using .ToList() that makes it work?

[EDIT] I've simplified the above queries, in the real world the query has several joins which generates non-unique results, however I am returning just the User objects.

I tried overriding the Equals and GetHashCode methods but this did not make any difference.

public override bool Equals(object obj)
{
    User comparingObject = obj as User ;

    if (comparingObject == null)
    {
        return false;
    }
    else
    {
        return comparingObject.UserID.Equals(this.UserID);
    }
}

public override int GetHashCode()
{
    return this.UserID.GetHashCode();
}

[UPDATE] Having run the same query in LinqPad, it works as expected providing a list of distinct entries. However running the same query in LinqPad when using the Telerik ORM dll I get multiple entries. So it appears to be a peculiarity with Telerik. When I have time I will investigate further and raise it with Telerik support.

Sнаđошƒаӽ
  • 16,753
  • 12
  • 73
  • 90
Chris B
  • 5,311
  • 11
  • 45
  • 57
  • 5
    Have you looked at what the generated SQL looks like? It's not clear to me how you could get duplicates in the first place if you're just listing the contents of a table... that seems very odd. – Jon Skeet Jul 01 '13 at 10:06
  • 1
    Perhaps you're overriding `Equals` in code in conclusion to what Jon Skeet says. – Silvermind Jul 01 '13 at 10:08
  • 1
    Have you overridden the `Equals` method of the `User` object? If this is the case, it makes a huge difference between your two cases. – Mohammad Dehghan Jul 01 '13 at 10:10
  • @JonSkeet & MD.Unicorn - apologies, I've added more details to the question – Chris B Jul 01 '13 at 10:21
  • 2
    It would be helpful if you could try to come up with a minimal example which demonstrates the problem, and include the SQL generated for that. – Jon Skeet Jul 01 '13 at 10:30

1 Answers1

12

Obviously you cannot have exact duplicate rows (including the primary key) in your table. Probably what you mean is rows with some equal fields (excluding primary key).

Calling Distinct on IQueryable, generates a SQL DISTINCT operator on the resulting query, which compares every field of your table against each other. Because you cannot have exact duplicate rows in the table, it returns all the rows.

On the other hand, calling Distinct on a List<User> will use Equals method of the User object to compare objects in memory (after fetching all the rows from database). The final result depends on the implementation of Equals method, which could check only some fields for equal values.

Mohammad Dehghan
  • 17,853
  • 3
  • 55
  • 72
  • 1
    Might be obvious, but also note that doing this in the DB (Distinct on IQueryable) is a lot faster since every row do not have to be transfered to your app and be converted to C# objects. – jgauffin Jul 01 '13 at 10:38
  • I'm marking this as the answer - although the issue appears to be a Telerik ORM problem (as it works correctly in LinqPad). This answer clarified to me how the `IQueryable.Distinct` and the `ToList().Distinct` differ which led me down the path of determining the problem. I will post an update when I have the time to raise the issue with Telerik and have more information. – Chris B Jul 05 '13 at 10:04
  • Just the thing I was looking for. +1 mate. – Sнаđошƒаӽ Jan 22 '17 at 12:24