Linq Efficiently find if collection contains duplicates using IQueryable.GroupBy

Question

The keyword in this is Queryable.GroupBy instead of Enumerable.GroupBy

I use EntityFramework and I want to check if there are no duplicate values. Several answers on StackOverflow like this one suggest using GroupBy

IQueryable<MyType> myItems = ...
IQueryable<IGrouping<string, MyType> groupsWithSameName = myItems
.GroupBy(myItem => myItem.Name);
// note: IQueryable!

bool containsDuplicates = groupsWithSameName.Any(group => group.Skip(1).Any());

Although this is allowed on IEnumerables, Skip is not supported on an unordered sequence. The NotSupportedException suggests using OrberBy before using the Skip.

As an alternative I could check if there are groups with more than one element using Count

bool containsDuplicates = groupsWithSameName.Any(group => group.Count() > 1);

Both methods require to scan all elements in the collection. This is for the 2nd time because they were also scanned to group them.

Is there a method to check for duplicates on an IQueryable more efficiently?

Maybe you can take a look on [What is the difference between IQueryable and IEnumerable?](http://stackoverflow.com/questions/252785/what-is-the-difference-between-iqueryablet-and-ienumerablet). And as all of these is deferred execution, I believe it will transform to SQL and execute in SQL and run once only. Maybe other people can help to explain this in more detail — Prisoner, Oct 27 '16 at 09:40
@Alex, up till EF 7, GroupBy is always in memory operation, EF does not translate GroupBy to SQL Group By. — Akash Kava, Oct 27 '16 at 09:53
@AkashKava Why do you think so? I'm using EF6.1.3 and it does `GROUP BY` SQLs. — Ivan Stoev, Oct 27 '16 at 10:00
@Alex, sorry, I think I misread EF, seems only EF 7 does not translate Group by correctly. — Akash Kava, Oct 27 '16 at 10:05

score 0 · Answer 1 · edited May 23 '17 at 10:27

0

I think that scanning of all the elements will not be avoided. In any case, the process of finding a duplicate with SQL will look like this:

    SELECT
       name, COUNT(*)
    FROM
       MyType
    GROUP BY
       name
    HAVING 
       COUNT(*) > 1

It may be worth trying to seek a solution in such a way?: Linq with group by having count

edited May 23 '17 at 10:27

Community

1
1

answered Oct 27 '16 at 09:50

Ruslan Borovok

520
2
17

Linq Efficiently find if collection contains duplicates using IQueryable.GroupBy

1 Answers1