0

The keyword in this is Queryable.GroupBy instead of Enumerable.GroupBy

I use EntityFramework and I want to check if there are no duplicate values. Several answers on StackOverflow like this one suggest using GroupBy

IQueryable<MyType> myItems = ...
IQueryable<IGrouping<string, MyType> groupsWithSameName = myItems
.GroupBy(myItem => myItem.Name);
// note: IQueryable!

bool containsDuplicates = groupsWithSameName.Any(group => group.Skip(1).Any());

Although this is allowed on IEnumerables, Skip is not supported on an unordered sequence. The NotSupportedException suggests using OrberBy before using the Skip.

As an alternative I could check if there are groups with more than one element using Count

bool containsDuplicates = groupsWithSameName.Any(group => group.Count() > 1);

Both methods require to scan all elements in the collection. This is for the 2nd time because they were also scanned to group them.

Is there a method to check for duplicates on an IQueryable more efficiently?

Community
  • 1
  • 1
Harald Coppoolse
  • 28,834
  • 7
  • 67
  • 116
  • 2
    Maybe you can take a look on [What is the difference between IQueryable and IEnumerable?](http://stackoverflow.com/questions/252785/what-is-the-difference-between-iqueryablet-and-ienumerablet). And as all of these is deferred execution, I believe it will transform to SQL and execute in SQL and run once only. Maybe other people can help to explain this in more detail – Prisoner Oct 27 '16 at 09:40
  • @Alex, up till EF 7, GroupBy is always in memory operation, EF does not translate GroupBy to SQL Group By. – Akash Kava Oct 27 '16 at 09:53
  • @AkashKava, thanks for your information, learn a lot here – Prisoner Oct 27 '16 at 09:59
  • @AkashKava Why do you think so? I'm using EF6.1.3 and it does `GROUP BY` SQLs. – Ivan Stoev Oct 27 '16 at 10:00
  • @Alex, sorry, I think I misread EF, seems only EF 7 does not translate Group by correctly. – Akash Kava Oct 27 '16 at 10:05

1 Answers1

0

I think that scanning of all the elements will not be avoided. In any case, the process of finding a duplicate with SQL will look like this:

    SELECT
       name, COUNT(*)
    FROM
       MyType
    GROUP BY
       name
    HAVING 
       COUNT(*) > 1

It may be worth trying to seek a solution in such a way?: Linq with group by having count

Community
  • 1
  • 1
Ruslan Borovok
  • 520
  • 2
  • 17