1

I have a database entity Foo that has the following properties:

public partial class Foo
{
  public string Name { get; set; }
  public string Type { get; set; }
  etc...
}

I want to get the records where Type is one of a set of Types, and where there are two or more records for the same Name that meet this criteria.

The LINQ statement I have is:

var records = databaseContext.Foo
  .Where(r => listOfTypes.Contains(r.Type))
  .GroupBy(r => r.Name)
  .Where(x => x.Count() > 1)
  .SelectMany(g => g);

However, this gives an exception due to EF Core no longer falling back to client-side evaluation:

System.InvalidOperationException: Processing of the LINQ expression 'g => g' by 'NavigationExpandingExpressionVisitor' failed. This may indicate either a bug or a limitation in EF Core. See https://go.microsoft.com/fwlink/?linkid=2101433 for more detailed information.

I can force client-side evaluation by adding an .AsEnumerable() at the start of the LINQ statement, but this is horrendously slow and inefficient, taking over an hour to process the 20K+ records in my Foo table

How can I rewrite this so that it can be translated to SQL, or get this result some other way?

aforest-ccc
  • 85
  • 1
  • 11
  • your LINQ query is right. So the issue might be related to another one, which I'm not sure about. I tried it with the same concept on LINQ pad. check out this screenshot https://i.stack.imgur.com/NMG7w.png and this is the result https://i.stack.imgur.com/Io7WX.png – ash Nov 05 '20 at 07:39
  • check out this SO question if it gives you any clues https://stackoverflow.com/questions/58898373/ef-core-linq-query-failing-due-to-limitation – ash Nov 05 '20 at 07:46
  • To find duplicates you need analytic functions like ROW_NUMBER and raw SQL. That's not what ORMs like EF are about – Panagiotis Kanavos Nov 05 '20 at 08:34

2 Answers2

1

Please read this answer LINQ to Database: how to group entities properly and GroupBy limitations. Your query is first - trying to select data from fields which are not a grouping keys.

Here you have to do additional join to the original table.

var baseQuery = databaseContext.Foo
   .Where(r => listOfTypes.Contains(r.Type));

var duplicateNames = baseQuery
  .GroupBy(r => r.Name)
  .Where(x => x.Count() > 1)
  .Select(g => g.Key);

var records = baseQuery
  .Join(duplicateNames, f => f.Name, n => n, (f, n) => f);

Also this query also can be written by Window Functions, but they are not supported by vanilla EF.

Anyway leaving here fastest solution based on EF Core extension linq2db.EntityFrameworkCore

var withCount = 
   from f in databaseContext.Foo
   where listOfTypes.Contains(f.Type)
   select new 
   {
       Record = f,
       Count = Sql.Ext.Count().Over().PartitionBy(f.Name).ToValue()
   };

// switch to alternative LINQ provider
withCount = withCount.ToLinqToDB();

var result = 
   from r in withCount
   where r.Count > 1
   select r.Record;
Svyatoslav Danyliv
  • 21,911
  • 3
  • 16
  • 32
  • I've tried using your linq2dv.EntityFrameworkCore solution, but I get the exception "System.TypeLoadException: Method 'ExecuteAsyncEnumerable' in type 'LinqToDB.EntityFrameworkCore.Internal.LinqToDBForEFQueryProvider`1' from assembly 'linq2db.EntityFrameworkCore, Version=3.7.0.0, Culture=neutral, PublicKeyToken=e41013125f9e410a' does not have an implementation." – aforest-ccc Nov 06 '20 at 06:51
  • Using your first set of LINQ queries gets me a "There is already an open DataReader associated with this Connection which must be closed first." exception. – aforest-ccc Nov 06 '20 at 07:13
  • @aforest-ccc, which version of EF Core do you use? – Svyatoslav Danyliv Nov 06 '20 at 14:55
0

You need to rewrite the LINQ query w/o GroupBy, but with correlated subquery filter based on existence (actually non existence) of another record with the same property value, e.g.

// Base query with other filters applied
var baseQuery = databaseContext.Foo
    .Where(r => listOfTypes.Contains(r.Type);
// Actual query
var query = baseQuery
    .Where(r => !baseQuery.Any(f => f.Id != r.Id && f.Name == r.Name));

In case the entity really has no key (keyless?), the query can use correlated subquery Count based condition, but it won`t be so efficient as the above:

var query = baseQuery
    .Where(r => baseQuery.Count(f => f.Name == r.Name) > 1);
Ivan Stoev
  • 195,425
  • 15
  • 312
  • 343
  • Foo doesn't have an Id field, so the final part of this query won't work. – aforest-ccc Nov 06 '20 at 07:17
  • But it has unique key (PK) which you haven't shown, isn't it? Otherwise it won't be an entity. Just use that property instead of `Id`. See the updated `Count` based solution in case the entity really has no key. – Ivan Stoev Nov 06 '20 at 09:15