4

I need some help using a lambda expression to remove duplicate entries in my Entity Framework context. I have a table with the following columns:

Id, DateOfIncident, Description, EmployeeId, IncidentTypeId, and IsAttendanceIncident

I want to remove duplicate entries where DateOfIncident, EmployeeID, IncidentTypeID and IsAttendanceIncident are the same. I do want to keep one entry. I know how to do this using a Stored Procedure with CTE in SQL, but I cannot figure out how to accomplish this task using a Lambda expression.

This code returns a list excluding my duplicates, but now how do I go about removing those duplicates that are not in this list ?

  var query = db.Incidents.Where(x => x.IsAttendanceIncident == "Y").GroupBy(x => new { x.EmployeeId, x.DateOfIncident, x.IsAttendanceIncident })
         .Select(x => x.FirstOrDefault());

UPDATE:

So I went ahead and wrote the custom IEqualityComparer. Now how do I remove the incidents in my context that are not in my distinctItems using the id ?

 static void Main(string[] args)
    {
        DALIncidents.AttendanceEntities1 db = new DALIncidents.AttendanceEntities1();

       IEnumerable<DALIncidents.Incident> distinctItems = db.Incidents.Where(c => c.IsAttendanceIncident == "Y");
       distinctItems = distinctItems.Distinct(new DALIncidents.DistinctIncidentComparer());

       Console.ReadLine(); 
    }
Bill Greer
  • 3,046
  • 9
  • 49
  • 80

4 Answers4

9
var query = db.Incidents
    .Where(x => x.IsAttendanceIncident == "Y")
    .GroupBy(x => new { x.EmployeeId, x.DateOfIncident, x.IsAttendanceIncident })

Example 1:

    .Select(x => x.FirstOrDefault());  // your original code which retrieves entities to not delete

var dupes = db.Incidents.Except( query ); // get entities to delete

Example 2:

    .SelectMany( x => x.OrderBy( y => y.Id ).Skip(1) ); // gets dupes directly

var dupes = query; // already have what we need

And finally:

foreach( var dupe in dupes )
{
    db.Incidents.Remove( dupe );
}

Example SQL generated from a test context I used earlier where Person entity has a 1:N relationship with watches:

C#:

context.Persons.SelectMany(x => x.Watches.OrderBy(y => y.Id).Skip(1))

Generated SQL:

SELECT 
1 AS [C1], 
[Skip1].[Id] AS [Id], 
[Skip1].[Brand] AS [Brand], 
[Skip1].[Person_Id] AS [Person_Id]
FROM  [dbo].[Persons] AS [Extent1]
CROSS APPLY  (SELECT [Project1].[Id] AS [Id], [Project1].[Brand] AS [Brand], [Project1].[Person_Id] AS [Person_Id]
    FROM ( SELECT [Project1].[Id] AS [Id], [Project1].[Brand] AS [Brand], [Project1].[Person_Id] AS [Person_Id], row_number() OVER (ORDER BY [Project1].[Id] ASC) AS [row_number]
        FROM ( SELECT 
            [Extent2].[Id] AS [Id], 
            [Extent2].[Brand] AS [Brand], 
            [Extent2].[Person_Id] AS [Person_Id]
            FROM [dbo].[Watches] AS [Extent2]
            WHERE [Extent1].[Id] = [Extent2].[Person_Id]
        )  AS [Project1]
    )  AS [Project1]
    WHERE [Project1].[row_number] > 1 ) AS [Skip1]
Moho
  • 15,457
  • 1
  • 30
  • 31
  • It will work; using EF is not the optimally performing solution for your problem, but you asked how to accomplish this task using EF. EF is not optimal because you need to retrieve all the data and contruct the entity objects just to issue delete commands via EF. – Moho Aug 12 '13 at 14:27
  • Sorry, thought you were the original poster. Skip will be performed after the entity objects are created locally and enumerated. I had edited my previous comment describing why it is not the optimally performing solution - all the data is retrieved and entity objects must be created just to delete them in the DB. The original poster asked for an EF solution, however, not the optimally performing solution (which would be a stored proc) – Moho Aug 12 '13 at 14:31
  • you're correct; skip can only be used on an ordered query, in which case you can simply change the SelectMany lambda expression to order the IGrouping set's items, which works. I removed the example as it just confuses the issue. – Moho Aug 12 '13 at 14:39
  • added the example back in (#2) that retrieves only the items to be deleted. Example SQL generated for the following code where a Person has a 1:N relationship with Watches: doesn't fit :P – Moho Aug 12 '13 at 14:46
1
var query = db.Incidents.Where(x => x.IsAttendanceIncident == "Y")
                .GroupBy(x => new { x.Id, x.EmployeeId, x.DateOfIncident, x.IsAttendanceIncident })
                .Select(x => x.FirstOrDefault());


var query2 = from duplicate in db.Incidents
                 .Where(x => x.IsAttendanceIncident == "Y" && !query.Any(i => i.Id == duplicate.Id));

query2 will now just contain the duplicates?

Squirrel5853
  • 2,376
  • 1
  • 18
  • 34
0

You will need to use the Distinct function and if you want to just some of the fields then you need to create an Equality Comparer. (IEqualityComparer)

Ahh just seen the comment above, check this out for more:

Remove duplicates in the list using linq

Community
  • 1
  • 1
Richard
  • 21,728
  • 13
  • 62
  • 101
  • Richard. I updated the question. Can you take another look. Thanks for your suggestion on the IEqualityComparer. – Bill Greer Aug 12 '13 at 14:09
0
var query = db.Incidents.Where(x => x.IsAttendanceIncident == "Y").GroupBy(x => new { x.EmployeeId, x.DateOfIncident, x.IsAttendanceIncident })
     .SelectMany(x => x.Skip(1));
Lee Bailey
  • 3,594
  • 2
  • 18
  • 17
  • Will this be able to be properly translated into SQL? If it won't, you can just specify `Where(group => group.Count() > 1)` and then perform the `Skip` in linq to objects. – Servy Aug 12 '13 at 14:17