Using LINQ to find / delete duplicates

Question

I have a table that contains a bunch of duplicates. These are exact duplicates, minus the primary key column, which is an integer identity column.

Using EF and LINQ, how do I find the duplicates and delete them, leaving only one copy.

I found the duplicates and a count of each using SQL and SSMS. I'm just don't know where to start with LINQ.

Thanks!

Is there a reason you don't just delete the duplicates using T-SQL? — Mike M., Dec 21 '10 at 19:14
@ Mike M - I'm trying to do more with LINQ and I'm not sure how to do it with T-SQL either :-) — DenaliHardtail, Dec 21 '10 at 19:16

Craig Stuntz · Accepted Answer · 2010-12-21T21:19:58.737

8

Off the top of my head (untested):

var q = from r in Context.Table
        group r by new { FieldA = r.FieldA, FieldB = r.FieldB, // ...
            into g
        where g.Count() > 1
        select g;
foreach (var g in q)
{
    var dupes = g.Skip(1).ToList();
    foreach (var record in dupes)
    {
        Context.DeleteObject(record);
    }
}
Context.SaveChanges();

edited Dec 21 '10 at 21:19

answered Dec 21 '10 at 19:24

Craig Stuntz

125,891
12
252
273

The while loop will become an infinite loop. Why not "Skip(1)" in the dups and then "foreach" over the result and delete them that way? – Enigmativity Dec 21 '10 at 21:05
This is similiar to what I initially thought the solution would be. I just had difficulty putting all the LINQ pieces together. Thanks! – DenaliHardtail Dec 21 '10 at 22:12
1

Nice work. I could suggest that you go one step further and flatten the results purely in LINQ and then do a single "foreach" loop to delete. Or even use ".ToArray()" and then "Array.ForEach(dupes, Context.DeleteObject);" Just a suggestion. :-) – Enigmativity Dec 21 '10 at 23:06

score 1 · Answer 2 · answered Feb 17 '18 at 12:50

Building on @Craig Stuntz's answer with a "one liner" alternative:

var duplicates = db.Table.GroupBy(a => new { a.FieldA, a.FieldB, ...})
                         .Where(a => a.Count() > 1)
                         .SelectMany(a => a.ToList());

foreach (var d in duplicates)
{
     db.DeleteObject(d);
}

db.SaveChanges();

Using LINQ to find / delete duplicates

2 Answers2

Linked