7

I have a table that contains a bunch of duplicates. These are exact duplicates, minus the primary key column, which is an integer identity column.

Using EF and LINQ, how do I find the duplicates and delete them, leaving only one copy.

I found the duplicates and a count of each using SQL and SSMS. I'm just don't know where to start with LINQ.

Thanks!

DenaliHardtail
  • 27,362
  • 56
  • 154
  • 233

2 Answers2

8

Off the top of my head (untested):

var q = from r in Context.Table
        group r by new { FieldA = r.FieldA, FieldB = r.FieldB, // ...
            into g
        where g.Count() > 1
        select g;
foreach (var g in q)
{
    var dupes = g.Skip(1).ToList();
    foreach (var record in dupes)
    {
        Context.DeleteObject(record);
    }
}
Context.SaveChanges();
Craig Stuntz
  • 125,891
  • 12
  • 252
  • 273
  • The while loop will become an infinite loop. Why not "Skip(1)" in the dups and then "foreach" over the result and delete them that way? – Enigmativity Dec 21 '10 at 21:05
  • This is similiar to what I initially thought the solution would be. I just had difficulty putting all the LINQ pieces together. Thanks! – DenaliHardtail Dec 21 '10 at 22:12
  • 1
    Nice work. I could suggest that you go one step further and flatten the results purely in LINQ and then do a single "foreach" loop to delete. Or even use ".ToArray()" and then "Array.ForEach(dupes, Context.DeleteObject);" Just a suggestion. :-) – Enigmativity Dec 21 '10 at 23:06
1

Building on @Craig Stuntz's answer with a "one liner" alternative:

var duplicates = db.Table.GroupBy(a => new { a.FieldA, a.FieldB, ...})
                         .Where(a => a.Count() > 1)
                         .SelectMany(a => a.ToList());

foreach (var d in duplicates)
{
     db.DeleteObject(d);
}

db.SaveChanges();