1

I have a DB, which containes a few fields. I want to remove duplicates based on one field ("full") - i.e. if there are more than one version of it, I should take any/first of them, and discard the rest...

So far I can't - everything throws an error of some kind.

This is one of my tires. Unfortunately the last Select in distinctList throws an error.

using (var context = new JITBModel())
{
    var allList = context.BackupEvents.Select(i => i.Id).ToList();

    var distinctList = context.BackupEvents
        .GroupBy(x => x.Full)
        .Select(i => i.ToList())
        .Where(c => c.Count > 1)
        .Select(t => t[0].Id).ToList();

    var dups = allList.Except(distinctList);

    context.BackupEvents.RemoveRange(from e in context.BackupEvents
                                     where dups.Contains(e.Id)
                                     select e);
    context.SaveChanges();

}

Also, can't seem to choose .First() within a select query.


UPDATE: for now I implemented a simple ExecuteSqlCommand based on the answer here.

string com = @"DELETE FROM BackupEvents
             WHERE rowid NOT IN (
             SELECT MIN(rowid) 
             FROM BackupEvents 
             GROUP BY full)";
context.Database.ExecuteSqlCommand(com); 

If anyone knows how to do it with entity/linq - let me know :-)

Maverick Meerkat
  • 5,737
  • 3
  • 47
  • 66

1 Answers1

0

instead of t=> t[0].Id, try t.FirstOrDefault().Id.

Maybe code below would work ? I didn't run it, but I'm not getting any pre-compile error using something similar to below.

using (var context = new JITBModel())
{


  var duplicates= context.BackupEvents
    .GroupBy(x => x.Full)
    .Where(grp => grp.Count() > 1)
    .Select(grp=>grp.FirstOrDefault());

  context.BackupEvents.RemoveRange(duplicates);
  context.SaveChanges();

}
farzaaaan
  • 410
  • 3
  • 9