I am looking for a really fast way to check for duplicates in a list of objects.

I was thinking of simply looping through the list and comparing the items manually, but I thought that LINQ might provide a more elegant solution...

Suppose I have an object...

public class dupeCheckee
{
     public string checkThis { get; set; }
     public string checkThat { get; set; }

     public dupeCheckee(string val, string val2)
     {
         checkThis = val;
         checkThat = val2;
     }
}

And I have a list of those objects

List<dupeCheckee> dupList = new List<dupeCheckee>();
dupList.Add(new dupeCheckee("test1", "value1"));
dupList.Add(new dupeCheckee("test2", "value1"));
dupList.Add(new dupeCheckee("test3", "value1"));
dupList.Add(new dupeCheckee("test1", "value1"));//dupe
dupList.Add(new dupeCheckee("test2", "value1"));//dupe... 
dupList.Add(new dupeCheckee("test4", "value1"));
dupList.Add(new dupeCheckee("test5", "value1"));
dupList.Add(new dupeCheckee("test1", "value2"));//not dupe

I need to find the dupes in that list. When I find them, I need to run some additional logic, which does not necessarily remove them.

When I use LINQ, my GroupBy call fails to compile with this error...

'System.Collections.Generic.List<dupeCheckee>' does not contain a definition for 'GroupBy' and no extension method 'GroupBy' accepting a first argument of type 'System.Collections.Generic.List<dupeCheckee>' could be found (are you missing a using directive or an assembly reference?)

This tells me that I am missing a using directive or an assembly reference. I am having a hard time figuring out which one, though.

Once I figure that out, how would I check for those two conditions, i.e. that the same checkThis and checkThat values both occur more than once?

UPDATE: What I came up with

This is the LINQ query that I came up with after some quick research...

test.Count != test.Select(c => new { c.checkThat, c.checkThis }).Distinct().Count()

I am not certain if this is definitely better than this answer...

var duplicates = test.GroupBy(x => new {x.checkThis, x.checkThat})
                   .Where(x => x.Skip(1).Any());

I know I can put the first statement into an if/else clause. I also ran a quick test. The duplicates query gives me back 1 group when I was expecting 0, but it did correctly report that I had duplicates in one of the sets that I used...

The other approach does exactly what I expect. Here are the data sets that I used to test this...

Dupes:

List<dupeCheckee> test = new List<dupeCheckee>{
     new dupeCheckee("test0", "test1"),
     new dupeCheckee("test1", "test2"),
     new dupeCheckee("test2", "test3"),
     new dupeCheckee("test3", "test3"),
     new dupeCheckee("test0", "test5"),
     new dupeCheckee("test1", "test6"),
     new dupeCheckee("test2", "test7"),
     new dupeCheckee("test3", "test8"),
     new dupeCheckee("test0", "test5"),
     new dupeCheckee("test1", "test1"),
     new dupeCheckee("test2", "test2"),
     new dupeCheckee("test3", "test3"),
     new dupeCheckee("test4", "test4")
};

No dupes...

List<dupeCheckee> test2 = new List<dupeCheckee>{
     new dupeCheckee("test0", "test1"),
     new dupeCheckee("test1", "test2"),
     new dupeCheckee("test2", "test3"),
     new dupeCheckee("test3", "test3"),
     new dupeCheckee("test4", "test5"),
     new dupeCheckee("test5", "test6"),
     new dupeCheckee("test6", "test7"),
     new dupeCheckee("test7", "test8"),
     new dupeCheckee("test8", "test5"),
     new dupeCheckee("test9", "test1"),
     new dupeCheckee("test2", "test2"),
     new dupeCheckee("test3", "test3"),
     new dupeCheckee("test4", "test4")
};
Tshilidzi Mudau
SoftwareSavant

8 Answers

You need to import the System.Linq namespace (i.e. add `using System.Linq;` at the top of the file)

then you can do

var dupes = dupList.GroupBy(x => new {x.checkThis, x.checkThat})
                   .Where(x => x.Skip(1).Any());

This will give you groups with all the duplicates

The test for duplicates would then be

var hasDupes = dupList.GroupBy(x => new {x.checkThis, x.checkThat})
                   .Where(x => x.Skip(1).Any()).Any();

You can also call ToList() or ToArray() to force the query to be evaluated once, so that you can both check for dupes and examine them without running the grouping twice.

e.g.

var dupes = dupList.GroupBy(x => new {x.checkThis, x.checkThat})
                   .Where(x => x.Skip(1).Any()).ToArray();
if (dupes.Any()) {
  // each element of dupes is one group of items sharing the same key
  foreach (var dupeGroup in dupes) {
    Console.WriteLine(string.Format("checkThis={0},checkThat={1} has {2} duplicates",
                      dupeGroup.Key.checkThis,
                      dupeGroup.Key.checkThat,
                      dupeGroup.Count() - 1));
  }
}

Alternatively

var dupes = dupList.Select((x, i) => new { index = i, value = x})
                   .GroupBy(x => new {x.value.checkThis, x.value.checkThat})
                   .Where(x => x.Skip(1).Any());

This gives you groups in which each element stores the original list position in its index property and the item itself in its value property.
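As a rough, self-contained sketch of the index-preserving variant above: the `Item` class, its property names and the `DuplicateIndexes.Find` helper are hypothetical stand-ins rather than code from the question, and a value tuple replaces the anonymous type as the key so that the result can be returned from a method (this assumes C# 7 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's class (with a public constructor).
public class Item
{
    public string CheckThis { get; }
    public string CheckThat { get; }

    public Item(string checkThis, string checkThat)
    {
        CheckThis = checkThis;
        CheckThat = checkThat;
    }
}

public static class DuplicateIndexes
{
    // For every key that occurs more than once, returns the original
    // list positions of all items sharing that key.
    public static Dictionary<(string, string), List<int>> Find(IEnumerable<Item> items)
    {
        return items.Select((x, i) => new { Index = i, Value = x })
                    .GroupBy(x => (x.Value.CheckThis, x.Value.CheckThat))
                    .Where(g => g.Skip(1).Any())
                    .ToDictionary(g => g.Key, g => g.Select(x => x.Index).ToList());
    }

    public static void Main()
    {
        var list = new List<Item>
        {
            new Item("test1", "value1"),
            new Item("test2", "value1"),
            new Item("test3", "value1"),
            new Item("test1", "value1"), // dupe of index 0
            new Item("test2", "value1"), // dupe of index 1
            new Item("test1", "value2"), // not a dupe
        };

        foreach (var entry in Find(list))
        {
            Console.WriteLine($"{entry.Key} occurs at indexes {string.Join(", ", entry.Value)}");
        }
    }
}
```

Keeping the indexes makes it possible to remove or flag specific entries later without searching the list again.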

user2924019
Bob Vale
  • I am really looking to see if the item has any dupes at all. It would be nice to have several 'List' with all the duplicates in them... That will be nice if the user wants to remove them later, But I am really just looking to check if the list has dupes at all. – SoftwareSavant Apr 25 '13 at 13:15
  • @DmainEvent Thats what this does? If you want to check if there are any dupes just check `dupes.Any()` if true there are duplicates – Bob Vale Apr 25 '13 at 16:02
  • Could you take a look at my solution and see if you detect anything deficient about my solution. I tried both yours and mine, mine seems fine... Not certain about yours. – SoftwareSavant Apr 25 '13 at 16:52
  • @DemainEvent Well in your original post you specified the requirement of extracting the duplicates, which your solution doesn't do. – Bob Vale Apr 25 '13 at 17:42
  • In the second code snippet, you can rewrite `.Where(x => x.Skip(1).Any()).Any()` as `.Any(x => x.Skip(1).Any())`. – Rudey Oct 02 '14 at 08:28
  • @RuudLenders Yes you can, however I was trying to show the code as a progression, just adding `Any()` on the end of the previous result – Bob Vale Oct 02 '14 at 09:36

There are plenty of working solutions here, but I think the following one is more transparent and easier to understand than all of the above:

var hasDuplicatedEntries = ListWithPossibleDuplicates
                                   .GroupBy(YourGroupingExpression)
                                   .Any(e => e.Count() > 1);
if(hasDuplicatedEntries)
{
   // Do whatever you want when the list contains duplicates
}
Maris

I like using this to find out whether there are any duplicates at all. Let's say you had a string and wanted to know if it contained any duplicate letters. This is what I use.

string text = "this is some text";

var hasDupes = text.GroupBy(x => x).Any(grp => grp.Count() > 1);

If you wanted to know how many duplicates there are no matter what the duplicates are, use this.

var totalDupeItems = text.GroupBy(x => x).Count(grp =>  grp.Count() > 1);

So for example, "this is some text" has this...

total of letter t: 3

total of letter i: 2

total of letter s: 3

total of letter e: 2

total of spaces: 3

So variable totalDupeItems would equal 5. There are 5 different kinds of duplicates: the 4 letters plus the space character, which GroupBy counts like any other char. Filter first with text.Where(char.IsLetter) if only letters should count, which gives 4.

If you wanted to get the total amount of dupe items no matter what the dupes are, then use this.

var totalDupes = text.GroupBy(x => x).Where(grp => grp.Count() > 1).Sum(grp => grp.Count());

So the variable totalDupes would be 13 (or 10 if the spaces are filtered out). This is the total duplicate items of each dupe type added together.
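The counts for this example can be checked with a small sketch like the one below. The `LetterDupes` class and its method names are made up for illustration. Note that `GroupBy(x => x)` groups every char, so the three spaces in the sample string form a duplicate group of their own unless they are filtered out first.

```csharp
using System;
using System.Linq;

public static class LetterDupes
{
    // Number of distinct characters that occur more than once.
    public static int DuplicateKinds(string text) =>
        text.GroupBy(c => c).Count(g => g.Count() > 1);

    // Total occurrences of all characters that occur more than once.
    public static int DuplicateOccurrences(string text) =>
        text.GroupBy(c => c).Where(g => g.Count() > 1).Sum(g => g.Count());

    public static void Main()
    {
        string text = "this is some text";
        // Same string with everything except letters removed.
        string lettersOnly = new string(text.Where(char.IsLetter).ToArray());

        Console.WriteLine(DuplicateKinds(text));              // 5  (t, i, s, e and the space)
        Console.WriteLine(DuplicateOccurrences(text));        // 13 (3 + 2 + 3 + 2 + 3)
        Console.WriteLine(DuplicateKinds(lettersOnly));       // 4
        Console.WriteLine(DuplicateOccurrences(lettersOnly)); // 10
    }
}
```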

I think this is what you're looking for:

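// Group by the two properties: GroupBy(x => x) would compare object references
// unless dupeCheckee overrides Equals/GetHashCode.
List<dupeCheckee> duplicates = dupList.GroupBy(x => new { x.checkThis, x.checkThat })
                                      .SelectMany(g => g.Skip(1))
                                      .ToList();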
Captain Skyhawk

For in-memory objects I always use the LINQ Distinct method together with a custom equality comparer.

public class dupeCheckee
{
     public string checkThis { get; set; }
     public string checkThat { get; set; }

     public dupeCheckee(string val, string val2)
     {
         checkThis = val;
         checkThat = val2;
     }

     public class Comparer : IEqualityComparer<dupeCheckee>
     {
         public bool Equals(dupeCheckee x, dupeCheckee y)
         {
             if (ReferenceEquals(x, y))
                 return true; // same instance, or both null
             if (x == null || y == null)
                 return false;

             return x.checkThis == y.checkThis && x.checkThat == y.checkThat;
         }

         public int GetHashCode(dupeCheckee obj)
         {
             if (obj == null)
                 return 0;

             return (obj.checkThis == null ? 0 : obj.checkThis.GetHashCode()) ^
                 (obj.checkThat == null ? 0 : obj.checkThat.GetHashCode());
         }
     }
}

Now we can call

List<dupeCheckee> dupList = new List<dupeCheckee>();
dupList.Add(new dupeCheckee("test1", "value1"));
dupList.Add(new dupeCheckee("test2", "value1"));
dupList.Add(new dupeCheckee("test3", "value1"));
dupList.Add(new dupeCheckee("test1", "value1"));//dupe
dupList.Add(new dupeCheckee("test2", "value1"));//dupe... 
dupList.Add(new dupeCheckee("test4", "value1"));
dupList.Add(new dupeCheckee("test5", "value1"));
dupList.Add(new dupeCheckee("test1", "value2"));//not dupe

var distinct = dupList.Distinct(new dupeCheckee.Comparer());
Arturo Martinez

Do a select distinct with LINQ (see, for example, the question "How can I do SELECT UNIQUE with LINQ?").

Then compare the count of the distinct results with the count of the original list. If the two differ, the list contains duplicates.

Also, you could try using a Dictionary, which will guarantee the key is unique.
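A minimal sketch of the unique-key idea, with one swap stated plainly: it uses a `HashSet<T>` instead of a Dictionary, since only the keys matter here. The `DuplicateCheck.HasDuplicates` helper is a hypothetical name, not an existing library method. Unlike comparing counts, it can stop at the first repeated key.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateCheck
{
    // Returns true as soon as a key is seen for the second time.
    public static bool HasDuplicates<TSource, TKey>(
        IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        foreach (var item in source)
        {
            // HashSet<T>.Add returns false when the key is already present.
            if (!seen.Add(keySelector(item)))
                return true;
        }
        return false;
    }

    public static void Main()
    {
        // Value tuples stand in for the question's checkThis/checkThat pairs.
        var pairs = new List<(string CheckThis, string CheckThat)>
        {
            ("test1", "value1"),
            ("test2", "value1"),
            ("test1", "value1"), // dupe
        };

        Console.WriteLine(HasDuplicates(pairs, p => p)); // True
    }
}
```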

MatthewMartin

ToDictionary throws an exception if any duplicate key occurs, because a Dictionary checks its keys for uniqueness itself. This is the easiest way.

try
{
  dupList.ToDictionary(a => new { a.checkThis, a.checkThat });
}
catch (ArgumentException)
{
  // ToDictionary throws ArgumentException on a duplicate key: list items are not unique
}
Isomiddin

I introduced an extension method that checks any sequence for duplicates by a given key:

public static class CollectionExtensions
{
    public static bool HasDuplicatesByKey<TSource, TKey>(this IEnumerable<TSource> source
                                                       , Func<TSource, TKey> keySelector)
    {
        return source.GroupBy(keySelector).Any(group => group.Skip(1).Any());
    }
}

Usage example:

if (items.HasDuplicatesByKey(item => item.Id))
{
    throw new InvalidOperationException($@"Set {nameof(items)} has duplicates.");
}
LMV