C# LINQ find duplicates in List

Question

Using LINQ, from a List<int>, how can I retrieve a list that contains entries repeated more than once and their values?

score 855 · Accepted Answer · edited Mar 28 '18 at 05:05


The easiest way to solve the problem is to group the elements by their value, and then pick a representative of each group that contains more than one element. In LINQ, this translates to:

var query = lst.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .Select(y => y.Key)
              .ToList();

If you want to know how many times the elements are repeated, you can use:

var query = lst.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .Select(y => new { Element = y.Key, Counter = y.Count() })
              .ToList();

This returns a List of an anonymous type; each element has the properties Element and Counter, which hold the information you need.
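A side note, not from the answer: an anonymous type can't be written as a method's return type, so if you need to return these results from a method, a named value tuple does the same job. A minimal sketch assuming C# 7 or later; the class and method names are hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DuplicateQueries
{
    // Hypothetical helper: the answer's query, projected into a named value tuple
    // instead of an anonymous type so the result can leave the method.
    public static List<(int Element, int Counter)> DuplicateCounts(List<int> lst) =>
        lst.GroupBy(x => x)
           .Where(g => g.Count() > 1)
           .Select(g => (Element: g.Key, Counter: g.Count()))
           .ToList();
}
```

For { 1, 2, 3, 1, 4, 2, 2 } this gives (Element: 1, Counter: 2) and (Element: 2, Counter: 3), in the order the values first appear.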

And lastly, if it's a dictionary you are looking for, you can use:

var query = lst.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .ToDictionary(x => x.Key, y => y.Count());

This will return a dictionary, with your element as key, and the number of times it's repeated as value.
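For completeness, a self-contained version of the dictionary query. The wrapper class, method name and sample data are my own (hypothetical); the query itself is the one above. The dictionary form is convenient when you later need to ask about one specific value:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DictionaryExample
{
    // Duplicated element -> number of times it occurs in lst.
    // Values that occur only once are not in the dictionary at all.
    public static Dictionary<int, int> DuplicateCountMap(List<int> lst) =>
        lst.GroupBy(x => x)
           .Where(g => g.Count() > 1)
           .ToDictionary(g => g.Key, g => g.Count());
}
```

With { 1, 2, 3, 1, 4, 2, 2 }, the dictionary holds 1 → 2 and 2 → 3; TryGetValue(3, out _) returns false because 3 is not duplicated.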

edited Mar 28 '18 at 05:05

Vadim Ovchinnikov


answered Aug 31 '13 at 10:58

Save


Just wondering: say the duplicated ints are distributed across n int arrays. I'm using a dictionary and a for loop to find which array contains each duplicate, and then removing it according to a distribution rule. Is there a faster way (maybe with LINQ) to achieve that result? Thank you in advance for your interest. – Mirko Arcese Aug 31 '13 at 11:25
I'm doing something like this: `for (int i = 0; i < duplicates.Count; i++) { int duplicate = duplicates[i]; duplicatesLocation.Add(duplicate, new List<int>()); for (int k = 0; k < hitsList.Length; k++) { if (hitsList[k].Contains(duplicate)) { duplicatesLocation.ElementAt(i).Value.Add(k); } } // remove duplicates according to some rules. }` – Mirko Arcese Aug 31 '13 at 11:26
if you want to find duplicates in a list of arrays, give a look to SelectMany – Save Aug 31 '13 at 15:31
I'm searching for duplicates in an array of lists, but didn't get how SelectMany can help me with that – Mirko Arcese Aug 31 '13 at 19:18
That's pretty nice, thank you for the explanation and teaching, so I will pass an array of lists to LINQ. Right now I'm using Find to know in which lists the duplicates are located, obtaining a dictionary. Is it possible to do this within the LINQ query so that the result is KEY: duplicate, VALUE: indexes of the lists containing that duplicate? – Mirko Arcese Sep 01 '13 at 13:19
done man :D http://stackoverflow.com/questions/18561472/linq-select-duplicates-from-multiple-lists – Mirko Arcese Sep 01 '13 at 18:27
I get error "*The name 'g' does not exist in the current context*" for the second code block. – Daniel Jan 04 '15 at 15:22
This is a fantastic and helpful answer. I was even able to check for duplicates in a list of strings based on just a smaller substring section and return the entire rows involved with: GroupBy( line => line.Substring( 2, 9 ) ).Where( grp => grp.Count() > 1 ).SelectMany( grp => grp ) – kwill Jul 19 '17 at 13:43

To check if any collection has more than one element, it is more efficient to use Skip(1).Any() instead of Count(). Imagine a collection with 1000 elements. Skip(1).Any() will detect there is more than 1 as soon as it finds the 2nd element. Using Count() requires accessing the complete collection. – Harald Coppoolse Oct 26 '17 at 08:03
In my case, I had to convert the list to IEnumerable to make this work, i.e. `lst.AsEnumerable().GroupBy(x => x) ...` It didn't work with a list. – Matt Nov 08 '21 at 15:40
@HaraldCoppoolse that's not true in the `GroupBy`'s case though. `GroupBy` returns materialized inner collections (though .NET typed it as IEnumerable!! It should have been an IReadOnlyCollection) and calling `Count()` method merely returns the `Count` property of the inner `ICollection`. LINQ is very smart that way. In fact I am fairly sure `Skip(1).Any()` will be slower – nawfal Jun 16 '22 at 05:02

maxbeaudoin · Answer 2 · 2016-02-25T05:38:34.383

204

Find out if an enumerable contains any duplicates:

var anyDuplicate = enumerable.GroupBy(x => x.Key).Any(g => g.Count() > 1);

Find out if all values in an enumerable are unique:

var allUnique = enumerable.GroupBy(x => x.Key).All(g => g.Count() == 1);
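The `x.Key` selector above assumes elements that expose a Key property; for a plain List<int> like the one in the question, the element itself is the key, so the selector becomes `x => x`. The sketch below shows that, plus an alternative that isn't part of this answer: because GroupBy has to consume the whole sequence before Any or All sees the first group, a HashSet<int>-based check can stop at the first repeated value instead. Class and method names are hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class UniquenessChecks
{
    // The answer's checks with an identity key selector, for a plain List<int>.
    public static bool AnyDuplicate(List<int> list) =>
        list.GroupBy(x => x).Any(g => g.Count() > 1);

    public static bool AllUnique(List<int> list) =>
        list.GroupBy(x => x).All(g => g.Count() == 1);

    // Alternative (not from the answer): HashSet<T>.Add returns false for a value
    // that is already in the set, and All stops at the first false, so this
    // returns as soon as the first repeated value is reached.
    public static bool AllUniqueEarlyExit(IEnumerable<int> source)
    {
        var seen = new HashSet<int>();
        return source.All(seen.Add);
    }
}
```

AllUniqueEarlyExit(Enumerable.Repeat(7, int.MaxValue)) returns false after reading two elements, whereas the GroupBy versions would try to group all of them.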

edited Feb 25 '16 at 05:38

answered Dec 01 '14 at 17:34

maxbeaudoin



Is there any possibility these are not always boolean opposites? anyDuplicate == !allUnique in all cases. – Garr Godfrey Oct 26 '18 at 22:46

@GarrGodfrey They are always boolean opposites – Caltor Nov 19 '18 at 17:03

To get the values that were duplicated, just change Any to Where. – Ariwibawa Jun 17 '22 at 06:23

score 32 · Answer 3 · edited Jun 22 '22 at 09:34


To find the duplicate values only:

var duplicates = list.GroupBy(x => x).Where(g => g.Count() > 1).Select(g => g.Key);

E.g.

var list = new[] {1,2,3,1,4,2};

GroupBy groups the numbers by their keys, and Count() on each group gives the number of times that value is repeated. After that, we keep only the values that are repeated more than once.

To find the unique values only:

var unique = list.GroupBy(x => x).Where(g => g.Count() == 1).Select(g => g.Key);

E.g.

var list = new[] {1,2,3,1,4,2};

GroupBy groups the numbers by their keys in the same way; this time we keep only the values that occur exactly once, which means they are unique.
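A runnable version of both queries, assuming the int array from the examples: the selector is `x => x` because int has no Key property, and `Select(g => g.Key)` returns the values rather than the groups. Class and method names are hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DuplicateOrUnique
{
    // Values that occur more than once, each reported once.
    public static List<int> DuplicateValues(IEnumerable<int> list) =>
        list.GroupBy(x => x).Where(g => g.Count() > 1).Select(g => g.Key).ToList();

    // Values that occur exactly once.
    public static List<int> UniqueValues(IEnumerable<int> list) =>
        list.GroupBy(x => x).Where(g => g.Count() == 1).Select(g => g.Key).ToList();
}
```

For { 1, 2, 3, 1, 4, 2 }, DuplicateValues returns [1, 2] and UniqueValues returns [3, 4]; GroupBy yields groups in the order their keys first appear, which fixes the output order.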

edited Jun 22 '22 at 09:34

Flater


answered Nov 09 '18 at 05:47

Lav Vishwakarma


Below code will also find unique items. `var unique = list.Distinct(x => x)` – Malu MN Jun 08 '20 at 07:31

Your ANY syntax will NOT return the duplicates, it will merely tell you if there are any. Use the ALL syntax in the first example as well, and that should sort it! – Silviu Preda Mar 15 '21 at 12:14

Both examples only return booleans which is not what the OP asked. – DarkBarbarian Aug 31 '21 at 15:46
@MaluMN: The answer uses "unique values only" to mean "only the values which appear only once". `Distinct` works differently, in that it will not just return the values which appear only once, but also the values which appear multiple times (but it will return them only once instead of all of the multiple times); which is different from what the answer was referring to. – Flater Jun 22 '22 at 09:28
`.All(g => g.Count() == 1)` should be `.Where(g => g.Count() == 1)`. `All` would not "find the unique values" as you suggest, it would confirm that there are no duplicates in the entire list (= that **all** groups have a count of 1) – Flater Jun 22 '22 at 09:30
The exact same comment as before applies to `.Any(g => g.Count() > 1)`, which should have been `.Where(g => g.Count() > 1)`. `Any` would not find the duplicates themselves, it would only confirm that at least one duplicate exists (i.e. that there is **any** group with a count greater than 1). – Flater Jun 22 '22 at 09:31

score 31 · Answer 4 · edited Dec 04 '19 at 16:49


Another way is using HashSet:

var hash = new HashSet<int>();
var duplicates = list.Where(i => !hash.Add(i)).ToList(); // ToList() runs the query once; re-enumerating a lazy version would keep mutating hash

If you want unique values in your duplicates list:

var myhash = new HashSet<int>();
var mylist = new List<int>(){1,1,2,2,3,3,3,4,4,4};
var duplicates = mylist.Where(item => !myhash.Add(item)).Distinct().ToList();

Here is the same solution as a generic extension method:

using System;
using System.Collections.Generic;
using System.Linq;

public static class Extensions
{
  public static IEnumerable<TSource> GetDuplicates<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector, IEqualityComparer<TKey> comparer)
  {
    var hash = new HashSet<TKey>(comparer);
    return source.Where(item => !hash.Add(selector(item))).ToList();
  }

  public static IEnumerable<TSource> GetDuplicates<TSource>(this IEnumerable<TSource> source, IEqualityComparer<TSource> comparer)
  {
    return source.GetDuplicates(x => x, comparer);      
  }

  public static IEnumerable<TSource> GetDuplicates<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector)
  {
    return source.GetDuplicates(selector, null);
  }

  public static IEnumerable<TSource> GetDuplicates<TSource>(this IEnumerable<TSource> source)
  {
    return source.GetDuplicates(x => x, null);
  }
}
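The lazy `Where(i => !hash.Add(i))` form is sensitive to deferred execution, which is the bug BCA reports in the comments on this answer. A minimal sketch to reproduce it, assuming the input { 1, 2, 3, 4, 5, 2 } from that comment; CountTwice and DuplicatesOnce are hypothetical names, not part of the answer:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredPitfall
{
    // Enumerates the *lazy* HashSet query twice and reports both counts.
    public static (int First, int Second) CountTwice(IEnumerable<int> list)
    {
        var hash = new HashSet<int>();
        // Nothing runs until the query is enumerated, and every
        // enumeration keeps mutating the same HashSet.
        var duplicates = list.Where(i => !hash.Add(i));

        int first = duplicates.Count();  // set starts empty: only true repeats pass
        int second = duplicates.Count(); // set already holds every value: all items pass
        return (first, second);
    }

    // Materializing once with ToList() freezes the result.
    public static List<int> DuplicatesOnce(IEnumerable<int> list)
    {
        var hash = new HashSet<int>();
        return list.Where(i => !hash.Add(i)).ToList();
    }
}
```

With that input, CountTwice returns (1, 6): the second Count() sees a HashSet that already holds every value, so every Add fails and every item passes the filter. DuplicatesOnce returns just [2].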

edited Dec 04 '19 at 16:49

AlexMelw


answered Oct 20 '13 at 10:00

HuBeZa


This does not work as expected. Using `List<int> { 1, 2, 3, 4, 5, 2 }` as the source, the result is an `IEnumerable<int>` with one element having the value of `1` (where the correct duplicate value is 2) – BCA Jan 13 '17 at 21:05
@BCA yesterday, I think you're wrong. Check out this example: https://dotnetfiddle.net/GUnhUl – HuBeZa Jan 15 '17 at 11:56
Your fiddle prints out the correct result. However, I added the line `Console.WriteLine("Count: {0}", duplicates.Count());` directly below it and it prints `6`. Unless I'm missing something about the requirements for this function, there should only be 1 item in the resulting collection. – BCA Jan 16 '17 at 13:21
@BCA yesterday, it's a bug caused by LINQ deferred execution. I've added `ToList` in order to fix the issue, but it means that the method is executed as soon as it is called, and not when you iterate over the results. – HuBeZa Jan 16 '17 at 14:55
`var hash = new HashSet<int>();` `var duplicates = list.Where(i => !hash.Add(i));` will lead to a list that includes all occurrences of duplicates. So if you have four occurrences of 2 in your list, then your duplicate list will contain three occurrences of 2, since only one of the 2's can be added to the HashSet. If you want your list to contain unique values for each duplicate, use this code instead: `var duplicates = mylist.Where(item => !myhash.Add(item)).ToList().Distinct().ToList();` – solid_luffy Jul 25 '18 at 13:08
This works well for IEnumerable<T>, thanks everyone; you can convert from IEnumerable<T> to the target type. – R.Akhlaghi Dec 14 '19 at 16:55

score 13 · Answer 5 · edited May 16 '18 at 21:43


You can do this:

var list = new[] {1,2,3,1,4,2};
var duplicateItems = list.Duplicates();

With these extension methods:

using System;
using System.Collections.Generic;
using System.Linq;

public static class Extensions
{
    public static IEnumerable<TSource> Duplicates<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector)
    {
        var grouped = source.GroupBy(selector);
        var moreThan1 = grouped.Where(i => i.IsMultiple());
        return moreThan1.SelectMany(i => i);
    }

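    // Only TSource here: an unused TKey could not be inferred, so list.Duplicates() would not compile.
    public static IEnumerable<TSource> Duplicates<TSource>(this IEnumerable<TSource> source)
    {
        return source.Duplicates(i => i);
    }

    public static bool IsMultiple<T>(this IEnumerable<T> source)
    {
        // Dispose the enumerator: iterators may run finally blocks or hold resources.
        using (var enumerator = source.GetEnumerator())
        {
            return enumerator.MoveNext() && enumerator.MoveNext();
        }
    }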
}

Using IsMultiple() in the Duplicates method avoids iterating the whole collection: it stops after at most two elements, whereas Count() on an arbitrary sequence may have to enumerate everything.
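One way to see the "does not iterate the whole collection" property is to call IsMultiple() on an endless sequence, where Count() would never return. A small sketch; EarlyExitDemo and Naturals are hypothetical names, and this copy of IsMultiple disposes its enumerator with a using declaration (C# 8 or later):

```csharp
using System.Collections.Generic;

public static class EarlyExitDemo
{
    // Same check as IsMultiple above: true as soon as a second element exists.
    public static bool IsMultiple<T>(this IEnumerable<T> source)
    {
        using var enumerator = source.GetEnumerator();
        return enumerator.MoveNext() && enumerator.MoveNext();
    }

    // An endless sequence: Count() on it would never return.
    public static IEnumerable<int> Naturals()
    {
        for (int i = 0; ; i++)
            yield return i;
    }
}
```

IsMultiple(Naturals()) returns true after two MoveNext() calls; a one-element or empty sequence returns false.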

edited May 16 '18 at 21:43

hunch_hunch


answered Aug 31 '13 at 13:28

Alex Siepman



If you look at the [reference source for Grouping](http://referencesource.microsoft.com/System.Core/System/Linq/Enumerable.cs.html#2177) you can see that `Count()` **is** pre computed and your solution is likely slower. – Johnbot Mar 16 '15 at 10:06

@Johnbot. You are right, in this case it is faster and the implementation is likely never to change... but it depends on an implementation detail of the class behind IGrouping. With my implementation, you know it will never iterate the whole collection. – Alex Siepman Mar 16 '15 at 16:02

so counting [`Count()`] is basically different than iterating the whole list. `Count()` is pre-computed but iterating the whole list is not. – Jogi Feb 02 '17 at 23:32
@rehan khan: I do not understand the difference between Count() and Count() – Alex Siepman Feb 03 '17 at 06:36
@AlexSiepman There shouldn't be any difference that's maybe why. – Jogi Feb 03 '17 at 07:19

@RehanKhan: IsMultiple is NOT doing a Count(), it stops immediately after 2 items. Just like Take(2).Count() >= 2; – Alex Siepman Feb 03 '17 at 10:13
@AlexSiepman, I understand your logic behind implementation changing in future and I like your `IsMultiple` approach, it's clever, but just for other visitors: `Count() > 1` as it stands today is certainly faster than checks like `IsMultiple` or `Skip(1).Any()`. And don't forget, in this implementation we haven't disposed the enumerator. Another fast option is MoreLINQ's `AtLeast` method. You could do `AtLeast(2)` here. Type checking and getting the Count property is faster than running the enumerator and disposing it. Of course all this falls under micro-optimization and you shouldn't care. – nawfal Jun 17 '22 at 11:46

score 6 · Answer 6 · edited Apr 20 '17 at 10:13

I created an extension method for this that you could include in your projects; I think it covers the most common cases when you search for duplicates in a List or with LINQ.

Example:

using System;
using System.Collections.Generic;
using System.Linq;

//Dummy class to compare in the list
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Surname { get; set; }
    public Person(int id, string name, string surname)
    {
        this.Id = id;
        this.Name = name;
        this.Surname = surname;
    }
}


//The static extension class
public static class Extention
{
    public static IEnumerable<T> getMoreThanOnceRepeated<T>(this IEnumerable<T> extList, Func<T, object> groupProps) where T : class
    { //Return only the second and later repetitions
        return extList
            .GroupBy(groupProps)
            .SelectMany(z => z.Skip(1)); //Skip the first occurrence and return all the others that repeat
    }
    public static IEnumerable<T> getAllRepeated<T>(this IEnumerable<T> extList, Func<T, object> groupProps) where T : class
    {
        //Get all the lines that are repeated
        return extList
            .GroupBy(groupProps)
            .Where(z => z.Count() > 1) //Keep only groups with more than one element
            .SelectMany(z => z);//Return every element of those groups
    }
}

//how to use it:
void DuplicateExample()
{
    //Populate List
    List<Person> PersonsLst = new List<Person>(){
    new Person(1,"Ricardo","Figueiredo"), //first duplicate in the example
    new Person(2,"Ana","Figueiredo"),
    new Person(3,"Ricardo","Figueiredo"),//second duplicate in the example
    new Person(4,"Margarida","Figueiredo"),
    new Person(5,"Ricardo","Figueiredo")//third duplicate in the example
    };

    Console.WriteLine("All:");
    PersonsLst.ForEach(z => Console.WriteLine("{0} -> {1} {2}", z.Id, z.Name, z.Surname));
    /* OUTPUT:
        All:
        1 -> Ricardo Figueiredo
        2 -> Ana Figueiredo
        3 -> Ricardo Figueiredo
        4 -> Margarida Figueiredo
        5 -> Ricardo Figueiredo
        */

    Console.WriteLine("All lines with repeated data");
    PersonsLst.getAllRepeated(z => new { z.Name, z.Surname })
        .ToList()
        .ForEach(z => Console.WriteLine("{0} -> {1} {2}", z.Id, z.Name, z.Surname));
    /* OUTPUT:
        All lines with repeated data
        1 -> Ricardo Figueiredo
        3 -> Ricardo Figueiredo
        5 -> Ricardo Figueiredo
        */
    Console.WriteLine("Only Repeated more than once");
    PersonsLst.getMoreThanOnceRepeated(z => new { z.Name, z.Surname })
        .ToList()
        .ForEach(z => Console.WriteLine("{0} -> {1} {2}", z.Id, z.Name, z.Surname));
    /* OUTPUT:
        Only Repeated more than once
        3 -> Ricardo Figueiredo
        5 -> Ricardo Figueiredo
        */
}

Consider using Skip(1).Any() instead of Count(). If you have 1000 duplicates, then Skip(1).Any() will stop after it finds the 2nd one. Count() will access all 1000 elements. – Harald Coppoolse Oct 26 '17 at 08:06
If you add this extension method, consider using HashSet.Add instead of GroupBy, as suggested in one of the other answers. As soon as HashSet.Add finds a duplicate it will stop. Your GroupBy will continue grouping all elements, even if a group with more than one element has been found – Harald Coppoolse Oct 26 '17 at 08:08

score 3 · Answer 7 · answered Mar 29 '20 at 16:12


There is an answer with this code, but I didn't understand why it doesn't work:

var anyDuplicate = enumerable.GroupBy(x => x.Key).Any(g => g.Count() > 1);

My solution in this situation:

var duplicates = model.list
                    .GroupBy(s => s.SAME_ID)
                    .Where(g => g.Count() > 1).Count() > 0;
if(duplicates) {
    doSomething();
}

answered Mar 29 '20 at 16:12

Aykut Gündoğdu


The first syntax doesn't work because it's actually a boolean extension: the ANY method will return true if at least one element satisfies the predicate, and false otherwise. So your code will tell you only IF you have duplicates, not WHICH are they – Silviu Preda Mar 15 '21 at 12:12

score 2 · Answer 8 · answered Nov 09 '22 at 13:42

Just another approach:

To check only whether there is any duplicate (HasDuplicate):

bool hasAnyDuplicate = list.Count > list.Distinct().Count();

For the duplicate values:

List<string> duplicates = new List<string>();
duplicates.AddRange(list);
list.Distinct().ToList().ForEach(x => duplicates.Remove(x)); // Remove deletes only the first occurrence, so the extra copies remain

// for unique duplicate values:
var uniqueDuplicates = duplicates.Distinct().ToList();
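A self-contained version of both snippets, assuming a List<string> input; the class and method names are hypothetical. Note that List<T>.Remove is a linear search, so this approach costs roughly O(n × d) for n items and d distinct values:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DistinctRemoveApproach
{
    // More items than distinct items means at least one value repeats.
    public static bool HasAnyDuplicate(List<string> list) =>
        list.Count > list.Distinct().Count();

    // Removes one occurrence of each distinct value; whatever is left
    // is the surplus occurrences, i.e. the duplicates.
    public static List<string> Duplicates(List<string> list)
    {
        var duplicates = new List<string>(list);
        foreach (var x in list.Distinct().ToList())
            duplicates.Remove(x); // List<T>.Remove deletes only the first match
        return duplicates;
    }
}
```

For { "a", "b", "a", "c", "b", "a" }, HasAnyDuplicate returns true and Duplicates returns [a, b, a]; calling Distinct() on that gives [a, b].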

GeoB · Answer 9 · 2018-09-11T08:03:07.100

A complete set of LINQ to SQL extension methods for finding duplicates, checked against MS SQL Server. They don't call .ToList() or fall back to IEnumerable, so the queries execute in SQL Server rather than in memory; only the results are brought into memory.

using System;
using System.Linq;
using System.Linq.Expressions;

public static class Linq2SqlExtensions {

    public class CountOfT<T> {
        public T Key { get; set; }
        public int Count { get; set; }
    }

    public static IQueryable<TKey> Duplicates<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
        => source.GroupBy(groupBy).Where(w => w.Count() > 1).Select(s => s.Key);

    public static IQueryable<TSource> GetDuplicates<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
        => source.GroupBy(groupBy).Where(w => w.Count() > 1).SelectMany(s => s);

    public static IQueryable<CountOfT<TKey>> DuplicatesCounts<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
        => source.GroupBy(groupBy).Where(w => w.Count() > 1).Select(y => new CountOfT<TKey> { Key = y.Key, Count = y.Count() });

    public static IQueryable<Tuple<TKey, int>> DuplicatesCountsAsTuple<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
        => source.GroupBy(groupBy).Where(w => w.Count() > 1).Select(s => Tuple.Create(s.Key, s.Count()));
}

score 1 · Answer 10 · answered Jul 16 '20 at 10:26


Linq query:

var query = from s2 in (from s in someList
                        group s by new { s.Column1, s.Column2 } into sg
                        select sg)
            where s2.Count() > 1
            select s2;
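The nested subquery isn't required: `group ... into` already continues the query, so the where clause can follow it directly. A sketch of that flattened form, assuming a hypothetical Row type with Column1 and Column2 (the answer doesn't show someList's element type); I've used a value-tuple key instead of the anonymous type so the return type can be written out:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type standing in for whatever someList contains.
public sealed class Row
{
    public Row(int id, string column1, string column2)
    {
        Id = id;
        Column1 = column1;
        Column2 = column2;
    }

    public int Id { get; }
    public string Column1 { get; }
    public string Column2 { get; }
}

public static class QuerySyntaxExample
{
    // Each result is a group of rows sharing the same (Column1, Column2) pair.
    public static List<IGrouping<(string Column1, string Column2), Row>> DuplicateGroups(IEnumerable<Row> someList) =>
        (from s in someList
         group s by (s.Column1, s.Column2) into sg
         where sg.Count() > 1
         select sg).ToList();
}
```

With rows (1, a, x), (2, b, y), (3, a, x), the result is a single group keyed (a, x) containing rows 1 and 3.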

answered Jul 16 '20 at 10:26

user1785960


score 1 · Answer 11 · answered Aug 08 '21 at 00:18

A simpler way that doesn't use groups: get the distinct elements, then iterate over them and check each one's count in the list. If the count is > 1, the item appears more than once, so add it to Repeteditemlist.

var mylist = new List<int>() { 1, 1, 2, 3, 3, 3, 4, 4, 4 };
var distList = mylist.Distinct().ToList();
var Repeteditemlist = new List<int>();
foreach (var item in distList)
{
    if (mylist.Count(e => e == item) > 1)
    {
        Repeteditemlist.Add(item);
    }
}
foreach (var item in Repeteditemlist)
{
    Console.WriteLine(item);
}

Expected output:

1
3
4

nawfal · Answer 12 · 2022-06-20T08:02:01.670

All the GroupBy answers are the simplest but won't be the most efficient. They're especially costly in memory, because GroupBy materializes an inner collection for every group, and building those collections has allocation cost.

A decent alternative is HuBeZa's HashSet.Add based approach. It performs better.

If you don't care about nulls, something like this is the most efficient (both CPU and memory) as far as I can think:

using System;
using System.Collections.Generic;

// Extension methods must live in a static class; the class name is arbitrary.
public static class DuplicateExtensions
{
    public static IEnumerable<TProperty> Duplicates<TSource, TProperty>(
        this IEnumerable<TSource> source,
        Func<TSource, TProperty> duplicateSelector,
        IEqualityComparer<TProperty> comparer = null)
    {
        comparer ??= EqualityComparer<TProperty>.Default;

        Dictionary<TProperty, int> counts = new Dictionary<TProperty, int>(comparer);

        foreach (var item in source)
        {
            TProperty property = duplicateSelector(item);
            counts.TryGetValue(property, out int count);

            // 0: first sighting; 1: second sighting, so report it once.
            // 2 or more: already reported, nothing to do.
            switch (count)
            {
                case 0:
                    counts[property] = ++count;
                    break;

                case 1:
                    counts[property] = ++count;
                    yield return property;
                    break;
            }
        }
    }
}

The trick here is to avoid further dictionary writes once an item has been reported: after its second occurrence the count is 2, and later occurrences only pay for the TryGetValue lookup. Of course you could keep updating the dictionary with the count if you also want the number of occurrences for each item. For nulls, you just need some additional handling, because Dictionary does not accept null keys.
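The "additional handling" for nulls isn't shown in the answer; here is one possible sketch, assuming you want a null selector result reported like any other duplicated value. DuplicatesAllowingNulls is a hypothetical name. Dictionary<TKey, TValue> throws ArgumentNullException for a null key, so nulls get a separate counter:

```csharp
using System;
using System.Collections.Generic;

public static class NullTolerantDuplicates
{
    // Hypothetical variant of the Duplicates method above that also accepts nulls.
    public static IEnumerable<TProperty> DuplicatesAllowingNulls<TSource, TProperty>(
        this IEnumerable<TSource> source,
        Func<TSource, TProperty> duplicateSelector,
        IEqualityComparer<TProperty> comparer = null)
    {
        comparer ??= EqualityComparer<TProperty>.Default;
        var counts = new Dictionary<TProperty, int>(comparer);
        int nullCount = 0; // nulls can't be dictionary keys, so count them here

        foreach (var item in source)
        {
            TProperty property = duplicateSelector(item);

            if (property is null)
            {
                // Report null once, on its second occurrence; stop counting after that.
                if (nullCount < 2 && ++nullCount == 2)
                    yield return property;
                continue;
            }

            counts.TryGetValue(property, out int count);
            if (count < 2)
            {
                // First sighting stores 1; second sighting stores 2 and reports the value.
                counts[property] = count + 1;
                if (count == 1)
                    yield return property;
            }
        }
    }
}
```

For { "a", null, "b", null, "a", null } with s => s, this yields null and then "a", each exactly once, in the order each reaches its second occurrence.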

John · Answer 13 · 2021-06-25T13:22:43.207

-3

Remove duplicates by key

myTupleList = myTupleList.GroupBy(tuple => tuple.Item1).Select(group => group.First()).ToList();

edited Jun 25 '21 at 13:22

answered Jun 25 '21 at 12:50

John


The question is not about removing duplicates. – Gert Arnold Jun 25 '21 at 15:48
