183

I have a List<string> like:

List<String> list = new List<String>{"6","1","2","4","6","5","1"};

I need to get the duplicate items in the list into a new list. Currently I'm using a nested for loop to do this.

The resulting list will contain {"6","1"}.

Is there a way to do this using LINQ or lambda expressions?

Kris van der Mast
Thorin Oakenshield

9 Answers

253
var duplicates = lst.GroupBy(s => s)
    .SelectMany(grp => grp.Skip(1));

Note that this returns every occurrence after the first one, so a value that appears three times is returned twice. If you only want to know which items are duplicated in the source list, you could apply Distinct to the resulting sequence or use the solution given by Mark Byers.
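To make the difference concrete, here is a small sketch. The class name `SkipOneDemo` and its method names are only illustrative, and the sample list is the one from the question with an extra "6" appended so that one value occurs three times.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipOneDemo
{
    // Every occurrence after the first one in each group.
    public static List<string> ExtraOccurrences(IEnumerable<string> source) =>
        source.GroupBy(s => s)
              .SelectMany(grp => grp.Skip(1))
              .ToList();

    // Each duplicated value exactly once.
    public static List<string> DuplicatedValues(IEnumerable<string> source) =>
        source.GroupBy(s => s)
              .SelectMany(grp => grp.Skip(1))
              .Distinct()
              .ToList();

    public static void Main()
    {
        var lst = new List<string> { "6", "1", "2", "4", "6", "5", "1", "6" };

        Console.WriteLine(string.Join(",", ExtraOccurrences(lst)));  // 6,6,1
        Console.WriteLine(string.Join(",", DuplicatedValues(lst)));  // 6,1
    }
}
```

For the original seven-element list both methods give the same result, because no value occurs more than twice.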

StayOnTarget
Lee
  • lst.GroupBy(s => s.ToUpper()).SelectMany(grp => grp.Skip(1)); If you want to do a case insensitive comparison :) – John Dec 12 '13 at 21:46
  • @JohnJB - There is an overload of `GroupBy` which allows you to supply an `IEqualityComparer` instead of using `ToUpper` to do a case-insensitive comparison. – Lee Dec 13 '13 at 11:08
  • Skip(1) is skipping the first item :( Do you know what should I do if I want all items? – ParPar Mar 23 '14 at 10:17
  • @ParPar - Does [this answer](http://stackoverflow.com/a/19817834/152602) do what you want? – Lee Mar 23 '14 at 11:53
  • As @ScottLangham points out, this doesn't actually return all duplicate records; it returns all duplicate records EXCEPT for the first occurrence in each group. So yes, if you're after a list of just the distinct duplicate values, then this answer with the Distinct method is the way to go, but if you want all the duplicate rows, I found Scott's answer to be the way to go. – Robert Shattock May 16 '16 at 00:57
  • Fyi for anyone... `.Skip(0)` gets all the duplicates. @ParPar – Si8 Jan 06 '17 at 21:48
  • @Si8 But that gets all the non-duplicates too! – Scott Langham Jul 31 '19 at 10:32
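
The comparer overload that Lee mentions in the comments could be used roughly like this. This is a sketch, not part of the original answer: `IgnoreCaseDemo` is an illustrative name, and `StringComparer.OrdinalIgnoreCase` is one possible choice of comparer (a culture-aware one may suit other data better).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IgnoreCaseDemo
{
    // Groups strings that differ only by case, then keeps every
    // occurrence after the first one in each group.
    public static List<string> Duplicates(IEnumerable<string> source) =>
        source.GroupBy(s => s, StringComparer.OrdinalIgnoreCase)
              .SelectMany(grp => grp.Skip(1))
              .ToList();

    public static void Main()
    {
        var lst = new List<string> { "a", "B", "A", "b", "c" };
        Console.WriteLine(string.Join(",", Duplicates(lst)));  // A,b
    }
}
```

Unlike the `ToUpper` variant, the comparer is applied to the keys themselves, so no transformed copy of each string is created just for grouping.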
184

Here is one way to do it:

List<String> duplicates = lst.GroupBy(x => x)
                             .Where(g => g.Count() > 1)
                             .Select(g => g.Key)
                             .ToList();

The GroupBy groups the elements that are the same together, and the Where filters out those that only appear once, leaving you with only the duplicates.

Mark Byers
  • Does not provide the exact result as asked in question, but will be useful in most other cases. – Heiner Aug 23 '17 at 10:49
38

Here's another option:

var list = new List<string> { "6", "1", "2", "4", "6", "5", "1" };

var set = new HashSet<string>();
var duplicates = list.Where(x => !set.Add(x));
LukeH
  • I don't suppose the downvoter would care to explain what's wrong with this answer? – LukeH Sep 28 '10 at 15:19
  • Haha, +1 for innovation :) Not only that, this gives exactly what the OP wants. The catch here is that it can give a wrong answer if the query is enumerated a second time (to prevent that, you have to either clear the set or initialize a new one every time). – nawfal Sep 23 '13 at 19:16
  • Or just slap `.ToList()` at the end of the `duplicates` construction. – Miral May 27 '15 at 02:11
  • Downvote wasn't from me, but I really think using side-effects in a `.Where` should be avoided, so that might be the reason. – Paul Groke Aug 19 '15 at 21:18
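
The re-enumeration catch raised in the comments can be reproduced with a short sketch. `HashSetPitfallDemo` and its method names are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashSetPitfallDemo
{
    // Lazy query: the captured set is mutated every time the query is enumerated.
    public static IEnumerable<string> Lazy(List<string> list, HashSet<string> seen) =>
        list.Where(x => !seen.Add(x));

    // Materialized once with a fresh set, so the result is stable.
    public static List<string> Materialized(List<string> list)
    {
        var seen = new HashSet<string>();
        return list.Where(x => !seen.Add(x)).ToList();
    }

    public static void Main()
    {
        var list = new List<string> { "6", "1", "2", "4", "6", "5", "1" };

        var lazy = Lazy(list, new HashSet<string>());
        Console.WriteLine(string.Join(",", lazy));  // 6,1
        // The set already holds every value, so Add fails for all elements now.
        Console.WriteLine(string.Join(",", lazy));  // 6,1,2,4,6,5,1

        var stable = Materialized(list);
        Console.WriteLine(string.Join(",", stable));  // 6,1
        Console.WriteLine(string.Join(",", stable));  // 6,1
    }
}
```

Calling `ToList()` right away, as Miral suggests, runs the side-effecting predicate exactly once.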
29

I know it's not the answer to the original question, but you may find yourself here with this problem.

If you want all of the duplicate items in your results, the following works.

var duplicates = list
    .GroupBy( x => x )               // group matching items
    .Where( g => g.Skip(1).Any() )   // where the group contains more than one item
    .SelectMany( g => g );           // re-expand the groups with more than one item

In my situation I need all duplicates so that I can mark them in the UI as being errors.

Scott Langham
19

I wrote this extension method based off @Lee's response to the OP. Note, a default parameter was used (requiring C# 4.0). However, an overloaded method call in C# 3.0 would suffice.

/// <summary>
/// Method that returns the duplicates in the collection (distinct by default).
/// </summary>
/// <typeparam name="T">The element type of the collection.</typeparam>
/// <param name="source">The source collection to check for duplicates.</param>
/// <param name="distinct">Specify <b>true</b> to only return distinct elements.</param>
/// <returns>The duplicates found in the source collection; distinct when <paramref name="distinct"/> is true.</returns>
/// <remarks>This is an extension method to IEnumerable&lt;T&gt;</remarks>
public static IEnumerable<T> Duplicates<T>
         (this IEnumerable<T> source, bool distinct = true)
{
     if (source == null)
     {
        throw new ArgumentNullException("source");
     }

     // select the elements that are repeated
     IEnumerable<T> result = source.GroupBy(a => a).SelectMany(a => a.Skip(1));

     // distinct?
     if (distinct == true)
     {
        // deferred execution helps us here
        result = result.Distinct();
     }

     return result;
}
Michael
11
List<String> list = new List<String> { "6", "1", "2", "4", "6", "5", "1" };

var q = from s in list
        group s by s into g
        where g.Count() > 1
        select g.First();

foreach (var item in q)
{
    Console.WriteLine(item);
}
explorer
10

Hope this will help:

int[] listOfItems = new[] { 4, 2, 3, 1, 6, 4, 3 };

var duplicates = listOfItems 
    .GroupBy(i => i)
    .Where(g => g.Count() > 1)
    .Select(g => g.Key);

foreach (var d in duplicates)
    Console.WriteLine(d);
Thakur
3

I was trying to solve the same problem with a list of objects and ran into issues because I was trying to repack the list of groups into the original list type. So I ended up looping through the groups to repack a List with the items that have duplicates.

public List<MediaFileInfo> GetDuplicatePictures()
{
    List<MediaFileInfo> dupes = new List<MediaFileInfo>();
    var grpDupes = from f in _fileRepo
                   group f by f.Length into grps
                   where grps.Count() > 1
                   select grps;
    foreach (var item in grpDupes)
    {
        foreach (var thing in item)
        {
            dupes.Add(thing);
        }
    }
    return dupes;
}
Tshilidzi Mudau
Jamie L.
0

Most of the solutions mentioned so far perform a GroupBy. Even if I only need the first duplicate, every element of the collection is enumerated at least once.

The following extension function stops enumerating as soon as a duplicate has been found. It continues if a next duplicate is requested.

As always in LINQ there are two versions, one with IEqualityComparer and one without it.

public static IEnumerable<TSource> ExtractDuplicates<TSource>(this IEnumerable<TSource> source)
{
    return source.ExtractDuplicates(null);
}
public static IEnumerable<TSource> ExtractDuplicates<TSource>(this IEnumerable<TSource> source,
    IEqualityComparer<TSource> comparer)
{
    if (source == null) throw new ArgumentNullException(nameof(source));
    if (comparer == null)
        comparer = EqualityComparer<TSource>.Default;

    HashSet<TSource> foundElements = new HashSet<TSource>(comparer);
    foreach (TSource sourceItem in source)
    {
        if (!foundElements.Contains(sourceItem))
        {   // we've not seen this sourceItem before. Add to the foundElements
            foundElements.Add(sourceItem);
        }
        else
        {   // we've seen this item before. It is a duplicate!
            yield return sourceItem;
        }
    }
}

Usage:

IEnumerable<MyClass> myObjects = ...

// check if has duplicates:
bool hasDuplicates = myObjects.ExtractDuplicates().Any();

// or find the first three duplicates:
IEnumerable<MyClass> first3Duplicates = myObjects.ExtractDuplicates().Take(3);

// or find the first 5 duplicates that have a Name = "MyName"
IEnumerable<MyClass> myNameDuplicates = myObjects.ExtractDuplicates()
    .Where(duplicate => duplicate.Name == "MyName")
    .Take(5);

For all these LINQ statements the collection is only enumerated until the requested items are found. The rest of the sequence is never touched.

IMHO that is an efficiency boost to consider.

Harald Coppoolse
  • Just a tip, you can reduce the `HashSet.Contains + Add` combination to just `Add`. Avoid an additional lookup cost. For e.g. in your case: `if (!foundElements.Add(sourceItem)) yield return sourceItem;` That's all you need. – nawfal Jun 17 '22 at 08:58
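
Putting nawfal's tip together with the early-exit idea, a compact variant might look like the sketch below. The names `DuplicateExtensions`, `ExtractDuplicatesLazy` and `Iterate` are hypothetical. Splitting the null check from the iterator body is an addition not found in the answer; it is assumed here so that the exception is thrown at call time rather than on first enumeration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateExtensions
{
    // Validates eagerly, then hands off to the lazy iterator.
    public static IEnumerable<T> ExtractDuplicatesLazy<T>(
        this IEnumerable<T> source, IEqualityComparer<T> comparer = null)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        return Iterate(source, comparer ?? EqualityComparer<T>.Default);
    }

    private static IEnumerable<T> Iterate<T>(
        IEnumerable<T> source, IEqualityComparer<T> comparer)
    {
        var seen = new HashSet<T>(comparer);
        foreach (T item in source)
        {
            // Add returns false when the item is already present: a duplicate.
            if (!seen.Add(item)) yield return item;
        }
    }
}

public static class DuplicateExtensionsDemo
{
    public static void Main()
    {
        // A very long sequence 0,1,2,0,1,2,... that is never walked to the end.
        IEnumerable<int> huge = Enumerable.Range(0, int.MaxValue).Select(i => i % 3);

        Console.WriteLine(huge.ExtractDuplicatesLazy().Any());    // True
        Console.WriteLine(huge.ExtractDuplicatesLazy().First());  // 0
    }
}
```

Both calls in `Main` stop after reading four elements, because the fourth element (0) is the first repeat.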