127

I have a List<string> which has some words duplicated. I need to find all words which are duplicates.

Any trick to get them all?

nawfal

9 Answers

225

In .NET Framework 3.5 and above you can use Enumerable.GroupBy, which returns an enumerable of groups, one per distinct value. Filter out the groups whose Count is <= 1, then select the keys of the remaining groups to get back down to a single enumerable:

var duplicateKeys = list.GroupBy(x => x)
                        .Where(group => group.Count() > 1)
                        .Select(group => group.Key);
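As a minimal, self-contained sketch of the query above: the sample words and the names `DuplicateDemo` and `FindDuplicates` are illustrative assumptions, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateDemo
{
    // Returns each value that occurs more than once, in order of first appearance.
    public static List<string> FindDuplicates(IEnumerable<string> source)
    {
        return source.GroupBy(x => x)
                     .Where(group => group.Count() > 1)
                     .Select(group => group.Key)
                     .ToList();
    }

    public static void Main()
    {
        var list = new List<string> { "apple", "banana", "apple", "cherry", "banana", "apple" };

        // Prints "apple" and then "banana"; "cherry" occurs only once.
        foreach (var word in FindDuplicates(list))
            Console.WriteLine(word);
    }
}
```

Each duplicated word is reported once, however many times it repeats, because only the group key is selected.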
Shahin
Giuseppe Ottaviano
  • This gives all lines grouped by their values, not duplicates... you still have to filter by `Count() > 1`. Also, the way I understand the question, each line contains several words, and the OP wants the duplicate words (but perhaps I misunderstood the question) – Thomas Levesque Jan 02 '11 at 12:21
  • @Thomas: yes the code is not complete, that one is just the first step. Then he can use a `Where` if he wants just the duplicates, like `list.GroupBy(x => x).Where(group => group.Count() > 1).Select(group => group.Key).ToList()` – Giuseppe Ottaviano Jan 02 '11 at 12:34
  • No need to count all the items to check if there's more than 1: `.Where(group => group.Skip(1).Any())` – Russell Horwood Apr 19 '22 at 14:50
  • That was cool! It Worked. – NidhinSPradeep Apr 03 '23 at 13:44
36

If you are using LINQ, you can use the following query:

var duplicateItems = from x in list
                     group x by x into grouped
                     where grouped.Count() > 1
                     select grouped.Key;

or, if you prefer it without the syntactic sugar:

var duplicateItems = list.GroupBy(x => x).Where(x => x.Count() > 1).Select(x => x.Key);

This groups all elements that are the same, and then filters to only those groups with more than one element. Finally it selects just the key from those groups as you don't need the count.

If you prefer not to use LINQ, you can use this extension method:

public void SomeMethod() {
    var duplicateItems = list.GetDuplicates();
    …
}

public static IEnumerable<T> GetDuplicates<T>(this IEnumerable<T> source) {
    HashSet<T> itemsSeen = new HashSet<T>();
    HashSet<T> itemsYielded = new HashSet<T>();

    foreach (T item in source) {
        if (!itemsSeen.Add(item)) {
            if (itemsYielded.Add(item)) {
                yield return item;
            }
        }
    }
}

This keeps track of the items it has seen and the items it has yielded. The first time an item appears, it is added to the set of seen items and nothing is yielded. When an item appears again, it is yielded only if it has not been yielded before, so each duplicate is returned exactly once.
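A small usage sketch of the two-set approach described above. The sample data and the class names `EnumerableExtensions` and `HashSetDemo` are assumptions for illustration; the method body follows the same logic as `GetDuplicates`.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    // Yields an item the second time it is seen, and never again after that.
    public static IEnumerable<T> GetDuplicates<T>(this IEnumerable<T> source)
    {
        var itemsSeen = new HashSet<T>();
        var itemsYielded = new HashSet<T>();

        foreach (T item in source)
        {
            // HashSet<T>.Add returns false when the item is already in the set.
            if (!itemsSeen.Add(item) && itemsYielded.Add(item))
                yield return item;
        }
    }
}

public static class HashSetDemo
{
    public static void Main()
    {
        var list = new List<string> { "a", "b", "a", "a", "c", "b" };

        // Prints "a" then "b": "a" is yielded once even though it appears three times.
        foreach (var item in list.GetDuplicates())
            Console.WriteLine(item);
    }
}
```

Extension methods must live in a non-generic static class, which is why the method is wrapped in one here.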

ICR
20

And without LINQ:

string[] ss = {"1","1","1"};

var myList = new List<string>();
var duplicates = new List<string>();

foreach (var s in ss)
{
   if (!myList.Contains(s))
      myList.Add(s);
   else
      duplicates.Add(s);
}

// show list without duplicates 
foreach (var s in myList)
   Console.WriteLine(s);

// show duplicates list
foreach (var s in duplicates)
   Console.WriteLine(s);
evilone
14

If you're looking for a more generic method:

public static List<U> FindDuplicates<T, U>(this List<T> list, Func<T, U> keySelector)
{
    return list.GroupBy(keySelector)
               .Where(group => group.Count() > 1)
               .Select(group => group.Key)
               .ToList();
}

EDIT: Here's an example:

public class Person {
    public string Name {get;set;}
    public int Age {get;set;}
}

List<Person> list = new List<Person>() { new Person() { Name = "John", Age = 22 }, new Person() { Name = "John", Age = 30 }, new Person() { Name = "Jack", Age = 30 } };

var duplicateNames = list.FindDuplicates(p => p.Name);
var duplicateAges = list.FindDuplicates(p => p.Age);

foreach(var dupName in duplicateNames) {
    Console.WriteLine(dupName); // Will print out John
}

foreach(var dupAge in duplicateAges) {
    Console.WriteLine(dupAge); // Will print out 30
}
Mauricio Ramalho
  • Can you please explain <T> and <U>? Do we need to include some namespace, or do we need to replace them with the correct object types? – Irshad Babar Dec 16 '20 at 10:03
  • T and U are the generic types in the method definition. You can specify them when calling the method or, like in my example, let them be inferred: `list.FindDuplicates(p => p.Name)`: T -> Person, U -> string; `list.FindDuplicates(p => p.Age)`: T -> Person, U -> int. – Mauricio Ramalho Dec 17 '20 at 11:22
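To make the type inference concrete, here is a hedged sketch that calls the method both with inferred and with explicitly written type arguments. The wrapper class names `ListExtensions` and `GenericDemo` are assumptions; the only namespaces needed are the `using` lines shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class ListExtensions
{
    public static List<U> FindDuplicates<T, U>(this List<T> list, Func<T, U> keySelector)
    {
        return list.GroupBy(keySelector)
                   .Where(group => group.Count() > 1)
                   .Select(group => group.Key)
                   .ToList();
    }
}

public static class GenericDemo
{
    public static void Main()
    {
        var people = new List<Person>
        {
            new Person { Name = "John", Age = 22 },
            new Person { Name = "John", Age = 30 },
            new Person { Name = "Jack", Age = 30 }
        };

        // Type arguments inferred by the compiler: T = Person, U = string.
        List<string> inferred = people.FindDuplicates(p => p.Name);

        // The same call with the type arguments written out.
        List<string> explicitArgs = people.FindDuplicates<Person, string>(p => p.Name);

        Console.WriteLine(string.Join(",", inferred));      // John
        Console.WriteLine(string.Join(",", explicitArgs));  // John
    }
}
```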
5

Using LINQ, of course. The code below gives you a dictionary that maps each item (a string) to the number of times it occurs in your source list.

var item2ItemCount = list.GroupBy(item => item).ToDictionary(x=>x.Key,x=>x.Count());
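The dictionary holds a count for every item, so one more filter is needed to get only the duplicates. A minimal sketch, assuming sample data and the hypothetical names `CountDemo` and `CountItems`:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountDemo
{
    // Maps each distinct item to the number of times it occurs.
    public static Dictionary<string, int> CountItems(IEnumerable<string> source)
    {
        return source.GroupBy(item => item)
                     .ToDictionary(g => g.Key, g => g.Count());
    }

    public static void Main()
    {
        var list = new List<string> { "x", "y", "x", "z", "x", "y" };
        Dictionary<string, int> counts = CountItems(list);

        // Keep only the entries that occur more than once.
        foreach (var pair in counts.Where(p => p.Value > 1))
            Console.WriteLine(pair.Key + ": " + pair.Value);
    }
}
```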
Manish Basantani
4

For what it's worth, here is my way:

List<string> list = new List<string>(new string[] { "cat", "Dog", "parrot", "dog", "parrot", "goat", "parrot", "horse", "goat" });
Dictionary<string, int> wordCount = new Dictionary<string, int>();

//count them all:
list.ForEach(word =>
{
    string key = word.ToLower();
    if (!wordCount.ContainsKey(key))
        wordCount.Add(key, 0);
    wordCount[key]++;
});

//remove words appearing only once:
wordCount.Keys.ToList().FindAll(word => wordCount[word] == 1).ForEach(key => wordCount.Remove(key));

Console.WriteLine(string.Format("Found {0} duplicates in the list:", wordCount.Count));
wordCount.Keys.ToList().ForEach(key => Console.WriteLine(string.Format("{0} appears {1} times", key, wordCount[key])));
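For comparison, the same case-insensitive count can be sketched with GroupBy and a string comparer. Note that this swaps in `StringComparer.OrdinalIgnoreCase` instead of the `ToLower` normalization used above, so the keys keep the casing of their first occurrence; the names `IgnoreCaseDemo` and `CountDuplicates` are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IgnoreCaseDemo
{
    // Counts words case-insensitively and keeps only those seen more than once.
    public static Dictionary<string, int> CountDuplicates(IEnumerable<string> words)
    {
        return words.GroupBy(w => w, StringComparer.OrdinalIgnoreCase)
                    .Where(g => g.Count() > 1)
                    .ToDictionary(g => g.Key, g => g.Count(), StringComparer.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        var list = new List<string> { "cat", "Dog", "parrot", "dog", "parrot", "goat", "parrot", "horse", "goat" };
        Dictionary<string, int> wordCount = CountDuplicates(list);

        Console.WriteLine("Found {0} duplicates in the list:", wordCount.Count);
        foreach (var pair in wordCount)
            Console.WriteLine("{0} appears {1} times", pair.Key, pair.Value);
    }
}
```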
Shadow The GPT Wizard
3

I'm assuming each string in your list contains several words; let me know if that's incorrect.

List<string> list = File.ReadAllLines("foobar.txt").ToList();

var words = from line in list
            from word in line.Split(new[] { ' ', ';', ',', '.', ':', '(', ')' }, StringSplitOptions.RemoveEmptyEntries)
            select word;

var duplicateWords = from w in words
                     group w by w.ToLower() into g
                     where g.Count() > 1
                     select new
                     {
                         Word = g.Key,
                         Count = g.Count()
                     };
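A runnable sketch of the same two queries, using hypothetical in-memory lines in place of the file so it can be tried without "foobar.txt". The names `WordDemo` and `DuplicateWords` are assumptions, and the result is collected into a dictionary only to make it easy to inspect.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordDemo
{
    private static readonly char[] Separators = { ' ', ';', ',', '.', ':', '(', ')' };

    // Splits each line into words and returns the lower-cased words that occur more than once.
    public static Dictionary<string, int> DuplicateWords(IEnumerable<string> lines)
    {
        var words = from line in lines
                    from word in line.Split(Separators, StringSplitOptions.RemoveEmptyEntries)
                    select word;

        var duplicates = from w in words
                         group w by w.ToLower() into g
                         where g.Count() > 1
                         select new { Word = g.Key, Count = g.Count() };

        return duplicates.ToDictionary(d => d.Word, d => d.Count);
    }

    public static void Main()
    {
        // Hypothetical sample lines standing in for the file contents.
        var lines = new[] { "The cat sat.", "the dog (and the cat) ran" };

        foreach (var pair in DuplicateWords(lines))
            Console.WriteLine(pair.Key + " appears " + pair.Value + " times");
    }
}
```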
Thomas Levesque
2

I use a method like this to check for duplicated entries in a list of strings:

public static IEnumerable<string> CheckForDuplicated(IEnumerable<string> listString)
{
    List<string> duplicateKeys = new List<string>();
    List<string> notDuplicateKeys = new List<string>();
    foreach (var text in listString)
    {
        if (notDuplicateKeys.Contains(text))
        {
            duplicateKeys.Add(text);
        }
        else
        {
            notDuplicateKeys.Add(text);
        }
    }
    return duplicateKeys;
}

It may not be the shortest or most elegant way, but I think it is very readable.

George Wurthmann
0
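    // Finds the character(s) that occur most often in txtInput.Text and shows each one followed by its count.
    lblrepeated.Text = "";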
    string value = txtInput.Text;
    char[] arr = value.ToCharArray();
    char[] crr = new char[1];
    int count1 = 0;
    for (int i = 0; i < arr.Length; i++)
    {
        int count = 0;  
        char letter=arr[i];
        for (int j = 0; j < arr.Length; j++)
        {
            char letter3 = arr[j];
            if (letter == letter3)
            {
                count++;
            }
        }
        if (count1 < count)
        {
            Array.Resize<char>(ref crr,0);
            int count2 = 0;
            for(int l = 0;l < crr.Length;l++)
            {
                if (crr[l] == letter)
                    count2++;                    
            }


            if (count2 == 0)
            {
                Array.Resize<char>(ref crr, crr.Length + 1);
                crr[crr.Length-1] = letter;
            }

            count1 = count;               
        }
        else if (count1 == count)
        {
            int count2 = 0;
            for (int l = 0; l < crr.Length; l++)
            {
                if (crr[l] == letter)
                    count2++;
            }


            if (count2 == 0)
            {
                Array.Resize<char>(ref crr, crr.Length + 1);
                crr[crr.Length - 1] = letter;
            }

            count1 = count; 
        }
    }

    for (int k = 0; k < crr.Length; k++)
        lblrepeated.Text = lblrepeated.Text + crr[k] + count1.ToString();
kittu