94

Requirement: In an unsorted List, determine if a duplicate exists.

The typical way I would do this is an n-squared nested loop. I'm wondering how others solve this. Is there an elegant, high performance method in Linq? Something generic that takes a lambda or a comparer would be nice.

Note: This is different than LINQ find duplicates in List which returns the actual duplicates. I just need to know if one exists or not.

KyleMit
  • 30,350
  • 66
  • 462
  • 664
kakridge
  • 2,153
  • 1
  • 17
  • 27
  • 1
    i remember seeing this question on here before and people suggested some neat trick I can't remember what it was though... wait for it... jon skeet is around – Peter Perháč Feb 22 '11 at 16:02
  • 1
    Your question seems to be answered, you should mark it accordingly, if not satisfied you can edit your question to explain it more clearly. ;) – Trinidad Feb 24 '11 at 03:07

10 Answers10

210

Unless I'm missing something, then you should be able to get away with something simple using Distinct(). Granted it won't be the most complex implementation you could come up with, but it will tell you if any duplicates get removed:

var list = new List<string>();

// Fill the list

if(list.Count != list.Distinct().Count())
{
     // Duplicates exist
}
Justin Niessner
  • 242,243
  • 40
  • 408
  • 536
  • 8
    + 1 if I recall correctly `Distinct()` uses a Hashtable internally, so should be O(n) – BrokenGlass Feb 22 '11 at 16:03
  • i wonder how fast distinct is... whether it's not doing the "n-square nested loop" as Kenny mentioned he'd like to avoid. – Peter Perháč Feb 22 '11 at 16:03
  • 4
    Don't call list.Count() method. Use the Count property instead. I know LINQ is optimized and it'll use it internally but still I think it's better to use the property. – Petar Petrov Feb 22 '11 at 16:17
  • @Petar - You're right. I just got a little parenthesis happy when I was writing the original post. Fixed. – Justin Niessner Feb 22 '11 at 16:31
  • 1
    This was actually my first thought. Thanks to BrokenGlass for confirming that Distinct() is O(n). – kakridge Mar 10 '11 at 14:55
  • This is ok, if you don't care about case-sensitivity. `new List{"Don","don"};` will not report duplicates. You will need to preprocess data before doing counts. – CrnaStena Jul 31 '14 at 11:50
  • 1
    @PetarPetrov - with regards to `.Count` versus`.Count()` I need to use `.Count()`. If I do not, then I get an error that states _Operator '!=' cannot be applied to operands of type 'method group' and 'method group'_ – Vincent Saelzler Sep 17 '16 at 21:36
  • Edit to my above comment: My variable was an IEnumerable as opposed to a list. Never mind @PetarPetrov in the case asked by the poster you were totally right! – Vincent Saelzler Sep 17 '16 at 21:40
  • 2
    This solution doesn't seems fast as you access the List 3 times. I would consider adding elements to an HasSet until it return false. – Jean-Charbel VANNIER Jan 26 '18 at 16:26
  • For efficient count comparisons, you could make use of `MoreLINQ`. It got bunch of cool extensions like `AtLeast`, `AtMost`, `Exactly` etc. In this case its better to call: `if (list.Distinct().Exactly(list.Count))` – nawfal Jun 17 '22 at 09:01
82

According to Eric White's article on how to Find Duplicates using LINQ:

An easy way to find duplicates is to write a query that groups by the identifier, and then filter for groups that have more than one member. In the following example, we want to know that 4 and 3 are duplicates:

int[] listOfItems = new[] { 4, 2, 3, 1, 6, 4, 3 };
var duplicates = listOfItems
    .GroupBy(i => i)
    .Where(g => g.Count() > 1)
    .Select(g => g.Key);
foreach (var d in duplicates)
    Console.WriteLine(d); // 4,3
ChrisW
  • 54,973
  • 13
  • 116
  • 224
Ali
  • 2,163
  • 1
  • 16
  • 18
  • 4
    This will definitely work but will take longer than necessary (the OP only needs to know if duplicates exist or not...not what the duplicate values are). – Justin Niessner Feb 22 '11 at 16:04
  • 12
    This is more helpful if you need to know the duplicate values are. – liang Jun 06 '13 at 02:44
33

In order to allow short circuiting if the duplicate exists early in the list, you can add a HashSet<T> and check the return value of its .Add method.

By using .Any you can short circuit the enumeration as soon as you find a duplicate.

Here's a LINQ extension method in both C# and VB:

CSharp:

public static bool ContainsDuplicates<T>(this IEnumerable<T> enumerable)
{
    var knownKeys = new HashSet<T>();
    return enumerable.Any(item => !knownKeys.Add(item));
}

Visual Basic:

<Extension>
Public Function ContainsDuplicates(Of T)(ByVal enumerable As IEnumerable(Of T)) As Boolean
    Dim knownKeys As New HashSet(Of T)
    Return enumerable.Any(Function(item) Not knownKeys.Add(item))
End Function

Note: to check if there are no duplicates, just change Any to All

KyleMit
  • 30,350
  • 66
  • 462
  • 664
  • 2
    This is nice an elegant, and is similar to the approach [described here](http://stackoverflow.com/a/19476092/24874) that returns duplicates as well. – Drew Noakes Oct 15 '15 at 10:03
  • Why is this better than using Distinct().Count? – Mihai Socaciu Oct 28 '22 at 13:09
  • 1
    @MihaiSocaciu, because this short circuits, meaning it doesn't have to check every element in a potentially very big collection as soon as one meets the criteria – KyleMit Oct 29 '22 at 19:10
15

Place all items in a set and if the count of the set is different from the count of the list then there is a duplicate.

bool hasDuplicates<T>(List<T> myList) {
    var hs = new HashSet<T>();

    for (var i = 0; i < myList.Count; ++i) {
        if (!hs.Add(myList[i])) return true;
    }
    return false;
}

Should be more efficient than Distinct as there is no need to go through all the list.

Trinidad
  • 2,756
  • 2
  • 25
  • 43
  • 5
    Don't call list.Count() method. Use the Count property instead. I know LINQ is optimized and it'll use it internally but still I think it's better to use the property. – Petar Petrov Feb 22 '11 at 16:17
  • 3
    Granted that it will be more efficient *if there are duplicates*. But if there are no duplicates, then it does the same amount of work. Which one to use probably depends on whether the "normal" case is that there are aren't duplicates. – Jim Mischel Feb 22 '11 at 16:27
  • 1
    @Petar Petrov: Good point. Probably should just use `foreach`. And make the parameter `IEnumerable` rather than `List`. – Jim Mischel Feb 22 '11 at 16:28
7

You can use IEnumerable.GroupBy method.

var list = new List<string> {"1", "2","3", "1", "2"};
var hasDuplicates = list.GroupBy(x => x).Any(x => x.Skip(1).Any());
johnsonlu
  • 71
  • 1
  • 2
3

Something along these lines is relatively simple and will provide you with a count of duplicates.

var something = new List<string>() { "One", "One", "Two", "Three" };

var dictionary = new Dictionary<string, int>();

something.ForEach(s =>
    {
        if (dictionary.ContainsKey(s))
        {
            dictionary[s]++;
        }
        else
        {
            dictionary[s] = 1;
        }
    });

I imagine this is similar to the implementation of Distinct, although I'm not certain.

Ian P
  • 12,840
  • 6
  • 48
  • 70
  • 2
    [HashSet](http://msdn.microsoft.com/en-us/library/bb359438.aspx) seems more straight forward to use. – Trinidad Feb 22 '11 at 16:14
  • 1
    Yeah that does make more sense. – Ian P Feb 22 '11 at 16:23
  • @Trinidad: but will not give you a count – recursive Feb 22 '11 at 16:23
  • @recursive, that's not part of the problem. See: _In an unsorted List, determine if a duplicate exists_ – Trinidad Feb 22 '11 at 16:27
  • This is perfect, as I am new to C#, and needed something to track the count for each instance within a set of values (e.g. 20,000+ filenames pulled from an http resource), and I want to know whether any duplicates exist before potentially overwriting files with duplicate filenames. A Dictionary is what I was considering, so it is heartening to see it recommended here. – Michael M Jun 19 '20 at 22:45
1

Use Enumerable.Any with HashSet.Add like:

List<string> list = new List<string> {"A", "A", "B", "C", "D"};
HashSet<string> hashSet = new HashSet<string>();
if(list.Any(r => !hashSet.Add(r)))
{
   //duplicate exists. 
}

HashSet.Add would return false if the item already exist in the HashSet. This will not iterate the whole list.

Habib
  • 219,104
  • 29
  • 407
  • 436
1

You could use the Distinct() extension method for IEnumerable

1

If you are using integers or well ordered sets, use a binary tree for O(nlog n) performance.

Alternatively, find another faster means of sorting, then simply check that every value is different than the previous one.

andrewjsaid
  • 489
  • 3
  • 11
0

You could use Distinct() statement to find unique records. Then compare with original generic list like this:

  if (dgCoil.ItemsSource.Cast<BLL.Coil>().ToList().Count != dgCoil.ItemsSource.Cast<BLL.Coil>().Select(c => c.CoilNo).Distinct().Count())
  {    
    //Duplicate detected !!
    return;
  }
WebDevBooster
  • 14,674
  • 9
  • 66
  • 70