74

How to find whether the List<string> has duplicate values or not ?

I tried with below code. Is there any best way to achieve ?

var lstNames = new List<string> { "A", "B", "A" };

if (lstNames.Distinct().Count() != lstNames.Count())
{
    Console.WriteLine("List contains duplicate values.");
}
Prasad Kanaparthi
  • 6,423
  • 4
  • 35
  • 62

5 Answers5

127

Try to use GroupBy and Any like;

lstNames.GroupBy(n => n).Any(c => c.Count() > 1);

GroupBy method;

Groups the elements of a sequence according to a specified key selector function and projects the elements for each group by using a specified function.

Any method, it returns boolean;

Determines whether any element of a sequence exists or satisfies a condition.

Soner Gönül
  • 97,193
  • 102
  • 206
  • 364
  • 5
    How is this better than the code in the OP? You still need to group all of the items, so you don't really have any short circuiting here. – Servy Jan 16 '13 at 17:07
  • @Servy, How can i comapre which is best performace code? either my above logic(using Count) or using GroupBy & Any ? – Prasad Kanaparthi Jan 16 '13 at 17:19
  • 6
    This not only has to iterate through all the elements to build the groups, it then has to iterate through potentially all of the groups too. Your original coffee will be faster. – Rawling Jan 16 '13 at 17:29
  • 22
    ... Damn you, autocorrect. – Rawling Jan 16 '13 at 18:41
  • 1
    According to my tests, original code is at least 1.5 times faster (depending on inputs) than this – GorkemHalulu Dec 23 '14 at 10:25
  • 3
    How about replacing `c.Count() > 1` with `c.Skip(1).Any()`? – Oliver Sep 03 '15 at 13:08
  • @Oliver In the case of GroupBy, `Skip(1).Any()` is certainly slower than just calling `Count()` method. `GroupBy` returns materialized inner collections and calling Count() method merely returns the Count property of the inner ICollection. LINQ is very smart that way. Of course the gain is negligible, worrying about that levels of performance most likely is premature optimization. – nawfal Jun 16 '22 at 08:18
50

If you're looking for the most efficient way of doing this,

var lstNames = new List<string> { "A", "B", "A" };
var hashset = new HashSet<string>();
foreach(var name in lstNames)
{
    if (!hashset.Add(name))
    {
        Console.WriteLine("List contains duplicate values.");
        break;
    }
}

will stop as soon as it finds the first duplicate. You can wrap this up in a method (or extension method) if you'll be using it in several places.

Rawling
  • 49,248
  • 7
  • 89
  • 127
  • 2
    +1 performance ten times better in worst case, than in `GroupBy` – Ilya Ivanov Jan 16 '13 at 17:12
  • 2
    @IlyaIvanov Actually, in the worst case (no duplicates), it's about the same, maybe just a tad faster. In the best case (the first two items are duplicates) it's 100% faster, as it will be O(1) not O(n). In the general case it will be dependent on the actual rate of duplicates in the underlying data, while `GroupBy` and `Distinct` take the same time regardless of the underlying data. – Servy Jan 16 '13 at 18:00
  • 1
    "O" means "worst case" by the way. There is no "in the best case it will be O(x)" – John Shedletsky Jun 26 '15 at 17:17
  • 3
    @JohnShedletsky 'O(f)' represents the set of functions that don't grow faster than f, that is to say, g(x) <= f(x) * C for g in O(f) and some constant C, if x is large enough. It doesn't imply anything about best or worst cases. – Flonk Apr 07 '16 at 16:00
29

A generalized and compact extension version of the answer based on hash technique:

public static bool AreAnyDuplicates<T>(this IEnumerable<T> list)
{
    var hashset = new HashSet<T>();
    return list.Any(e => !hashset.Add(e));
}
Zoltán Tamási
  • 12,249
  • 8
  • 65
  • 93
  • 1
    cool, I've added it to my linq extensions, I added an overload to provide a comparer though. – Eluvatar Jul 31 '14 at 15:37
  • I know this pretty old and even though creating an extension method is cool but this is a really bad from performance perspective.. It should be using group by rather than trying to insert each and every list object to hashset. – curiousBoy Feb 09 '19 at 23:27
  • @curiousBoy I'm pretty sure that `GroupBy` is implemented using some kind of hashed structure internally, so basically it should have about the same performance. According to my best knowledge adding elements to a `HashSet` is "cheap" in terms of computation and uses at most the same amount of memory as the original list. Also, I'm not sure but having `GroupBy` and `Any` after each other might not be *very lazy* while it's obvious that this solution will stop on first duplicate item. Could you please clarify why you think it has poor performance? – Zoltán Tamási Feb 11 '19 at 10:06
  • @Eluvatar I know you answered this a long time ago, but, I'd love to see your code if you're willing to share. Thanks! – Erick Brown Jul 03 '20 at 20:13
  • 2
    @ErickBrown The constructor of a `HashSet` does accept a custom comparer, I think @Eluvatar meant to expose that as a parameter of this extension. – Zoltán Tamási Jul 04 '20 at 14:11
12
var duplicateExists = lstNames.GroupBy(n => n).Any(g => g.Count() > 1);
Nasmi Sabeer
  • 1,370
  • 9
  • 21
0
 class Program
{
    static void Main(string[] args)
    {
        var listFruits = new List<string> { "Apple", "Banana", "Apple", "Mango" };
        if (FindDuplicates(listFruits)) { WriteLine($"Yes we find duplicate"); };
        ReadLine();
    }
    public static bool FindDuplicates(List<string> array)
    {
        var dict = new Dictionary<string, int>();
        foreach (var value in array)
        {
            if (dict.ContainsKey(value))
                dict[value]++;
            else
                dict[value] = 1;
        }
        foreach (var pair in dict)
        {
            if (pair.Value > 1)
                return true;
            else
                return false;
        }
        return false;
    }
}  
SUNIL DHAPPADHULE
  • 2,755
  • 17
  • 32