111

I have a small list of bytes and I want to test that they're all different values. For instance, I have this:

List<byte> theList = new List<byte> { 1,4,3,6,1 };

What's the best way to check if all values are distinct or not?

frenchie
  • 51,731
  • 109
  • 304
  • 510

8 Answers

206
bool isUnique = theList.Distinct().Count() == theList.Count();
juergen d
  • 201,996
  • 37
  • 293
  • 362
  • 1
    Just curious: what are the space and time requirements of this? – dtb Aug 18 '13 at 21:44
  • 12
    @dtb [should be about O(N)](http://stackoverflow.com/questions/2799427/what-guarantees-are-there-on-the-run-time-complexity-big-o-of-linq-methods). Of course, considering this is a "small list", it will be lightning-fast with almost any algorithm. IMO this wins on readability and conciseness, and since speed's not an issue, that makes it perfect. – Tim S. Aug 18 '13 at 22:04
  • 3
    This is less efficient than it could be – Jodrell Feb 02 '17 at 09:50
  • 1
    @Tim Schmelter's answer was 4x faster using the HashSet. This approach would be good for everyday usage, but for larger sets, like in the millions, HashSet should be used. – Chandraprakash Oct 17 '21 at 11:25
93

Here's another approach that is more efficient than Enumerable.Distinct + Enumerable.Count (especially if the sequence is not a collection type). It uses a HashSet<T>, which eliminates duplicates, is very efficient in lookups and has a Count property:

var distinctBytes = new HashSet<byte>(theList);
bool allDifferent = distinctBytes.Count == theList.Count;

or another - more subtle and efficient - approach:

var diffChecker = new HashSet<byte>();
bool allDifferent = theList.All(diffChecker.Add);

HashSet<T>.Add returns false if the element could not be added since it was already in the HashSet. Enumerable.All stops on the first "false".
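To make the short-circuiting concrete, here is a hypothetical trace with the list from the question (this snippet is mine, not part of the original answer):

var theList = new List<byte> { 1, 4, 3, 6, 1 };
var diffChecker = new HashSet<byte>();

// Add(1) -> true, Add(4) -> true, Add(3) -> true, Add(6) -> true, Add(1) -> false.
// All() returns false as soon as that last Add fails; any remaining elements would not be examined.
bool allDifferent = theList.All(diffChecker.Add);   // false for this list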

Tim Schmelter
  • 450,073
  • 74
  • 686
  • 939
  • 1
    so simple and obvious, why didn't I think about it first :) I used this one-liner in unit test to confirm 10 million elements generated by my awesome code are really unique `Assert.IsTrue(samples.Add(AwesomeClass.GetUnique()));`. They were and are :) +1 for you Tim :) – grapkulec Apr 04 '15 at 14:34
  • 1
    I have tried your answer on this question but it is not working, sir: http://stackoverflow.com/questions/34941162/how-to-break-from-the-loop-when-adding-values-to-the-list-object-which-is-distin/34941302?noredirect=1#34941302 – I Love Stackoverflow Jan 22 '16 at 09:22
  • Should be this: `bool allDifferent = theList.All(s => diffChecker.Add(s))` – mike nelson Nov 28 '17 at 23:05
  • 4
    No, not needed. In this case you can pass the delegate directly – Tim Schmelter Nov 28 '17 at 23:09
  • If you use [MoreLinq](https://github.com/morelinq/MoreLINQ)'s `ToHashSet` extension method, the first version of this is even easier. – Arithmomaniac Mar 27 '18 at 16:23
  • Note that to make this work with a (custom) non-generic `IEqualityComparer` you have to rely on `Hashtable`. As `Hashtable.Add` returns `void`, you have to keep a counter. Then you're done if adding a new item of your list did not increase the size of your `Hashtable`. With a generic `IEqualityComparer` @Rango's solution is extremely elegant. – Corniel Nobel Apr 05 '19 at 09:02
  • I tried all three approaches in a benchmark and realized that the one with `new HashSet(theList)` seems to be the fastest, even with very small lists of only 25 entries. – André Reichelt Aug 16 '19 at 12:12
  • @TimSchmelter Here's my test scenario: https://dotnetfiddle.net/n9mlRX Feel free to fork and modify it and share your results. **PS:** In my project, the lists will most likely be unique. When you comment out the shuffle section in my code, you can see that the middle version will always outperform the last one. – André Reichelt Aug 16 '19 at 13:11
  • 2
    @AndréReichelt - I just opened your code and the third scenario (`List.All(HashSet.Add)`) seems to be much faster than the other two in almost all cases – Kyle Delaney Feb 21 '20 at 18:39
  • @TimSchmelter Wow, passing the delegate directly is so much cleaner! Thanks for the info! – Chandraprakash Oct 17 '21 at 11:26
12

Okay, here is the most efficient method I can think of using standard .NET:

using System;
using System.Collections.Generic;

public static class Extension
{
    public static bool HasDuplicate<T>(
        this IEnumerable<T> source,
        IEqualityComparer<T> comparer,
        out T firstDuplicate)
    {
        ArgumentNullException.ThrowIfNull(source);
        ArgumentNullException.ThrowIfNull(comparer);
        
        (bool result, firstDuplicate) = HasDuplicateImplementation(source, comparer);
        
        return result;
    }
    
    public static bool HasDuplicate<T>(
        this IEnumerable<T> source,
        out T firstDuplicate)
    {
        ArgumentNullException.ThrowIfNull(source);
        var comparer = EqualityComparer<T>.Default;
        
        (bool result, firstDuplicate) = HasDuplicateImplementation(source, comparer);
        
        return result;
    }
    
    private static (bool, T) HasDuplicateImplementation<T>(
        IEnumerable<T> source,
        IEqualityComparer<T> comparer)
    {
        var checkBuffer = new HashSet<T>(comparer);
        foreach (var t in source)
        {
            if (!checkBuffer.Add(t))
            {
                return (true, t);
            }
        }
        
        return (false, default);
    }
}

Essentially, what is the point of enumerating the whole sequence twice if all you want to do is find the first duplicate?
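For illustration, a hypothetical usage of the extension against the list from the question (this snippet is mine, assuming the Extension class above is in scope):

var theList = new List<byte> { 1, 4, 3, 6, 1 };

// The overload without a comparer uses EqualityComparer<byte>.Default.
if (theList.HasDuplicate(out byte firstDuplicate))
{
    Console.WriteLine($"Not all values are distinct; first duplicate: {firstDuplicate}");
}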

Jodrell
  • 34,946
  • 5
  • 87
  • 124
  • Nice addition of the duplicate value as an out return, quite useful for validation – Pac0 Sep 28 '17 at 10:28
  • I have tested 3 solutions here and this is indeed the most efficient on this page. A few typos in there though (eg `sequence` should be `source`). But works great once these are fixed – mike nelson Nov 28 '17 at 23:07
  • @mikenelson, that should be better – Jodrell Nov 29 '17 at 08:42
  • 2
    For readability, I think it should be `if (!checkBuffer.Add(t)) { firstDuplicate = t; return true }` in the loop. – tia Jun 27 '18 at 13:19
10

Similar logic to Distinct, using GroupBy:

var isUnique = theList.GroupBy(i => i).Count() == theList.Count;
  • 4
    This is useful if you want to check uniqueness with respect to a property `theList.GroupBy(o => o.SomeProperty).Count() == theList.Count;` while Distinct() does not allow that. – Rev Sep 24 '20 at 13:40
2

I check if an IEnumerable (array, list, etc.) is unique like this:

var isUnique = someObjectsEnum.GroupBy(o => o.SomeProperty).Max(g => g.Count()) == 1;
Namig Hajiyev
  • 1,117
  • 15
  • 16
1

One can also do it with a HashSet:

var uniqueIds = new HashSet<long>(originalList.Select(item => item.Id));

if (uniqueIds.Count != originalList.Count)
{
    // there are duplicate Ids in originalList
}
Gauravsa
  • 6,330
  • 2
  • 21
  • 30
0

There are many solutions.

And no doubt there are more elegant ones using LINQ, as "juergen d" and "Tim Schmelter" showed.

But if you care about complexity and speed, the best solution is to implement it yourself. One solution is to create an array of N size (for byte it's 256), then loop over the input and, on every iteration, test the cell at the matching index: if it is already set, the value has been seen before and the list isn't distinct; otherwise set that cell and continue checking. A sketch of this idea follows.
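
For concreteness, a minimal sketch of that idea, assuming byte input (the method name is mine; a bool[256] flag array stands in for the counter array, which is equivalent for a distinctness check):

static bool AllDistinct(IEnumerable<byte> source)
{
    var seen = new bool[256];      // one slot per possible byte value
    foreach (var b in source)
    {
        if (seen[b])
            return false;          // value already encountered, so the list is not distinct
        seen[b] = true;
    }
    return true;
}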

Orel Eraki
  • 11,940
  • 3
  • 28
  • 36
  • 2
    you can use a bit vector with 256 bits = 32 byte = 8 integers. But your Big O = O(n) still will be the same as using a Hashet proposed in the other answer. – BrokenGlass Aug 18 '13 at 21:55
    This is O(n) so maybe fastest (test it). Would checking counts as you go or at the end be the quickest? I suspect that at the end will improve the worst case, but as you go may improve the average and best case. If there are no duplicates this will be worst-case performance. Also, for bigger data types this will not work well: for a 16-bit type you would have to use 64k of counts, well 64k bits (8k bytes), but for anything bigger the memory usage will start to get silly. However I like this answer for 8-bit values. – ctrl-alt-delor Aug 18 '13 at 21:56
  • 1
    @TamusJRoyce if you want to store 4294967296 possibilities, you need 4GB not 42MB (or 512MB of you use bit masking) – tigrou Aug 01 '19 at 13:34
  • Not sure what I was thinking. "Allocate 42MB+ of memory to hold all 4294967296 possibilities. And use simple bucket counters. Or even use bit masking xor and check if any bit is changed from true to false. 42MB+ / 8 = 5MB+ The expense seems too great with today's hardware. But one day this may hold merit." isn't really a relevant comment. HashSet would be best. If you are dealing with extremely large arrays, you expect an extremely large piece of memory. But in such a strange edge case, a heuristic with a CRC algorithm would be better. Map it to a polynomial. If close, evaluate. Thank you @tigrou! – TamusJRoyce Aug 01 '19 at 17:12
0

And another solution, if you want to find duplicated values.

var values = new [] { 9, 7, 2, 6, 7, 3, 8, 2 };

var sorted = values.ToList();
sorted.Sort();
for (var index = 1; index < sorted.Count; index++)
{
    var previous = sorted[index - 1];
    var current = sorted[index];
    if (current == previous)
        Console.WriteLine(string.Format("duplicated value: {0}", current));
}

Output:

duplicated value: 2
duplicated value: 7

http://rextester.com/SIDG48202

Kevin Struillou
  • 856
  • 10
  • 19