111

I have a small list of bytes and I want to test that they're all different values. For instance, I have this:

List<byte> theList = new List<byte> { 1,4,3,6,1 };

What's the best way to check if all values are distinct or not?

frenchie
  • 51,731
  • 109
  • 304
  • 510

8 Answers

206
bool isUnique = theList.Distinct().Count() == theList.Count();
juergen d
  • 201,996
  • 37
  • 293
  • 362
  • 1
    Just curious: what are the space and time requirements of this? – dtb Aug 18 '13 at 21:44
  • 12
    @dtb [should be about O(N)](http://stackoverflow.com/questions/2799427/what-guarantees-are-there-on-the-run-time-complexity-big-o-of-linq-methods). Of course, considering this is a "small list", it will be lightning-fast with almost any algorithm. IMO this wins on readability and conciseness, and since speed's not an issue, that makes it perfect. – Tim S. Aug 18 '13 at 22:04
  • 3
    This is less efficient than it could be – Jodrell Feb 02 '17 at 09:50
  • 1
    @Tim Schmelter's answer was 4x faster using the HashSet. This approach would be good for everyday usage, but for larger sets, like in the millions, HashSet should be used. – Chandraprakash Oct 17 '21 at 11:25
93

Here's another approach that is more efficient than Enumerable.Distinct + Enumerable.Count (especially if the sequence is not a collection type). It uses a HashSet<T>, which eliminates duplicates, is very efficient in lookups and has a Count property:

var distinctBytes = new HashSet<byte>(theList);
bool allDifferent = distinctBytes.Count == theList.Count;

or another - more subtle and efficient - approach:

var diffChecker = new HashSet<byte>();
bool allDifferent = theList.All(diffChecker.Add);

HashSet<T>.Add returns false if the element could not be added since it was already in the HashSet. Enumerable.All stops on the first "false".
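To make the short-circuiting concrete, here is a hypothetical trace with the list from the question (this snippet is mine, not part of the original answer):

var theList = new List<byte> { 1, 4, 3, 6, 1 };
var diffChecker = new HashSet<byte>();

// Add(1) -> true, Add(4) -> true, Add(3) -> true, Add(6) -> true, Add(1) -> false.
// All() returns false as soon as that last Add fails; any remaining elements would not be examined.
bool allDifferent = theList.All(diffChecker.Add);   // false for this list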

Tim Schmelter
  • 450,073
  • 74
  • 686
  • 939
  • 1
    so simple and obvious, why didn't I think about it first :) I used this one-liner in unit test to confirm 10 million elements generated by my awesome code are really unique `Assert.IsTrue(samples.Add(AwesomeClass.GetUnique()));`. They were and are :) +1 for you Tim :) – grapkulec Apr 04 '15 at 14:34
  • 1
    I have tried your answer on this question but it is not working, sir: http://stackoverflow.com/questions/34941162/how-to-break-from-the-loop-when-adding-values-to-the-list-object-which-is-distin/34941302?noredirect=1#34941302 – I Love Stackoverflow Jan 22 '16 at 09:22
  • Should be this: `bool allDifferent = theList.All(s => diffChecker.Add(s))` – mike nelson Nov 28 '17 at 23:05
  • 4
    No, not needed. In this case you can pass the delegate directly – Tim Schmelter Nov 28 '17 at 23:09
  • If you use [MoreLinq](https://github.com/morelinq/MoreLINQ)'s `ToHashSet` extension method, the first version of this is even easier. – Arithmomaniac Mar 27 '18 at 16:23
  • Note that to make this work with a (custom) non-generic `IEqualityComparer` you have to rely on `Hashtable`. As `Hashtable.Add` returns `void`, you have to keep a counter. Then you're done if adding a new item of your list did not increase the size of your `Hashtable`. With a generic `IEqualityComparer` @Rango's solution is extremely elegant. – Corniel Nobel Apr 05 '19 at 09:02
  • I tried all three approaches in a benchmark and realized that the one with `new HashSet(theList)` seems to be the fastest, even with very small lists of only 25 entries. – André Reichelt Aug 16 '19 at 12:12
  • @TimSchmelter Here's my test scenario: https://dotnetfiddle.net/n9mlRX Feel free to fork and modify it and share your results. **PS:** In my project, the lists will most likely be unique. When you comment out the shuffle section in my code, you can see that the middle version will always outperform the last one. – André Reichelt Aug 16 '19 at 13:11
  • 2
    @AndréReichelt - I just opened your code and the third scenario (`List.All(HashSet.Add)`) seems to be much faster than the other two in almost all cases – Kyle Delaney Feb 21 '20 at 18:39
  • @TimSchmelter Wow, passing the delegate directly is so much cleaner! Thanks for the info! – Chandraprakash Oct 17 '21 at 11:26
12

Okay, here is the most efficient method I can think of using standard .NET:

using System;
using System.Collections.Generic;

public static class Extension
{
    public static bool HasDuplicate<T>(
        this IEnumerable<T> source,
        IEqualityComparer<T> comparer,
        out T firstDuplicate)
    {
        ArgumentNullException.ThrowIfNull(source);
        ArgumentNullException.ThrowIfNull(comparer);
        
        (bool result, firstDuplicate) = HasDuplicateImplementation(source, comparer);
        
        return result;
    }
    
    public static bool HasDuplicate<T>(
        this IEnumerable<T> source,
        out T firstDuplicate)
    {
        ArgumentNullException.ThrowIfNull(source);
        var comparer = EqualityComparer<T>.Default;
        
        (bool result, firstDuplicate) = HasDuplicateImplementation(source, comparer);
        
        return result;
    }
    
    private static (bool, T) HasDuplicateImplementation<T>(
        IEnumerable<T> source,
        IEqualityComparer<T> comparer)
    {
        var checkBuffer = new HashSet<T>(comparer);
        foreach (var t in source)
        {
            if (!checkBuffer.Add(t))
            {
                return (true, t);
            }
        }
        
        return (false, default);
    }
}

Essentially, what is the point of enumerating the whole sequence twice if all you want to do is find the first duplicate?
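For illustration, a hypothetical usage of the extension against the list from the question (this snippet is mine, assuming the Extension class above is in scope):

var theList = new List<byte> { 1, 4, 3, 6, 1 };

// The overload without a comparer uses EqualityComparer<byte>.Default.
if (theList.HasDuplicate(out byte firstDuplicate))
{
    Console.WriteLine($"Not all values are distinct; first duplicate: {firstDuplicate}");
}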

Jodrell
  • 34,946
  • 5
  • 87
  • 124
  • Nice addition of the duplicate value as an out return, quite useful for validation – Pac0 Sep 28 '17 at 10:28
  • I have tested 3 solutions here and this is indeed the most efficient on this page. A few typos in there though (eg `sequence` should be `source`). But works great once these are fixed – mike nelson Nov 28 '17 at 23:07
  • @mikenelson, that should be better – Jodrell Nov 29 '17 at 08:42
  • 2
    For readability, I think it should be `if (!checkBuffer.Add(t)) { firstDuplicate = t; return true }` in the loop. – tia Jun 27 '18 at 13:19
10

Similar logic to Distinct, using GroupBy:

var isUnique = theList.GroupBy(i => i).Count() == theList.Count;
  • 4
    This is useful if you want to check uniqueness with respect to a property `theList.GroupBy(o => o.SomeProperty).Count() == theList.Count;` while Distinct() does not allow that. – Rev Sep 24 '20 at 13:40
2

I check if an IEnumerable (array, list, etc.) is unique like this:

var isUnique = someObjectsEnum.GroupBy(o => o.SomeProperty).Max(g => g.Count()) == 1;
Namig Hajiyev
  • 1,117
  • 15
  • 16
1

One can also do it with a HashSet:

var uniqueIds = new HashSet<long>(originalList.Select(item => item.Id));

if (uniqueIds.Count != originalList.Count)
{
    // there are duplicate Ids in originalList
}
Gauravsa
  • 6,330
  • 2
  • 21
  • 30
0

There are many solutions.

And no doubt there are more elegant ones using LINQ, as "juergen d" and "Tim Schmelter" showed.

But if you care about complexity and speed, the best solution is to implement it yourself. One solution is to create an array of N size (for byte it's 256), then loop over the input and, on every iteration, test the cell at the matching index: if it is already set, the value has been seen before and the list isn't distinct; otherwise set that cell and continue checking. A sketch of this idea follows.
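
For concreteness, a minimal sketch of that idea, assuming byte input (the method name is mine; a bool[256] flag array stands in for the counter array, which is equivalent for a distinctness check):

static bool AllDistinct(IEnumerable<byte> source)
{
    var seen = new bool[256];      // one slot per possible byte value
    foreach (var b in source)
    {
        if (seen[b])
            return false;          // value already encountered, so the list is not distinct
        seen[b] = true;
    }
    return true;
}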

Orel Eraki
  • 11,940
  • 3
  • 28
  • 36
  • 2
    you can use a bit vector with 256 bits = 32 byte = 8 integers. But your Big O = O(n) still will be the same as using a Hashet proposed in the other answer. – BrokenGlass Aug 18 '13 at 21:55
    This is O(n) so maybe fastest (test it). Would checking counts as you go or at the end be the quickest? I suspect that at the end will improve the worst case, but as you go may improve the average and best case. If there are no duplicates this will be worst-case performance. Also, for bigger data types this will not work well: for a 16-bit type you would have to use 64k of counts, well 64k bits (8k bytes), but for anything bigger the memory usage will start to get silly. However I like this answer for 8-bit values. – ctrl-alt-delor Aug 18 '13 at 21:56
  • 1
    @TamusJRoyce if you want to store 4294967296 possibilities, you need 4GB not 42MB (or 512MB of you use bit masking) – tigrou Aug 01 '19 at 13:34
  • Not sure what I was thinking. "Allocate 42MB+ of memory to hold all 4294967296 possibilities. And use simple bucket counters. Or even use bit masking xor and check if any bit is changed from true to false. 42MB+ / 8 = 5MB+ The expense seems too great with today's hardware. But one day this may hold merit." isn't really a relevant comment. HashSet would be best. If you are dealing with extremely large arrays, you expect an extremely large piece of memory. But in such a strange edge case, a heuristic with a CRC algorithm would be better. Map it to a polynomial. If close, evaluate. Thank you @tigrou! – TamusJRoyce Aug 01 '19 at 17:12
0

And another solution, if you want to find duplicated values.

var values = new [] { 9, 7, 2, 6, 7, 3, 8, 2 };

var sorted = values.ToList();
sorted.Sort();
for (var index = 1; index < sorted.Count; index++)
{
    var previous = sorted[index - 1];
    var current = sorted[index];
    if (current == previous)
        Console.WriteLine(string.Format("duplicated value: {0}", current));
}

Output:

duplicated value: 2
duplicated value: 7

http://rextester.com/SIDG48202

Kevin Struillou
  • 856
  • 10
  • 19