With a list like so:

int[] numbers = {1,2,2,3,3,4,4,5};

I can remove the duplicates using the Distinct() function, so the list will read: 1,2,3,4,5.

However, I want the inverse: I want to remove every number that is duplicated, leaving only the numbers that occur exactly once.

So the list will read: 1,5.

How would this be done?

user1662290

3 Answers

One way would be:

var singles = numbers.GroupBy(n => n)
                     .Where(g => g.Count() == 1)
                     .Select(g => g.Key); // add .ToArray() etc as required
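
For anyone who wants to paste this into a console app, here is a minimal self-contained sketch of the same query. The class and method names (`SinglesExample`, `Singles`) are made up for illustration and are not part of LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SinglesExample
{
    // Hypothetical helper: returns the values that occur exactly once in the source.
    public static List<int> Singles(IEnumerable<int> source)
    {
        return source.GroupBy(n => n)               // one group per distinct value
                     .Where(g => g.Count() == 1)    // keep groups holding a single item
                     .Select(g => g.Key)            // project each group back to its value
                     .ToList();
    }

    public static void Main()
    {
        int[] numbers = { 1, 2, 2, 3, 3, 4, 4, 5 };
        Console.WriteLine(string.Join(",", Singles(numbers))); // prints 1,5
    }
}
```

`GroupBy` in LINQ to Objects yields groups in the order their keys first appear, so the result keeps the original ordering of the surviving values.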
Jon
  • `!g.Skip(1).Any()` would be *a little* better than `g.Count()==1` in my opinion. – King King Sep 20 '13 at 09:58
  • @KingKing: I believe that the group implementation contains a cached count of items, so `.Count()` would be best. I *think* I had benchmarked this at some point, but I can't find it right now so I'm only 99% sure. – Jon Sep 20 '13 at 10:03
  • @KingKing: borrowed your comment in my answer :) @Jon: No, the `GroupBy` is using deferred execution but `Count()` will execute it completely. – Tim Schmelter Sep 20 '13 at 10:04
  • Nice to know about that, I thought it must iterate through all the elements in each group again to get `Count`. Thanks for the reply. – King King Sep 20 '13 at 10:05
  • @KingKing: I did find it. [Here's the code](http://pastebin.com/dWpkTb6Q); playing with it makes it clear that `.Count()` is constant time in this context. – Jon Sep 20 '13 at 10:06
  • @TimSchmelter it's free :), however as `Jon` said, if there is some `cache`, it's not something special. – King King Sep 20 '13 at 10:06
  • @TimSchmelter: Please have a look at the example I cooked up. I don't claim it's a bulletproof test by any means, but it certainly proves that `.Count()` is not 100% naive. Do you have more insight on that? – Jon Sep 20 '13 at 10:07
  • @Jon: Good to know. However, it does not prevent `Count` from executing the query completely, even if it contains a myriad of items and you just want to know if it contains only one. But it does prevent it from counting more than once. – Tim Schmelter Sep 20 '13 at 10:13
  • @TimSchmelter: As long as we are not going into `IQueryable` territory, wouldn't accessing even a single item from `GroupBy` force the query (the last item in the seq could be part of the first group, so the seq has to be completely iterated over, at which point it seems obvious to take the eager approach and do all the grouping work on the spot) and calculate all the group counts on the way irrespective of what happens next? – Jon Sep 20 '13 at 10:16
  • @Jon: But it can stop if more than one item in the group was found whereas the `Count()` needs to count all groups. Btw, created another test with random values and large collections and the time consumed even increased with every measurement. So I assume there is no _cache_. – Tim Schmelter Sep 20 '13 at 10:25
  • @TimSchmelter: My point is that you cannot even get the keys of the groups unless the whole input is enumerated, so touching `.GroupBy()` in any way would force the whole thing. In which case, why not produce the counts "for free" since you are enumerating everything? In other words, even if you just do `.GroupBy(...).Where(g => true)` the counts should already be there. Anyway, seems like a deeper dive is in order at some point. – Jon Sep 20 '13 at 10:45
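
One quick way to probe the question debated in these comments is to check whether the group objects returned by `GroupBy` implement `ICollection<T>`. `Enumerable.Count()` is documented to use `ICollection<T>.Count` when the source implements that interface, so if the check comes back true, `Count()` on a group does not need to walk the items again. This is only a sketch: the helper name `ExposesCount` is made up, the result depends on the LINQ implementation you run against, and it does not replace a proper benchmark.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupingProbe
{
    // Hypothetical helper: true if the group object exposes ICollection<T>,
    // in which case Enumerable.Count() can read its Count property directly.
    public static bool ExposesCount<TKey, TElement>(IGrouping<TKey, TElement> group)
    {
        return group is ICollection<TElement>;
    }

    public static void Main()
    {
        int[] numbers = { 1, 2, 2, 3, 3, 4, 4, 5 };

        // GroupBy is deferred; the source is only read once this loop starts.
        foreach (var g in numbers.GroupBy(n => n))
        {
            Console.WriteLine("{0}: Count() = {1}, ICollection<T>: {2}",
                g.Key, g.Count(), ExposesCount(g));
        }
    }
}
```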

For what it's worth, an extension that checks if a sequence contains more than N elements:

public static bool CountMoreThan<TSource>(this IEnumerable<TSource> source, int num)
{
    if (source == null)
        throw new ArgumentNullException("source");
    if (num < 0)
        throw new ArgumentException("num must be greater than or equal to 0", "num");

    // Fast path: a generic collection already knows its size.
    ICollection<TSource> collection = source as ICollection<TSource>;
    if (collection != null)
    {
        return collection.Count > num;
    }
    // Same for the non-generic ICollection (needs "using System.Collections;").
    ICollection collection2 = source as ICollection;
    if (collection2 != null)
    {
        return collection2.Count > num;
    }

    // Otherwise pull at most num + 1 items and stop as soon as the answer is known.
    int count = 0;
    using (IEnumerator<TSource> enumerator = source.GetEnumerator())
    {
        while (++count <= num + 1)
            if (!enumerator.MoveNext())
                return false;
    }
    return true;
}

Now it's easy and efficient:

var allUniques = numbers.GroupBy(i => i)
    .Where(group => !group.CountMoreThan(1))
    .Select(group => group.Key).ToList();

Or, as commented by @KingKing on Jon's answer:

var allUniques = numbers.GroupBy(i => i)
    .Where(group => !group.Skip(1).Any())
    .Select(group => group.Key).ToList();
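
To illustrate why stopping early matters, here is a small sketch that runs both checks against an endless lazy sequence, where `Count()` would never return. The extension is repeated from above so the sample compiles on its own; `ShortCircuitDemo` and `Naturals` are hypothetical names used only for this example.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class SequenceExtensions
{
    // Same logic as the CountMoreThan extension shown earlier in this answer.
    public static bool CountMoreThan<TSource>(this IEnumerable<TSource> source, int num)
    {
        if (source == null)
            throw new ArgumentNullException("source");
        if (num < 0)
            throw new ArgumentException("num must be greater than or equal to 0", "num");

        ICollection<TSource> collection = source as ICollection<TSource>;
        if (collection != null)
            return collection.Count > num;

        ICollection collection2 = source as ICollection;
        if (collection2 != null)
            return collection2.Count > num;

        int count = 0;
        using (IEnumerator<TSource> enumerator = source.GetEnumerator())
        {
            while (++count <= num + 1)
                if (!enumerator.MoveNext())
                    return false;
        }
        return true;
    }
}

public static class ShortCircuitDemo
{
    // Hypothetical endless sequence; calling Count() on it would never finish.
    public static IEnumerable<int> Naturals()
    {
        int i = 0;
        while (true)
            yield return i++;
    }

    public static void Main()
    {
        Console.WriteLine(Naturals().CountMoreThan(1));   // True, after reading two items
        Console.WriteLine(Naturals().Skip(1).Any());      // True, after reading two items
        Console.WriteLine(new[] { 7 }.CountMoreThan(1));  // False, via the ICollection<T> path
    }
}
```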
Tim Schmelter
var cleanArray = numbers.GroupBy(x => x)
  .Where(g => g.Count() == 1)
  .SelectMany(g => g)
  .ToArray();
Save