Getting unique items from a list

Question

What is the fastest / most efficient way of getting all the distinct items from a list?

I have a List<string> that possibly has multiple repeating items in it and only want the unique values within the list.

The title of this question is misleading. Selecting unique items is about selecting items that occur just once in the list, against selecting each distinct element,once. Given `["A", "B", "C", "C", "D", "D"]`, unique items would return `["A","B"]`, whereas distinct items would return `["A", "B", "C", "D"]`. — Eduardo Pignatelli, Jun 28 '18 at 11:12
@EduardoPignatelli Quite picky, but the question could be reworded unambiguously. The intent of this question as normally encountered means: "Given a list of values, how do I get a list of those values without duplicating any?" — Suncat2000, Sep 05 '18 at 17:28

LukeH · Answer 1 · 2009-09-07T09:15:17.143

195

You can use the Distinct method to return an IEnumerable<T> of distinct items:

var uniqueItems = yourList.Distinct();

And if you need the sequence of unique items returned as a List<T>, you can add a call to ToList:

var uniqueItemsList = yourList.Distinct().ToList();

edited Sep 07 '09 at 09:15

answered Sep 07 '09 at 09:10

LukeH

263,068
57
365
409

2

The OP was looking for a fast/efficient method. This is not it. Calling `yourList.Distinct().ToList()` requires two full iterations over the enumerable, and additionally is based off `IEqualityComparer`, which is slower than `GetHashCode`. – Noldorin Sep 07 '09 at 09:19
1

Is this faster/more efficient than a HashSet? I don't think so. Not bothering to downvote, though :-) – Vinay Sajip Sep 07 '09 at 09:21
@Noldorin, @Vinay: *If* the OP needs the distinct items returned as a `List` then they'll need to call `ToList`, regardless of whether they use `Distinct` or construct a `HashSet`. Having said that, you're right that a `HashSet` will probably have better performance than `Distinct` in most circumstances. – LukeH Sep 07 '09 at 10:29
22

@Noldorin: I know this is old, but it shows up easily on Google and you're wrong (at least, as of .NET 4 - I haven't checked in older versions). yourList.Distinct().ToList() performs one enumeration, new HashSet(yourList).ToList() performs two. And the implementations of HashSet and Distinct's internal Set class are almost identical. They both use GetHashCode, and they both use IEqualityComparers (which they have to, as equal hashcodes don't (in general) guarantee equal objects). – reavowed May 23 '11 at 12:02
3

@Noldorin: How would a performance benchmark make any argument for or against what I said? You can verify what I said by pulling up System.Linq.Enumerable.DistinctIterator and System.Linq.Set in Reflector (or other .NET decompiler), independent of relative performance. – reavowed May 24 '11 at 17:07
1

@IainM: Sorry, you're right. I was reading into your post and taking the implication that they are similar in speed. I am still very interested if they actually are. I suspect the difference is still there, though it has possibly gone down since .NET 4.0. – Noldorin May 25 '11 at 01:37
Note that `Distinct` is an extension method and lives in `System.Linq` namespace. `public static IEnumerable Distinct (this IEnumerable source)` – guneysus Mar 20 '17 at 06:46

score 165 · Accepted Answer · edited Sep 07 '09 at 09:20

165

Use a HashSet<T>. For example:

var items = "A B A D A C".Split(' ');
var unique_items = new HashSet<string>(items);
foreach (string s in unique_items)
    Console.WriteLine(s);

prints

A
B
D
C

edited Sep 07 '09 at 09:20

Noldorin

144,213
56
264
302

answered Sep 07 '09 at 09:10

Vinay Sajip

95,872
14
179
191

3

Must agree; others solve the problem, yours solves the cause :) – Noon Silk Sep 07 '09 at 09:12
13

A `HashSet` won't maintain any ordering, which may or may not be an issue for the OP. – LukeH Sep 07 '09 at 09:13
thanks guys, I don't require the items to be ordered. This works great. – domgreen Sep 07 '09 at 09:47

score 7 · Answer 3 · answered Sep 07 '09 at 09:10

7

You can use Distinct extension method from LINQ

answered Sep 07 '09 at 09:10

aku

122,288
32
173
203

score 5 · Answer 4 · edited Sep 07 '09 at 09:49

5

Apart from the Distinct extension method of LINQ, you could use a HashSet<T> object that you initialise with your collection. This is most likely more efficient than the LINQ way, since it uses hash codes (GetHashCode) rather than an IEqualityComparer).

In fact, if it's appropiate for your situation, I would just use a HashSet for storing the items in the first place.

edited Sep 07 '09 at 09:49

Vinko Vrsalovic

330,807
53
334
373

answered Sep 07 '09 at 09:12

Noldorin

144,213
56
264
302

1

A `HashSet` won't maintain any ordering, which may or may not be an issue for the OP. – LukeH Sep 07 '09 at 09:14
@Luke: Even so, ordering would have no meaning after calling `Distinct`... – Noldorin Sep 07 '09 at 09:16
@Luke: The question asks about fastest/most efficient, and doesn't require ordering to be maintained. – Vinay Sajip Sep 07 '09 at 09:20
@Noldorin: Why not? `Distinct` should/does iterate the list in order (although I'm not sure if that's actually guaranteed in any spec). – LukeH Sep 07 '09 at 09:22
@Luke: Oh, I was thinking of indexing really. And anyway, efficiency was mentioned in the OP, while order wasn't (though that's open question) - `HashSet` is the way to go if you want good performance. – Noldorin Sep 07 '09 at 09:26

score 5 · Answer 5 · answered Dec 11 '10 at 13:41

5

In .Net 2.0 I`m pretty sure about this solution:

public IEnumerable<T> Distinct<T>(IEnumerable<T> source)
{
     List<T> uniques = new List<T>();
     foreach (T item in source)
     {
         if (!uniques.Contains(item)) uniques.Add(item);
     }
     return uniques;
}

answered Dec 11 '10 at 13:41

Murilo Beltrame

81
1
1

3

*Please* use a collection with faster random access than List, such as a Dictionary or HashSet. Because currently, if `source` contains 100,000 items with many duplicates, then in every one of the 100,000 iterations you will be scanning a list on the order of 100,000 items, meaning you are scanning on the order of `100,000 * 100,000` items. Quadratic time complexity can become quite slow. – Timo Oct 13 '15 at 07:32

Getting unique items from a list

5 Answers5

Linked

Related