5

This is in some way related to this question (Getting all unique Items in a C# list).

The above question is talking about a simple array of values though. I have an object returned from a third party web service:

public class X
{
    public Enum y {get; set;}

}

I have a list of these objects, List<X> xs;, about 100 records in total, though the number varies. I want all the distinct values of the property y that occur in the list, and I want to bind the result to a CheckBoxList.DataSource (in case that makes a difference).

What's the most efficient way to do this?

I can think of two algorithms:

var data = new HashSet<Enum>(xs.Select(s => s.y));
chkBoxList.DataSource = data;

Or

var data = xs.Select(s => s.y).Distinct();
chkBoxList.DataSource = data;

My gut feeling is the HashSet but I'm not 100% sure.

I'm open to better ideas if anyone has any.

Liam
  • There's a lot to consider here: do you need change notifications, updates, filtering and so on? If so, then BindingList is an option. – terrybozzio Oct 10 '13 at 13:38
  • @terrybozzio, No, none of that. I just need an IEnumerable that I can use as a DataSource. – Liam Oct 10 '13 at 13:42
  • possible duplicate of [What's better for creating distinct data structures: HashSet or Linq's Distinct()?](http://stackoverflow.com/questions/6298679/whats-better-for-creating-distinct-data-structures-hashset-or-linqs-distinct) – nawfal May 26 '14 at 10:53

2 Answers

3

If it is a one-time operation, use .Distinct(). If you are going to add elements again and again, use a HashSet.
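
To illustrate the "add elements again and again" case, here is a minimal sketch. The `Status` enum, the sample records and the `IncrementalDemo` class are made up for illustration (the question only shows a property of type `Enum`), so treat this as one possible shape rather than the asker's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the third-party types in the question.
public enum Status { Active, Suspended, Closed }

public class X
{
    public Status y { get; set; }
}

public static class IncrementalDemo
{
    // Build the set once from the existing records.
    public static HashSet<Status> BuildSet(IEnumerable<X> xs)
    {
        return new HashSet<Status>(xs.Select(s => s.y));
    }

    public static void Main()
    {
        var xs = new List<X>
        {
            new X { y = Status.Active },
            new X { y = Status.Active },
            new X { y = Status.Closed }
        };

        HashSet<Status> seen = BuildSet(xs);
        Console.WriteLine(seen.Count); // 2

        // Add returns false when the value is already present, so the
        // uniqueness check happens as part of the insert.
        Console.WriteLine(seen.Add(Status.Active));    // False
        Console.WriteLine(seen.Add(Status.Suspended)); // True
        Console.WriteLine(seen.Count);                 // 3
    }
}
```

If nothing is ever added after the initial load, the set's `Add` behaviour is not needed and the plain `Distinct()` query is enough.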

Anarion
  • 1
    Can you elaborate a little? – Liam Oct 10 '13 at 13:12
  • `HashSet` is a "heavy" object, but adding a single item and checking for uniqueness at the same time is very cheap (the structure of the hash set does this for you). It also stores all the data, whereas the `IEnumerable` result of `Select` is simply a query, which is executed every time you enumerate it. So for single-time usage, use the query; for continuous usage, create a `HashSet`. – Anarion Oct 10 '13 at 13:52
  • Heavy object? I don't understand that. Do you mean where the memory is allocated? They are both reference types, so I don't think this is an issue. – Liam Oct 10 '13 at 13:54
  • Creating a HashSet is more expensive than creating a simple List. – Anarion Oct 10 '13 at 13:55
  • You can combine both: `var data = new HashSet<Enum>(xs.Select(s => s.y).Distinct());` This way, if you want to add more items to `data`, you will not have to re-query. – Anarion Oct 10 '13 at 13:56
2

The HashSet one, since it keeps the objects around after the hashset object has been constructed, and foreach-ing it will not require expensive operations.

On the other hand, the Distinct enumerator will likely be evaluated every time the DataSource is enumerated, and all the work of removing duplicate values will be repeated.
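
A rough way to see the difference is to count how often the selector runs. This is only a sketch: the `Status` enum, the sample data and the `DeferredDemo` helper names are assumptions for illustration, and how often a real data-bound control enumerates its source is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the third-party types in the question.
public enum Status { Active, Suspended, Closed }

public class X
{
    public Status y { get; set; }
}

public static class DeferredDemo
{
    // Enumerates a deferred Distinct() query `passes` times and returns
    // how many times the selector was invoked in total.
    public static int SelectorCallsForQuery(List<X> xs, int passes)
    {
        int calls = 0;
        IEnumerable<Status> query = xs.Select(s => { calls++; return s.y; }).Distinct();
        for (int i = 0; i < passes; i++)
        {
            foreach (Status ignored in query) { }
        }
        return calls;
    }

    // Builds a HashSet once, enumerates it `passes` times and returns
    // how many times the selector was invoked in total.
    public static int SelectorCallsForHashSet(List<X> xs, int passes)
    {
        int calls = 0;
        var set = new HashSet<Status>(xs.Select(s => { calls++; return s.y; }));
        for (int i = 0; i < passes; i++)
        {
            foreach (Status ignored in set) { }
        }
        return calls;
    }

    public static void Main()
    {
        var xs = new List<X>
        {
            new X { y = Status.Active },
            new X { y = Status.Active },
            new X { y = Status.Closed }
        };

        Console.WriteLine(SelectorCallsForQuery(xs, 2));   // 6: the query re-runs on each pass
        Console.WriteLine(SelectorCallsForHashSet(xs, 2)); // 3: the work is done once, at construction
    }
}
```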

staafl
  • *keeps the objects around*? Can you explain? – Liam Oct 10 '13 at 13:18
  • 2
    `new HashSet<Enum>(xs.Select(s => s.y))` creates a hash set that contains (keeps) all distinct values of `s.y`; every time someone enumerates it, they just go over the contents of the collection. `xs.Select(s => s.y).Distinct()` creates an iterator object that knows how to generate the distinct values of `s.y` but does not store them; every time you enumerate it, it runs through the whole algorithm again. – staafl Oct 10 '13 at 13:47
  • This answer is misleading since you can simply call `Distinct().ToList()` to avoid repeated evaluations. In fact this is a very common pattern when working with `IEnumerable`s. Additionally, `Distinct().ToList()` is slightly faster for filtering an existing collection, and arguably more expressive in its intent. See benchmarks at the end of this answer http://stackoverflow.com/a/13515243/1488656 – Livven Jun 08 '16 at 21:39
  • 1
    @Livven, keeping the unique entries in a List is more expressive in its intent? I disagree strongly. A HashSet is inherently about uniqueness and is both more flexible and more explicit than ToList()-ing a LINQ query. – staafl Jun 09 '16 at 06:31
  • @staafl Agree to disagree, but that was not the main point of my comment anyway. Whether you prefer `HashSet` or `Distinct()`, not mentioning `ToList()` makes the comparison unfair and trivial. – Livven Jun 09 '16 at 14:06
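
For completeness, the `Distinct().ToList()` pattern raised in the comments above can be sketched as follows. The `Status` enum, the sample data and the `SnapshotDemo` class are hypothetical; the point is only that `ToList()` takes a snapshot, while the bare query keeps tracking the source list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the third-party types in the question.
public enum Status { Active, Suspended, Closed }

public class X
{
    public Status y { get; set; }
}

public static class SnapshotDemo
{
    // Materializes the distinct values once into a list.
    public static List<Status> DistinctSnapshot(IEnumerable<X> xs)
    {
        return xs.Select(s => s.y).Distinct().ToList();
    }

    public static void Main()
    {
        var xs = new List<X>
        {
            new X { y = Status.Active },
            new X { y = Status.Active },
            new X { y = Status.Closed }
        };

        IEnumerable<Status> deferred = xs.Select(s => s.y).Distinct();
        List<Status> snapshot = DistinctSnapshot(xs);

        // Change the source after both were created.
        xs.Add(new X { y = Status.Suspended });

        Console.WriteLine(deferred.Count()); // 3: re-evaluated against the current list
        Console.WriteLine(snapshot.Count);   // 2: fixed at the time ToList() ran
    }
}
```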