440

HashSet The C# HashSet data structure was introduced in the .NET Framework 3.5. A full list of the implemented members can be found at the HashSet MSDN page.

  1. Where is it used?
  2. Why would you want to use it?
Hossein Sabziani
  • 1
  • 2
  • 15
  • 20
001
  • 62,807
  • 94
  • 230
  • 350

5 Answers5

634
    1. A HashSet holds a set of objects, but in a way that allows you to easily and quickly determine whether an object is already in the set or not. It does so by internally managing an array and storing the object using an index which is calculated from the hashcode of the object. Take a look here
  1. HashSet is an unordered collection containing unique elements. It has the standard collection operations Add, Remove, Contains, but since it uses a hash-based implementation, these operations are O(1). (As opposed to List for example, which is O(n) for Contains and Remove.) HashSet also provides standard set operations such as union, intersection, and symmetric difference. Take a look here

  2. There are different implementations of Sets. Some make insertion and lookup operations super fast by hashing elements. However, that means that the order in which the elements were added is lost. Other implementations preserve the added order at the cost of slower running times.

The HashSet class in C# goes for the first approach, thus not preserving the order of elements. It is much faster than a regular List. Some basic benchmarks showed that HashSet is decently faster when dealing with primary types (int, double, bool, etc.). It is a lot faster when working with class objects. So the point is that HashSet is fast.

The only catch of HashSet is that there is no access by indices. To access elements you can either use an enumerator or use the built-in function to convert the HashSet into a List and iterate through that. Take a look here

lepsch
  • 8,927
  • 5
  • 24
  • 44
kamaci
  • 72,915
  • 69
  • 228
  • 366
  • 13
    Two things, hashset and similar are .NET's, not C#'s. Also HashSet doesnt preserve order. Try adding and removing items from a hash set, you will know if you iterate later on.. – nawfal Nov 20 '12 at 23:56
  • 1
    Isn't a HashSet pretty much the same thing as a Dictionary then? Whats the difference? – csstudent1418 Nov 24 '20 at 12:41
14

A HashSet has an internal structure (hash), where items can be searched and identified quickly. The downside is that iterating through a HashSet (or getting an item by index) is rather slow.

So why would someone want be able to know if an entry already exists in a set?

One situation where a HashSet is useful is in getting distinct values from a list where duplicates may exist. Once an item is added to the HashSet it is quick to determine if the item exists (Contains operator).

Other advantages of the HashSet are the Set operations: IntersectWith, IsSubsetOf, IsSupersetOf, Overlaps, SymmetricExceptWith, UnionWith.

If you are familiar with the object constraint language then you will identify these set operations. You will also see that it is one step closer to an implementation of executable UML.

Stacked
  • 6,892
  • 7
  • 57
  • 73
k rey
  • 611
  • 4
  • 11
  • 22
    Re: downside. No, iterating through a HashSet is perfectly fast. Secondly, it is not possible to get an item by index. In fact, the elements are stored unordered. – Nigel Touch Apr 18 '13 at 14:56
  • 1
    @Nigel Touch. Iterating is fast if you do not care about the index (order in which they were added). However, if you are concerned about the index then the index must be stored with each hash key and thus it can be rather slow because the list must be searched exhaustively to retrieve the correct item. This behavior is very different than a list in which items are indexed by the order in which they are added. – k rey Dec 17 '14 at 16:26
  • It makes sense why it would be fast, because no two hash's are the same. Enabling the query to take advantage of a "short circuit" approach, quickly ruling out certain criteria. – Chef_Code Feb 17 '16 at 05:23
  • `HashSet` does not have a notion of *getting an item by index*. `Hashtable` represents a collection of key/value pairs. – Mustafa Özçetin Jul 17 '23 at 13:07
10

Simply said and without revealing the kitchen secrets: a set in general, is a collection that contains no duplicate elements, and whose elements are in no particular order. So, A HashSet<T> is similar to a generic List<T>, but is optimized for fast lookups (via hashtables, as the name implies) at the cost of losing order.

Stacked
  • 6,892
  • 7
  • 57
  • 73
  • 2
    But can a HashSet store two objects that have the same data, like two Product classes that each has the same properties with the same content? – Johan Herstad Aug 30 '18 at 07:37
  • 1
    I guess we'll never know – Denny Nov 26 '18 at 13:10
  • 1
    @JohanHerstad Assuming the EqualityComparer for your class cares about those properties or you construct the HashSet with an IEqualityComparer that cares about those properties, I don't see why it wouldn't. The [documentation for HashSet](https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.hashset-1?view=netcore-2.2) makes it clear that it relies upon one or the other to determine uniqueness. – Bacon Bits Jul 29 '19 at 12:16
2

From application perspective, if one needs only to avoid duplicates then HashSet is what you are looking for since it's Lookup, Insert and Remove complexities are O(1) - constant. What this means it does not matter how many elements HashSet has it will take same amount of time to check if there's such element or not, plus since you are inserting elements at O(1) too it makes it perfect for this sort of thing.

Community
  • 1
  • 1
Matas Vaitkevicius
  • 58,075
  • 31
  • 238
  • 265
0

HashSet is like a list but does not allow duplicate items to be added. It is used when you want to have a list with unique items.

example:

HashSet<int> HS = new HashSet<int>();
HS.Add(1);
HS.Add(2);
HS.Add(3);
HS.Add(1);
HS.Add(2);

foreach(int item in HS)
    Console.WriteLine(item); 

result:

1    
2    
3
Hossein Sabziani
  • 1
  • 2
  • 15
  • 20