212

I have the following class.

class Test{
    public HashSet<string> Data = new HashSet<string>();
}

I need to change the field "Data" from different threads, so I would like some opinions on my current thread-safe implementation.

class Test{
    public HashSet<string> Data = new HashSet<string>();

    public void Add(string Val){
            lock(Data) Data.Add(Val);
    }

    public void Remove(string Val){
            lock(Data) Data.Remove(Val);
    }
}

Is there a better solution, to go directly to field and protect it from concurrent access by multiple threads?

Andrew
  • 3,874
  • 5
  • 39
  • 67
kukab
  • 2,133
  • 2
  • 12
  • 4
  • How about using one of the collections under `System.Collections.Concurrent` – I4V Sep 20 '13 at 18:00
  • 13
    Of course, make it private. – Hans Passant Sep 20 '13 at 18:01
  • 3
    From a concurrency perspective, I don't see much wrong with what you've done other than the Data field being public! You could get better read performance using ReaderWriterLockSlim if that is a concern. http://msdn.microsoft.com/en-us/library/system.threading.readerwriterlockslim.aspx – Allan Elder Sep 20 '13 at 18:02
  • @AllanElder `ReaderWriterLock` will be helpful(efficient) when multiple readers and a single writer. We've to know whether is this the case for OP – Sriram Sakthivel Sep 20 '13 at 18:08
  • 5
    The current implementation is not really 'concurrent':) It's just thread-safe. – undefined Sep 23 '13 at 16:00
  • possible duplicate of [How to implement ConcurrentHashSet in .Net](http://stackoverflow.com/questions/4306936/how-to-implement-concurrenthashset-in-net) – Ruben Bartelink Mar 06 '14 at 16:29

8 Answers8

238

Your implementation is correct. The .NET Framework does not provide a built-in concurrent hashset type, unfortunately. However, there are some workarounds.

ConcurrentDictionary (recommended)

This first one is to use the class ConcurrentDictionary<TKey, TValue> in the namespace System.Collections.Concurrent. In the case, the value is pointless, so we can use a simple byte (1 byte in memory).

private ConcurrentDictionary<string, byte> _data;

This is the recommended option because the type is thread-safe and provide you the same advantages than a HashSet<T> except key and value are different objects.

Source: Social MSDN

Self-implementation

Finally, as you did, you can implement your own data type, using lock or other ways that the .NET provides you to be thread-safe. Here is a great example: How to implement ConcurrentHashSet in .Net

The only drawback of this solution is that the type HashSet<T> doesn't officially concurrent access, even for reading operations.

I quote the code of the linked post (originally written by Ben Mosher).

using System;
using System.Collections.Generic;
using System.Threading;

namespace BlahBlah.Utilities
{
    public class ConcurrentHashSet<T> : IDisposable
    {
        private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion);
        private readonly HashSet<T> _hashSet = new HashSet<T>();

        #region Implementation of ICollection<T> ...ish
        public bool Add(T item)
        {
            _lock.EnterWriteLock();
            try
            {
                return _hashSet.Add(item);
            }
            finally
            {
                if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            }
        }

        public void Clear()
        {
            _lock.EnterWriteLock();
            try
            {
                _hashSet.Clear();
            }
            finally
            {
                if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            }
        }

        public bool Contains(T item)
        {
            _lock.EnterReadLock();
            try
            {
                return _hashSet.Contains(item);
            }
            finally
            {
                if (_lock.IsReadLockHeld) _lock.ExitReadLock();
            }
        }

        public bool Remove(T item)
        {
            _lock.EnterWriteLock();
            try
            {
                return _hashSet.Remove(item);
            }
            finally
            {
                if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            }
        }

        public int Count
        {
            get
            {
                _lock.EnterReadLock();
                try
                {
                    return _hashSet.Count;
                }
                finally
                {
                    if (_lock.IsReadLockHeld) _lock.ExitReadLock();
                }
            }
        }
        #endregion

        #region Dispose
        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this);
        }
        protected virtual void Dispose(bool disposing)
        {
            if (disposing)
                if (_lock != null)
                    _lock.Dispose();
        }
        ~ConcurrentHashSet()
        {
            Dispose(false);
        }
        #endregion
    }
}

EDIT: Move the entrance lock methods ouside the try blocks, as they could throw an exception and execute the instructions contained in the finally blocks.

ConcurrentBag (inadvisable)

The usage of ConcurrentBag<T> is not advised, since this type only allows inserting a given element and removing a random element in a thread-safe manner. This class is designed for facilitating producer-consumer scenarios, which is not what OP aims for (more explanations here).

The other operations (e.g., provided by the extension methods) do not support concurrent usage. MSDN docs warn: "All public and protected members of ConcurrentBag are thread-safe and may be used concurrently from multiple threads. However, members accessed through one of the interfaces the ConcurrentBag implements, including extension methods, are not guaranteed to be thread safe and may need to be synchronized by the caller."

Jämes
  • 6,945
  • 4
  • 40
  • 56
  • 9
    a dictionary with junk values is a list – Ralf Sep 20 '13 at 18:09
  • 64
    @Ralf Well, it's a set, not a list, as it's unordered. – Servy Sep 20 '13 at 18:18
  • @Servy You are right, I updated my post. I mixed up the `HashSet` and `HashTable` data types. – Jämes Sep 20 '13 at 18:24
  • 1
    Better than using a `byte` with `ConcurrentDictionary` is to create a zero fields struct (e.g. `public struct VOID { }`), and use it as value, since you cannot use `Void` for that purpose. – Miguel Angelo May 09 '14 at 22:29
  • 1
    @ZenLulz I think there's a couple things wrong with your solution. 1) entering the lock should be done outside the `try`, since acquiring the lock could fail, but a lock is held by someone else, so you don't want to exit their lock. 2) There's nothing on MSDN indicating that HashSet is thread-safe, even for reading. This means that officially, HashSet doesn't support concurrent readers, but this is allowed by your solution. Although ugly, I think using `ConcurrentDictionary` is probably the way to go. – RobSiklos May 13 '14 at 12:55
  • 1
    @MiguelAngelo: Nice try, but .NET forces empty structs to have a size of 1 byte. :) Using a decompiler, you can see this attribute above empty structs `[StructLayout(LayoutKind.Sequential, Size=1)]`. @RobSiklos: Nicely spotted. I edited my answer accordingly. You are also right on the fact `HashSet` is not officially thread-safe for reading operations. I also think `ConcurrentDictionary` is the way to go. – Jämes May 13 '14 at 21:21
  • How sad! May be the compiler will allow this some day... in this case, it would make sense. If it only allowed the usage of `void`. =\ – Miguel Angelo May 14 '14 at 13:45
  • 17
    According to MSDN's rather short document on ["Collections and Synchronization (Thread Safety)"](http://msdn.microsoft.com/en-us/library/573ths2x(v=vs.71).aspx), classes in the System.Collections and related namespaces can be read by multiple threads safely. This means HashSet can be safely read by multiple threads. – Hank Schultz Oct 01 '14 at 17:39
  • 1
    Why don't you take a `ConcurrentDictionary` and always set the value to `null`? – Oliver Oct 28 '14 at 10:23
  • 9
    @Oliver, a reference uses much more memory per entry, even if it is a `null` reference (the reference needs 4 bytes in a 32-bit runtime and 8 bytes in a 64-bit runtime). Therefore, using a `byte`, an empty struct, or similar may reduce the memory footprint (or it may not if the runtime aligns the data on native memory boundaries for faster access). – Lucero Nov 08 '14 at 01:12
  • You can try with a ConcurrentBag – Bargitta Aug 18 '15 at 05:13
  • 13
    The Self-implementation is not a ConcurrentHashSet but rather a ThreadSafeHashSet. There is a big difference between those 2 and that is why Micorosft abandoned SynchronizedCollections (people got it wrong). In order to be "Concurrent" operations like GetOrAdd etc should be implemented (like the dictionary) or else concurrency cannot be ensured without additional locking. But if you need additional locking outside the class then why don't you use a simple HashSet right from the start? – George Mavritsakis Apr 11 '16 at 09:36
  • @GeorgeMavritsakis semantics. according to what concurrency is defined as: "concurrency control ensures that correct results for concurrent operations are generated", the above (assuming it's implemented properly) *does* in fact support consistent results when concurrent access is made (such as from multiple threads). *How* concurrency is supported - in this case using locks - is just an implementation detail. I don't know what point you're trying to make. Perhaps you could give a concrete e.g of how this code wouldn't support concurrent access (assuming it's void of bugs related to locking) – AaronHS Apr 27 '16 at 05:06
  • @GeorgeMavritsakis also regarding: "In order to be "Concurrent" operations like GetOrAdd etc should be implemented" - I'm yet to hear of any formal definition of concurrency that mandates a GetOrAdd. Concurrency only mandates that safe and consistent *concurrent* access can be made. – AaronHS Apr 27 '16 at 05:10
  • 4
    @AaronHS the main difference between thread safe and concurrency is that thread safe "guarantees safe execution by multiple threads at the same time" while concurrency includes the meaning of a transaction. It all comes to what you think an operation is, how atomic it is. This custom implementation is not concurrent because 2 threads cannot add at the same time the same value. In order to do so additional locks must be used. You cannot check if a value exist and add it if it does not exist without additional locks. Concurrent collections do not need additional locks. – George Mavritsakis Apr 27 '16 at 08:03
  • 1
    @GeorgeMavritsakis True, yet totally missing the point. You're talking implementation detail. It comes down to what level of abstraction we are dealing in. In this case, the OP cares about concurrent modifications to a *some collection*. What, how, or if, *true concurrency* is being implemented is not part of the discussion at that level of abstraction. If he is safely able to modify the collection, and *know* that other threads may be doing the same, the collection *supports* concurrency by definition, which is all we care about here... cont'd – AaronHS Apr 28 '16 at 02:15
  • .. (notice I said "collection supports concurrency"). Yes it is true that internally there is no actual concurrency happening. But again, this is not really relevant, and whether it's concurrent is just a matter of perspective (or better described as what level of abstraction we are dealing in). We all understand the difference between concurrency and non-concurrency at a fundamental level, and I'll grant you that concurrency isn't *actually* taking place here. But again, I don't see that there is any *practical* difference worth pointing out here. – AaronHS Apr 28 '16 at 02:20
  • 5
    @AaronHS The important difference @GeorgeMavritsakis is pointing out is that without `GetOrAdd`, the result of `Contains` in the above code is non-deterministic which thus in fact, renders that function advisory at best or useless at worst. An API that can only "hint" is inconvenient for callers--and an error if misunderstood. Your notion of "semantics" seems to focus on sequential *correctness* which is indeed imperative, but certainly some people consider tangible performance effects semantic as well. The approach shown here seriously limits the caller's options/capabilities in that regime. – Glenn Slayden Oct 28 '17 at 03:18
  • 3
    What is the point of having a `GetOrAdd` on a `HashSet` to begin with? `GetOrAdd` is meant for key/value pairs. Here we only have a value, being its own key. Due to this `Add` is enough to cover `GetOrAdd` features. Either value was not there and it has added it, or it was there but you do not need to get it since you already have it in the first place. – Frédéric Mar 20 '18 at 17:01
  • 3
    @HankSchultz: Sadly, this is no longer the case. Currently, [Collections and Synchronization (Thread Safety)](https://msdn.microsoft.com/en-us/library/573ths2x.aspx) says, "The following text applies only to programs that must target versions of the .NET Framework before version 4." I doubt anything has changed, but it's no longer _documented_ as being safe. Note that some collections (e.g., dictionary) are individually documented as being safe in the face of concurrent reads. However, hashset is not. – Brian May 11 '18 at 20:13
  • 1
    @Brian, your link is an old preview documentation which has not made it live, and so is irrelevant. The [up-to-date documentation](https://learn.microsoft.com/en-us/dotnet/standard/collections/thread-safe/index) states we can safely read concurrently from any collection in `System.Collections.Generic` namespace. `If you are only reading from a shared collection, then you can use the classes in the System.Collections.Generic namespace.` – Frédéric Sep 03 '19 at 22:41
  • 3
    the `concurrentBag` is not an option as it does not provide the methods similar to hashset. e.g. `Remove(T item)`. – Eklavyaa Sep 12 '19 at 08:36
  • ConcurrentBag works here, it has a Contains() method to stop you from having duplicate values. – Jono Jul 28 '21 at 22:56
  • 6
    @Jono To be precise, the type `ConcurrentBag` does not implement a `Contains` method. Instead, the extension method `Contains` is available by the type `Enumerable`, because `ConcurrentBag` implements `IEnumerable`. Therefore, when you step into the implementation of `Contains`, you can see the collection is _naively_ iterated to find the searched value. This implies that the complexity of the search for `ConcurrentBag` is O(n), whereas `ConcurrentDictionary` and `HashSet` have dedicated search algorithms based on their data structure, which is generally amortized to O(1). – Jämes Jul 29 '21 at 09:52
  • 2
    @Jono using the LINQ `Contains` is erroneous. [`ConcurrentBag` - Thread Safety](https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentbag-1#thread-safety): *"However, members accessed through one of the interfaces the `ConcurrentBag` implements, including extension methods, are not guaranteed to be thread safe and may need to be synchronized by the caller."* The `ConcurrentBag` shouldn't even mentioned as an option. It makes this answer misleading. – Theodor Zoulias Nov 12 '22 at 15:20
  • @TheodorZoulias Nicely caught! I have modified the answer to warn of the usage of `ConcurrentBag` in other ways than just inserting/removing elements. This collection is listed in this answer since OP requests a thread-safe collection for insertion and removal, which is what `ConcurrentBag` offers, with the trade-off of accepting duplicate entries. – Jämes Nov 13 '22 at 22:05
  • 1
    Jämes the `ConcurrentBag` does not support removal by value, which what the OP has asked for. It only supports removal of a random value. This is practically useless, unless you are dealing with a mixed producer-consumer scenario, which is what this class was made for. Mixed producer-consumer scenarios (each worker thread is both a producer and a consumer) are extremely rare. Most C# programmers, even those who write frequently multithreaded code, will not find a good use for this class in their whole life. More of my thoughts [here](https://stackoverflow.com/a/64823123/11178549). – Theodor Zoulias Nov 13 '22 at 22:37
  • 1
    @TheodorZoulias You are correct, this type does not make sense in the context of data removal. I have marked it as **inadvisable**, to prevent further usage of it. Thanks! – Jämes Nov 13 '22 at 22:55
60

Instead of wrapping a ConcurrentDictionary or locking over a HashSet I created an actual ConcurrentHashSet based on ConcurrentDictionary.

This implementation supports basic operations per item without HashSet's set operations as they make less sense in concurrent scenarios IMO:

var concurrentHashSet = new ConcurrentHashSet<string>(
    new[]
    {
        "hamster",
        "HAMster",
        "bar",
    },
    StringComparer.OrdinalIgnoreCase);

concurrentHashSet.TryRemove("foo");

if (concurrentHashSet.Contains("BAR"))
{
    Console.WriteLine(concurrentHashSet.Count);
}

Output: 2

You can get it from NuGet here and see the source on GitHub here.

i3arnon
  • 113,022
  • 33
  • 324
  • 344
  • 3
    This should be the accepted answer, great implementation – smirkingman Jun 20 '17 at 18:50
  • Shouldn't **Add** be renamed to **TryAdd** so that it's consistent with **ConcurrentDictionary**? – Neo Oct 24 '17 at 11:05
  • 10
    @Neo No... because it's intentionally using **HashSet** semantics, where you call __Add__ and it returns a boolean indicating whether the item was added (true), or it already existed (false). https://msdn.microsoft.com/en-us/library/bb353005(v=vs.110).aspx – G-Mac Dec 07 '17 at 21:45
  • Shouldn't it implement `ISet` interface bo actually match `HashSet` semantics? – Jakoss Sep 06 '18 at 13:06
  • 2
    @Nekromancer as I said in the answer, I don't think it makes sense to provide these set methods in a concurrent implementation. `Overlaps` for example would either need to lock the instance throughout its run, or provide an answer that may already be wrong. Both options are bad IMO (and can be added externally by consumers). – i3arnon Sep 06 '18 at 16:51
  • A method that is probably missing (although it's rarely needed) is the [`HashSet.TryGetValue`](https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.hashset-1.trygetvalue) method. [Here](https://stackoverflow.com/questions/61085397/get-keyvaluepair-by-key-from-a-concurrentdictionary-in-o1-time) is a question where a similar feature is requested. – Theodor Zoulias Apr 07 '20 at 18:51
30

Since noone else mentioned it, I will offer an alternative approach that may or may not be appropriate for your particular purpose:

Microsoft Immutable Collections

From a blog post by the MS team behind:

While creating and running concurrently is easier than ever, one of the fundamental problems still exists: mutable shared state. Reading from multiple threads is typically very easy, but once the state needs to be updated, it gets a lot harder, especially in designs that require locking.

An alternative to locking is making use of immutable state. Immutable data structures are guaranteed to never change and can thus be passed freely between different threads without worrying about stepping on somebody else’s toes.

This design creates a new problem though: How do you manage changes in state without copying the entire state each time? This is especially tricky when collections are involved.

This is where immutable collections come in.

These collections include ImmutableHashSet<T> and ImmutableList<T>.

Performance

Since the immutable collections use tree data structures underneath to enable structural sharing, their performance characteristics are different from mutable collections. When comparing to a locking mutable collection, the results will depend on lock contention and access patterns. However, taken from another blog post about the immutable collections:

Q: I’ve heard that immutable collections are slow. Are these any different? Can I use them when performance or memory is important?

A: These immutable collections have been highly tuned to have competitive performance characteristics to the mutable collections while balancing memory sharing. In some cases they are very nearly as fast as the mutable collections both algorithmically and in actual time, sometimes even faster, while in other cases they are algorithmically more complex. In many cases however the difference will be negligible. Generally you should use the simplest code to get the job done and then tune for performance as necessary. The immutable collections help you write simple code, especially when thread-safety has to be considered.

In other words, in many cases the difference won't be noticeable and you should go with the simpler choice - which for concurrent sets would be to use ImmutableHashSet<T>, since you don't have an existing locking mutable implementation! :-)

Community
  • 1
  • 1
Søren Boisen
  • 1,669
  • 22
  • 41
  • 4
    `ImmutableHashSet` doesn't help much if your intent is to update the shared state from multiple threads or am I missing something here? – tugberk May 26 '17 at 11:43
  • 10
    @tugberk Yes and no. Since the set is immutable, you will have to update the reference to it, which the collection itself does not help you with. The good news is that you have reduced the complex problem of updating a shared data structure from multiple threads to the much simpler problem of updating a shared reference. The library provides you with the [ImmutableInterlocked.Update](https://msdn.microsoft.com/en-us/library/mt806088(v=vs.111).aspx) method to help you with that. – Søren Boisen May 30 '17 at 08:43
  • 1
    @SørenBoisenjust read about immutable collections and tried to figure out how to use them thread safe. `ImmutableInterlocked.Update` seems to be the missing link. Thank you! – xneg Sep 28 '18 at 08:45
4

The tricky part about making an ISet<T> concurrent is that the set methods (union, intersection, difference) are iterative in nature. At the very least you have to iterate over all n members of one of the sets involved in the operation, while locking both sets.

You lose the advantages of a ConcurrentDictionary<T,byte> when you have to lock the entire set during iteration. Without locking, these operations are not thread safe.

Given the added overhead of ConcurrentDictionary<T,byte>, it's probably wiser just to use the lighter weight HashSet<T> and just surround everything in locks.

If you don't need the set operations, use ConcurrentDictionary<T,byte> and just use default(byte) as the value when you are adding keys.

pugby
  • 41
  • 3
3

The solutions based on ConcurrentDictionary<TKey, TValue> are usually well scalable; however, if you need to access the Count, Keys or Values properties or you iterate through the items it becomes worse than a single locking collection. It's because ConcurrentDictionary uses a group of locks (by default their amount depends on the number of CPU cores) and accessing these members cause that all of the locks are acquired so they have the worse performance the more cores your CPU has.

Another answer suggests using immutable collections. Though they are thread-safe, they perform well only if you rarely add new items to them (which always creates a new instance, though attempts to inherit as many nodes from the previous instance as possible), but even in this case they usually have poorer performance.

I ended up with another solution (that I applied also to my ThreadSafeHashSet<T> later on): as opposed to ConcurrentDictionary, I use only a single lock, but only temporarily: from time to time the new items are moved to a completely lock-free storage where even removing and re-adding the same keys become lock-free, hence very fast. No timers are used to perform these merges. A merge to the lock-free storage occurs only when the locking storage has to be accessed and it was created "long enough" time ago, which is configurable.

Note: See the performance comparison table at the Remarks section of the ThreadSafeDictionary<TKey TValue> class (it has the same characteristics as ThreadSafeHashSet<T>) to see whether it's a good choice for your needs. Here you can find the source for the performance tests if you you want to execute them by yourself.

The source is available here and you can also download it as a NuGet package.

György Kőszeg
  • 17,093
  • 6
  • 37
  • 65
2

I prefer complete solutions so i did this: Mind you my Count is implemented in a different fashion because i don't see why one should be forbidden to read the hashset while attempting to count its values.

@Zen, Thanks for getting it started.

[DebuggerDisplay("Count = {Count}")]
[Serializable]
public class ConcurrentHashSet<T> : ICollection<T>, ISet<T>, ISerializable, IDeserializationCallback
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion);

    private readonly HashSet<T> _hashSet = new HashSet<T>();

    public ConcurrentHashSet()
    {
    }

    public ConcurrentHashSet(IEqualityComparer<T> comparer)
    {
        _hashSet = new HashSet<T>(comparer);
    }

    public ConcurrentHashSet(IEnumerable<T> collection)
    {
        _hashSet = new HashSet<T>(collection);
    }

    public ConcurrentHashSet(IEnumerable<T> collection, IEqualityComparer<T> comparer)
    {
        _hashSet = new HashSet<T>(collection, comparer);
    }

    protected ConcurrentHashSet(SerializationInfo info, StreamingContext context)
    {
        _hashSet = new HashSet<T>();

        // not sure about this one really...
        var iSerializable = _hashSet as ISerializable;
        iSerializable.GetObjectData(info, context);
    }

    #region Dispose

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposing)
            if (_lock != null)
                _lock.Dispose();
    }

    public IEnumerator<T> GetEnumerator()
    {
        return _hashSet.GetEnumerator();
    }

    ~ConcurrentHashSet()
    {
        Dispose(false);
    }

    public void OnDeserialization(object sender)
    {
        _hashSet.OnDeserialization(sender);
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        _hashSet.GetObjectData(info, context);
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    #endregion

    public void Add(T item)
    {
        _lock.EnterWriteLock();
        try
        {
            _hashSet.Add(item);
        }
        finally
        {
            if(_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    public void UnionWith(IEnumerable<T> other)
    {
        _lock.EnterWriteLock();
        _lock.EnterReadLock();
        try
        {
            _hashSet.UnionWith(other);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            if (_lock.IsReadLockHeld) _lock.ExitReadLock();
        }
    }

    public void IntersectWith(IEnumerable<T> other)
    {
        _lock.EnterWriteLock();
        _lock.EnterReadLock();
        try
        {
            _hashSet.IntersectWith(other);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            if (_lock.IsReadLockHeld) _lock.ExitReadLock();
        }
    }

    public void ExceptWith(IEnumerable<T> other)
    {
        _lock.EnterWriteLock();
        _lock.EnterReadLock();
        try
        {
            _hashSet.ExceptWith(other);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            if (_lock.IsReadLockHeld) _lock.ExitReadLock();
        }
    }

    public void SymmetricExceptWith(IEnumerable<T> other)
    {
        _lock.EnterWriteLock();
        try
        {
            _hashSet.SymmetricExceptWith(other);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    public bool IsSubsetOf(IEnumerable<T> other)
    {
        _lock.EnterWriteLock();
        try
        {
            return _hashSet.IsSubsetOf(other);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    public bool IsSupersetOf(IEnumerable<T> other)
    {
        _lock.EnterWriteLock();
        try
        {
            return _hashSet.IsSupersetOf(other);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    public bool IsProperSupersetOf(IEnumerable<T> other)
    {
        _lock.EnterWriteLock();
        try
        {
            return _hashSet.IsProperSupersetOf(other);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    public bool IsProperSubsetOf(IEnumerable<T> other)
    {
        _lock.EnterWriteLock();
        try
        {
            return _hashSet.IsProperSubsetOf(other);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    public bool Overlaps(IEnumerable<T> other)
    {
        _lock.EnterWriteLock();
        try
        {
            return _hashSet.Overlaps(other);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    public bool SetEquals(IEnumerable<T> other)
    {
        _lock.EnterWriteLock();
        try
        {
            return _hashSet.SetEquals(other);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    bool ISet<T>.Add(T item)
    {
        _lock.EnterWriteLock();
        try
        {
            return _hashSet.Add(item);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    public void Clear()
    {
        _lock.EnterWriteLock();
        try
        {
            _hashSet.Clear();
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    public bool Contains(T item)
    {
        _lock.EnterWriteLock();
        try
        {
            return _hashSet.Contains(item);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    public void CopyTo(T[] array, int arrayIndex)
    {
        _lock.EnterWriteLock();
        try
        {
            _hashSet.CopyTo(array, arrayIndex);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    public bool Remove(T item)
    {
        _lock.EnterWriteLock();
        try
        {
            return _hashSet.Remove(item);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    public int Count
    {
        get
        {
            _lock.EnterWriteLock();
            try
            {
                return _hashSet.Count;
            }
            finally
            {
                if(_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            }

        }
    }

    public bool IsReadOnly
    {
        get { return false; }
    }
}
Dbl
  • 5,634
  • 3
  • 41
  • 66
  • The lock gets disposed ... but what about the inner hashset, when is its memory released? – David Rettenbacher Nov 10 '14 at 17:26
  • 1
    @Warappa it's released upon garbage collection. The only time i manually null things and clear their entire presence within a class is when the subjects contain events and thus MAY leak memory (like when you would use ObservableCollection and its changed event). I am open for suggestions if you can add knowledge to my understanding regarding the subject. I've spent a couple of days researching on garbage collection too and am always curious about new information – Dbl Nov 11 '14 at 17:48
  • @AndreasMüller good answer, however i am wondering why you are using '_lock.EnterWriteLock();'followed by '_lock.EnterReadLock();' in some methods like 'IntersectWith' I think there is no need for the read look here as write lock will prevent any reading(s) when enter by default. – Jalal Said Apr 16 '15 at 12:05
  • If you always must `EnterWriteLock`, why does `EnterReadLock` even exist? Can it not the read lock be used for methods like `Contains`? – ErikE Nov 20 '15 at 22:01
  • 2
    This is not a ConcurrentHashSet but rather a ThreadSafeHashSet. See my comment on @ZenLulz answer regarding the self implementation. I am 99% sure that anyone who used those implementations will have a serious bug in their application. – George Mavritsakis Apr 11 '16 at 09:41
1

I found that neither simply locking add and remove methods of a System.Collections.Generic.HashSet, nor wrapping the framework's ConcurrentDictionary is sufficient in "high-throughput" scenarios which require good performance.

Fairly good performance can already be achieved by using this simple idea:

public class ExampleHashSet<T>
{
    const int ConcurrencyLevel = 124;
    const int Lower31BitMask = 0x7FFFFFFF;
    
    HashSet<T>[] sets = new HashSet<T>[ConcurrencyLevel];
    IEqualityComparer<T> comparer;

    public ExampleHashSet()
    {
        comparer = EqualityComparer<T>.Default;

        for(int i = 0; i < ConcurrencyLevel; i++)
            sets[i] = new HashSet<T>();
    }

    public bool Add(T item)
    {
        int hash = (comparer.GetHashCode(item) & Lower31BitMask) % ConcurrencyLevel;
        
        lock(sets[hash]) 
        {
            return sets[hash].Add(item);
        }
    }

    public bool Remove(T item)
    {
        int hash = (comparer.GetHashCode(item) & Lower31BitMask) % ConcurrencyLevel;
        
        lock(sets[hash]) 
        {
            return sets[hash].Remove(item);
        }
    }

    // further methods ...
}

The system's HashSet is wrapped, but in contrast to other anwers, we hold locks on multiple HashSets. Different threads are able to "work" on different HashSets, lowering the overall wait-time.

This idea can be generalized and implemented directly in the HashSet itself (holding locks on the buckets, instead of locking complete sets). An example can be found here.

notgiven
  • 29
  • 3
  • This is a good idea, provided that you don't need to support enumeration. If you want enumeration, it gets very inefficient and messy. – Theodor Zoulias Nov 12 '22 at 15:29
0

For the most common tasks, the following ConcurrentDictionary derived class should suffice. To get a fully fledged hash set, you can still use a ConcurrentDictionary as a backing field, implement all the HashSet interfaces and pass the calls to the dictionary.

Simple implementation

public class ConcurrentHashSet<T> : ConcurrentDictionary<T, byte>
    where T : notnull
{
    const byte DummyByte = byte.MinValue;

    // For convenience, we add HashSet equivalent APIs here...
    public bool Contains(T item) => ContainsKey(item);
    public bool Add(T item) => TryAdd(item, DummyByte);
    public bool Remove(T item) => TryRemove(item, out _);
}

Limitations

  1. No support for null values.
  2. Will behave as a dictionary. E.g. Enumerating, serializing...
  3. HashSet specific interfaces are missing.
l33t
  • 18,692
  • 16
  • 103
  • 180