Scott Chamberlain's answer covers nicely the scenario of frequent readers and infrequent writers, by using a ReaderWriterLockSlim
. But what if writing is as frequent as reading? The ReaderWriterLockSlim
won't help much in this case.
My idea for mitigating this scenario is to move out of the protected section the calculation of the hashcode, and protect only the operations that involve shared state. This should be beneficial in case the cost of calculating the hashcode of the values is not trivial, for example when the values are long strings. Below is an implementation of this idea, for building a bounded concurrent HashSet<T>
. This collection uses a HashSet<(T, int)>
as underlying storage, with the int
property being the precalculated hashcode of the T
value:
public class BoundedConcurrentHashSet<T>
{
private readonly HashSet<(T Value, int HashCode)> _set;
private readonly int _boundedCapacity;
private readonly IEqualityComparer<T> _comparer;
public BoundedConcurrentHashSet(int boundedCapacity,
IEqualityComparer<T> comparer = default)
{
_boundedCapacity = boundedCapacity;
_comparer = comparer ?? EqualityComparer<T>.Default;
_set = new(new _Comparer(_comparer));
}
// A comparer that returns the precalculated HashCode
private class _Comparer : IEqualityComparer<(T, int)>
{
private readonly IEqualityComparer<T> _comparer;
public _Comparer(IEqualityComparer<T> comparer) { _comparer = comparer; }
public bool Equals((T, int) x, (T, int) y) => _comparer.Equals(
x.Item1, y.Item1);
public int GetHashCode((T, int) obj) => obj.Item2;
}
public int Count { get { lock (_set) return _set.Count; } }
public bool IsEmpty => Count == 0;
public int BoundedCapacity => _boundedCapacity;
public bool Contains(T value)
{
int hashCode = _comparer.GetHashCode(value);
lock (_set) return _set.Contains((value, hashCode));
}
public bool TryGetValue(T equalValue, out T actualValue)
{
int hashCode = _comparer.GetHashCode(equalValue);
lock (_set)
{
if (_set.TryGetValue((equalValue, hashCode), out var existing))
{
actualValue = existing.Value; return true;
}
actualValue = default; return false;
}
}
public bool TryAdd(T value)
{
int hashCode = _comparer.GetHashCode(value);
lock (_set)
{
if (_set.Count < _boundedCapacity) return _set.Add((value, hashCode));
return false;
}
}
public bool TryGetOrAdd(T equalValue, out T actualValue)
{
int hashCode = _comparer.GetHashCode(equalValue);
lock (_set)
{
if (_set.TryGetValue((equalValue, hashCode), out var existing))
{
actualValue = existing.Value; return true;
}
if (_set.Count < _boundedCapacity)
{
bool added = _set.Add((equalValue, hashCode));
Debug.Assert(added);
actualValue = equalValue; return true;
}
actualValue = default; return false;
}
}
public bool TryRemove(T value)
{
int hashCode = _comparer.GetHashCode(value);
lock (_set) return _set.Remove((value, hashCode));
}
public bool TryRemove(T equalValue, out T actualValue)
{
int hashCode = _comparer.GetHashCode(equalValue);
lock (_set)
{
if (_set.TryGetValue((equalValue, hashCode), out var existing))
{
bool removed = _set.Remove((equalValue, hashCode));
Debug.Assert(removed);
actualValue = existing.Value; return true;
}
actualValue = default; return false;
}
}
public T[] ToArray()
{
lock (_set) return _set.Select(e => e.Value).ToArray();
}
}
The public members of this collection are:
- Properties:
Count
, IsEmpty
and BoundedCapacity
.
- Methods:
Contains
, TryGetValue
, TryAdd
, TryGetOrAdd
, TryRemove
and ToArray
.
Using internally a HashSet<T>
with a different T
has implications regarding the protection against HashDoS attacks. In case you plan to use this collection with string
keys of potentially hostile origin, check out this GitHub issue before proceeding.
Below are some benchmarks involving four different implementations, and with different lengths for the string values.
- Lock-Optimized: is the implementation above.
- Lock-Simple: is a simple
HashSet<T>
+lock
implementation that doesn't precalculate the hashcodes.
- ReaderWriterLockSlim: is Scott Chamberlain's implementation.
- Native: is an implementation that wraps a
ConcurrentDictionary<T, object>
. It's not bounded, so it's not a fair contender. It is included here just for reference.
The scenario is the same for all tests: 4 worker threads that randomly and concurrently read (50%) or add (25%) or remove (25%) values from the same hashset. The reported metric is the total number of operations by all workers per second.
String length |
Lock-Optimized |
Lock-Simple |
ReaderWriterLockSlim |
Native (not bounded) |
10 |
3,833,272 |
4,564,383 |
1,978,146 |
10,369,830 |
30 |
4,196,021 |
4,419,448 |
2,023,593 |
10,697,999 |
100 |
4,024,539 |
3,417,730 |
1,913,043 |
8,365,800 |
300 |
3,952,180 |
2,128,451 |
1,551,082 |
4,644,652 |
1000 |
1,994,425 |
1,018,591 |
839,897 |
2,110,027 |
The Lock-Simple outperforms the Lock-Optimized for short strings. The Lock-Optimized starts to shine for strings with length 100 and more.