Locking HashSet for concurrency

Question

When using a HashSet<string> to check, whether an item was handled before (i.e. only Add and Contains is used). Furthermore it is not relevant, when Contains returns false, even though it was added before ...

I encountered following exception without locking:

[IndexOutOfRangeException: Index was outside the bounds of the array.] System.Collections.Generic.HashSet`1.AddIfNotPresent(T value) +6108128

Is it sufficient, to lock only the Add call?

Following seems to work forever - but that is not a proof...

HashSet<string> hashSet = new HashSet<string>();
Parallel.ForEach(GetString(), h => 
{
    hashSet.Contains(h);
    lock(hashSetLock) 
    {
        hashSet.Add(h); 
    }
    hashSet.Contains(h);
});

To make it precise: I know that it is not thread-safe to call Contains without a lock. My question is (accepting false positives) if the above code could throw an exception or could destroy the internal state of the underlying data structure (=HashSet).

Is this your actual code? Because it doesn't make much sense. — Yuval Itzchakov, Oct 13 '15 at 07:40
This isn't a very good example. There are a lot of ways to produce this hashset in a thread-safe way. If you really want to modify the collection from multiple threads, you can use a ConcurrentDictionary that uses the same values as key and value. — Panagiotis Kanavos, Oct 13 '15 at 07:58

Lasse V. Karlsen · Accepted Answer · 2015-10-13T10:20:39.477

8

No, it is not sufficient to lock only on Add.

The fact that it doesn't crash only tells you that it didn't crash during your test.

You cannot guarantee that:

It won't crash in the future
It will produce the correct results

A non-threadsafe data structure has no guarantees whatsoever if used in a multithreaded fashion.

You need to either:

Lock on every call to it
Use a threadsafe data structure, one that has been built to support this scenario

If you use a different data structure than a hashset, like a dictionary, you may even need to lock multi-statements, because this may still fail:

lock (dLock)
    if (d.ContainsKey("test"))
        return;

var value = ExpensiveCallToObtainValue();
lock (dLock)
    d.Add("test", value);

Between the call to ContainsKey and the call to Add another thread may have already inserted that key.

To handle this correctly, without using a threadsafe data structure, is contain both operations inside the same lock:

lock (dLock)
{
    if (!d.ContainsKey("test"))
        d.Add("test", ExpensiveCallToObtainValue());
}

edited Oct 13 '15 at 10:20

answered Oct 13 '15 at 07:43

Lasse V. Karlsen

380,855
102
628
825

1

Thanks for your answer. I knew, that my test is not a guarantee / proof (as I already stated in the question :)). To narrow it down, my question is, whether Contains could throw an exception ... (false positives are not a problem for me as well...) – Markus Oct 13 '15 at 07:57
3

The answer is that you have no guarantee that it won't. – Lasse V. Karlsen Oct 13 '15 at 08:07
Please add the code to do it correctly, it's easy to take the wrong code snippet you provided for the correct way of doing it. – George Polevoy Oct 13 '15 at 08:15
1

In my case no Remove on HashSet is being made, but Add is being used. I made some heavy tests and did not catch a single exception on the Contains call. Also I examined the Contains code from Reflector and I could not find something that would cause an Exception if no Remove is being made. However I didn't feel comfortable when I read this answer and surround Contains with a try catch that uses goto. Yes the code looked ugly but my measurements show that it is faster than using a lock, because no exception ever occured. – Koray Sep 22 '17 at 08:10
1

I can't say that it **will** crash, but the class in question isn't thread safe. The exact behavior isn't documented precisely because it isn't thread safe. For all I know it will burn down your house or chase away your dog. – Lasse V. Karlsen Sep 22 '17 at 08:14
@Lasse V. Karlsen I do agree with that. I just wanted to mention that using a try-catch for Contains performed better than using a lock. – Koray Sep 22 '17 at 08:17
Yes, but in the case where you do get an exception you have effectively killed that execution branch, it didn't successfully execute contains so no matter what the result would be you no longer try to add something. The net result is that the code behaves differently. – Lasse V. Karlsen Sep 22 '17 at 10:47
if you do only Add operations, Contains is bening which means if it returns true it will remain true, if it returns false, you already double check after you take lock and it wont cause any hazards. it wont return positive values if it doesnt contain the value. no need to be prejudiced here. – TakeMeAsAGuest May 13 '18 at 20:34

Yuval Itzchakov · Answer 2 · 2015-10-13T07:55:12.793

No, as others stated, it isn't thread-safe to do what you're doing. If the underlying collection isn't thread-safe, you will need to lock on every operation.

When using a HashSet<T>, there need not be a ContainsKey check, as Add will check if the internal collection already contains the value or not:

Return Value Type: System.Boolean

true if the element is added to the HashSet object; false if the element is already present.

So you can narrow your code to:

private readonly object syncRoot = new object();
lock (syncRoot)
    hashSet.Add(value);

score 1 · Answer 3 · answered Oct 13 '15 at 07:43

1

What is the point of those calls to Contains()? They don't do anything. If you want to add only if the set does not contain an item, then you can do the following:

if(!hasSet.Contains(h))
{
   lock(hashSetLock)
   {
      if(!hasSet.Contains(h))
      {
         hashSet.Add(h);
      }
   }
}

With this code you don't lock to check the existing of the element, but if the element was not set you have to check again after locking. What do you gain? You don't lock in case the element already exists.

answered Oct 13 '15 at 07:43

Marius Bancila

16,053
9
49
91

`HashSet.Contains` is not guaranteed to be thread-safe, so you cannot really call it without the lock. – Joey Oct 13 '15 at 07:49
doing the lock is expensive, Add already does the contains, so no need to do it in the lock, but by doing it ounce outside the lock, you can avoid the expensive lock if the value already is in the hashset. Unless someone can explain in what way a contains could fail (as in throw or cause corruption) if not locked, than it is a good compromise for code which needs high performance (at least sometimes) – NiKiZe Dec 22 '19 at 10:27

Locking HashSet for concurrency

3 Answers3

Linked

Related