I'm currently implementing a thread-safe dictionary in C# which uses immutable AVL trees as buckets internally. The idea is to provide fast read access without a lock because in my application context, we add entries to this dictionary only at startup and afterwards, values are mostly read (but there still are a few number of writes).
I've structured my TryGetValue
and GetOrAdd
methods in the following way:
public sealed class FastReadThreadSafeDictionary<TKey, TValue> where TKey : IEquatable<TKey>
{
private readonly object _bucketContainerLock = new object();
private ImmutableBucketContainer<TKey, TValue> _bucketContainer;
public bool TryGetValue(TKey key, out TValue value)
{
var bucketContainer = _bucketContainer;
return bucketContainer.TryFind(key.GetHashCode(), key, out value);
}
public bool GetOrAdd(TKey key, Func<TValue> createValue, out TValue value)
{
createValue.MustNotBeNull(nameof(createValue));
var hashCode = key.GetHashCode();
lock (_bucketContainerLock)
{
ImmutableBucketContainer<TKey, TValue> newBucketContainer;
if (_bucketContainer.GetOrAdd(hashCode, key, createValue, out value, out newBucketContainer) == false)
return false;
_bucketContainer = newBucketContainer;
return true;
}
}
// Other members omitted for sake of brevity
}
As you can see, I don't use a lock in TryGetValue
because reference assignment in .NET runtimes is an atomic operation by design. By copying the reference of the field _bucketContainer
to a local variable, I'm sure I can safely access the instance because it is immutable. In GetOrAdd
, I use a lock to access the private _bucketContainer
so I can ensure that a value is not created twice (i.e. if two or more threads are trying to add a value, only one can actually create a new ImmutableBucketContainer
with the added value because of the lock).
I use Microsoft Chess for testing concurrency and in one of my tests, MCUT (Microsoft Concurrency Unit Testing) reports a data race in GetOrAdd
when I exchange the new bucket container with the old one:
[DataRaceTestMethod]
public void ReadWhileAdd()
{
var testTarget = new FastReadThreadSafeDictionary<int, object>();
var writeThread = new Thread(() =>
{
for (var i = 5; i < 10; i++)
{
testTarget.GetOrAdd(i, () => new object());
Thread.Sleep(0);
}
});
var readThread = new Thread(() =>
{
object value;
testTarget.TryGetValue(5, out value);
Thread.Sleep(0);
testTarget.TryGetValue(7, out value);
Thread.Sleep(10);
testTarget.TryGetValue(9, out value);
});
readThread.Start();
writeThread.Start();
readThread.Join();
writeThread.Join();
}
MCUT reports the following message:
23> Test result: DataRace 23> ReadWhileAdd() (Context=, TestType=MChess): [DataRace]Found data race at GetOrAdd:FastReadThreadSafeDictionary.cs(68)
which is the assignment _bucketContainer = newBucketContainer;
in GetOrAdd
.
My actual question is: why is the assignment _bucketContainer = newBucketContainer
a race condition? Threads currently executing TryGetValue
always make a copy of the _bucketContainer
field and thus shouldn't be bothered with the update (except that the searched value might be added to the _bucketContainer
just after the copy takes place, but this doesn't matter with the data race). And in GetOrAdd
, there is an explicit lock to prevent concurrent access. Is this a bug in Chess or am I missing something very obvious?