
Dealing with a thread-safe object - for example: the Cache object

MSDN states:

Thread Safety: This type is thread safe.

I understand what that means for setting a value by key in a multithreaded environment.

But what about getting a value by key in a multithreaded environment?

If 200 threads want to read a value, are the other 199 threads blocked/waiting?

Thanks for any kind of help.

Royi Namir

5 Answers


There are different ways of providing thread-safety, and which one is used is not specified, but it can be achieved without needing to lock on reads. Short answer: reads don't need locks because of the way the structure is built. Long answer:

The promise made in saying "all methods are thread-safe" is that if those methods are called simultaneously, they will not corrupt the object or return an incorrect result. Let's consider List<char> as an example.

Say we have an empty List<char> called list and three threads call the following simultaneously (sketched in code just after this list):

  1. char c = list[0];
  2. list.Add('a');
  3. list.Add('b');
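
To make the scenario concrete, here is a minimal, illustrative sketch of those three simultaneous calls (the class name and output are mine; any of the outcomes discussed below can occur on a given run):

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class RaceDemo
{
    static void Main()
    {
        var list = new List<char>();
        char c = '\0';

        // Thread 1 reads the first element; ArgumentOutOfRangeException
        // is a legitimate outcome if it runs before either insert.
        Task t1 = Task.Run(() =>
        {
            try { c = list[0]; }
            catch (ArgumentOutOfRangeException) { }
        });
        Task t2 = Task.Run(() => list.Add('a')); // Thread 2
        Task t3 = Task.Run(() => list.Add('b')); // Thread 3

        Task.WaitAll(t1, t2, t3);
        Console.WriteLine($"c = '{c}', Count = {list.Count}"); // varies run to run
    }
}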

There are four possible correct results:

  1. c contains 'a'. list contains {'a', 'b'}. list.Count would now return 2.
  2. c contains 'b'. list contains {'b', 'a'}. list.Count would now return 2.
  3. An ArgumentOutOfRangeException is raised on the first thread. list contains {'a', 'b'}. list.Count would now return 2.
  4. An ArgumentOutOfRangeException is raised on the first thread. list contains {'b', 'a'}. list.Count would now return 2.

All of these are correct, because a list being a list, one inserted item has to end up before the other. Whichever ends up first has to be the value returned by list[0]. And thread 1's read has to be treated as happening either after an insertion or before any, so either that first element is put into c or an ArgumentOutOfRangeException is thrown (not a flaw in the thread-safety: reading the first item of an empty list is out of range, regardless of threading concerns).

Since, however, List<char> is not thread-safe, the following can happen:

  1. One of the above possibilities (we got lucky).
  2. list contains {'a'}. list.Count returns 1. c contains 'b'.
  3. list contains {'b'}. list.Count returns 1. c contains 'a'.
  4. list contains {'a'}. list.Count returns 2. c contains 'b'.
  5. list contains {'b'}. list.Count returns 2. c contains 'a'.
  6. One or both of the insert calls, and/or the reading call throws an IndexOutOfRangeException. (Note, not the ArgumentOutOfRangeException that is acceptable behaviour for the first thread to raise).
  7. An assertion failure.
  8. Any of the above (including the correct behaviour), and then another operation on list performed half an hour later raises some exception that List<char> is not documented as raising for that method.

In other words, we completely messed up the whole thing to the point where list is in a state that the programmer(s) didn't consider it could possibly be in, and all bets are off.

Okay, so how do we build thread-safe collections?

A simple way is to lock on every method that either changes state or depends upon state (which tends to be all of them). For static methods we lock on a static field; for instance methods we lock on an instance field (we could lock on a static field, but then operations on separate objects would block each other). We make sure that if a public method calls another public method, it only locks once (perhaps by having each public method call into a private worker method that doesn't lock). It's possible some analysis would show that we can split all operations into two or more groups that are independent of each other, and have separate locks for each group, but that's unlikely.
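
As a minimal sketch of that approach (the LockedList name and its members are illustrative only):

using System.Collections.Generic;

public class LockedList<T>
{
    private readonly List<T> _inner = new List<T>();
    private readonly object _sync = new object();

    public void Add(T item)
    {
        lock (_sync) { _inner.Add(item); }
    }

    public T Get(int index)
    {
        lock (_sync) { return _inner[index]; }
    }

    public int Count
    {
        get { lock (_sync) { return _inner.Count; } }
    }

    // A compound operation works on the inner list directly rather
    // than calling this class's public members, so the lock is only
    // taken once.
    public bool AddIfEmpty(T item)
    {
        lock (_sync)
        {
            if (_inner.Count != 0) return false;
            _inner.Add(item);
            return true;
        }
    }
}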

The plus is simplicity; the minus is that it locks a lot, perhaps blocking many mutually-safe operations from each other.

Another is a shared-exclusive lock (as .NET provides through the misnamed ReaderWriterLock and ReaderWriterLockSlim - misnamed because there are other scenarios apart from single-writer multiple-reader, including some that are the exact opposite). It tends to be true that read-only operations (as seen from the inside - a read-only operation that uses memoisation isn't really read-only, because it might update the internal cache) can safely run simultaneously, while write operations must be the sole operation happening, and a shared-exclusive lock lets us ensure exactly that.
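
A rough sketch of how that looks with ReaderWriterLockSlim (the wrapper class here is illustrative, not a real API):

using System.Collections.Generic;
using System.Threading;

public class SharedExclusiveDictionary<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _inner = new Dictionary<TKey, TValue>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    public bool TryGetValue(TKey key, out TValue value)
    {
        _lock.EnterReadLock();            // shared: concurrent readers allowed
        try { return _inner.TryGetValue(key, out value); }
        finally { _lock.ExitReadLock(); }
    }

    public void Set(TKey key, TValue value)
    {
        _lock.EnterWriteLock();           // exclusive: the sole operation running
        try { _inner[key] = value; }
        finally { _lock.ExitWriteLock(); }
    }
}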

Another is through striped locking. This is how the current implementation of ConcurrentDictionary works: several locks are created, and each write operation is assigned a lock. Writes that can't hurt each other can happen at the same time, while writes that would conflict will be attempting to get the same lock. Some operations may require obtaining all of the locks.
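
A rough sketch of the striping idea (the general technique only - this is not ConcurrentDictionary's actual code):

using System;

public class StripedLocks
{
    private readonly object[] _locks;

    public StripedLocks(int stripeCount)
    {
        _locks = new object[stripeCount];
        for (int i = 0; i < stripeCount; i++) _locks[i] = new object();
    }

    // Each key hashes to one stripe; writes to different stripes can
    // proceed in parallel, writes to the same stripe contend.
    private object LockFor(object key) =>
        _locks[(key.GetHashCode() & 0x7FFFFFFF) % _locks.Length];

    public void WithStripe(object key, Action action)
    {
        lock (LockFor(key)) { action(); }
    }

    // Some operations (clearing, resizing) need every lock at once.
    public void WithAllStripes(Action action)
    {
        AcquireAll(0, action);
    }

    private void AcquireAll(int index, Action action)
    {
        if (index == _locks.Length) { action(); return; }
        lock (_locks[index]) { AcquireAll(index + 1, action); }
    }
}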

Another is through lock-free synchronisation (sometimes called "low-lock", because the primitives used are similar to locks in some ways but not in others). https://stackoverflow.com/a/3871198/400547 shows a simple lock-free queue, and ConcurrentQueue uses different lock-free techniques again (in .NET; last time I looked, Mono's implementation was close to that example).
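
As a small taste of the technique, here is a classic compare-and-swap loop - a Treiber-style lock-free stack rather than a queue, since a stack keeps the sketch short (illustrative code, not taken from either linked implementation):

using System.Threading;

public class LockFreeStack<T>
{
    private class Node
    {
        public T Value;
        public Node Next;
    }

    private Node _head;

    public void Push(T value)
    {
        var node = new Node { Value = value };
        Node snapshot;
        do
        {
            snapshot = Volatile.Read(ref _head); // read the current head
            node.Next = snapshot;                // link the new node in front of it
        }
        // Install the new node only if no other thread moved the head
        // in the meantime; otherwise loop and retry.
        while (Interlocked.CompareExchange(ref _head, node, snapshot) != snapshot);
    }

    public bool TryPop(out T value)
    {
        Node snapshot;
        do
        {
            snapshot = Volatile.Read(ref _head);
            if (snapshot == null) { value = default(T); return false; }
        }
        while (Interlocked.CompareExchange(ref _head, snapshot.Next, snapshot) != snapshot);
        value = snapshot.Value;
        return true;
    }
}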

While lock-free approaches may sound like the automatic best way to go, that isn't always the case. My own lock-free dictionary at https://github.com/hackcraft/Ariadne/blob/master/Collections/ThreadSafeDictionary.cs does better than ConcurrentDictionary's striped locking in some cases, but not as well in many others. Why? Well, for a start, ConcurrentDictionary has almost certainly had more hours put into tuning it than I've been able to give mine yet! Seriously though, making sure a lock-free approach is safe requires logic that costs cycles and memory just the same as any other code. Sometimes that outweighs the cost of a lock, and sometimes it doesn't. (Edit: currently I beat ConcurrentDictionary in a wide range of cases, but the people behind ConcurrentDictionary are good, so the balance may return to their favour in more cases with their next update.)

Finally though, there's the simple fact that some structures are thread-safe for some operations in and of themselves. int[], for example, is thread-safe in so far as, if several threads are reading and writing myIntArray[23], then while there is a variety of possible results, none of them corrupt the structure, and every read will see either the initial state or one of the values one of the threads was writing. You might, though, want to use memory barriers to make sure the reader threads aren't still seeing the initial state long after the last writer finished, due to a stale CPU cache.
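
A small sketch of that last point, using Volatile.Read/Volatile.Write as the barrier (the class and field names are illustrative):

using System.Threading;

public class SlotDemo
{
    private readonly int[] _values = new int[32];
    private int _published;

    public void Writer()
    {
        _values[23] = 42;                  // plain write to the slot
        Volatile.Write(ref _published, 1); // publish with release semantics
    }

    public int? Reader()
    {
        if (Volatile.Read(ref _published) == 1) // acquire semantics:
            return _values[23];            // sees 42, not a stale initial value
        return null;                       // writer hasn't published yet
    }
}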

Now, what is true of int[] is also true of object[] (but note: it is not true of an array of a value type larger than what the CPU it's running on can handle atomically). Since int[] and object[] are used internally in some dictionary-based classes - Hashtable and, of particular interest here, Cache being examples - it is possible to construct them in such a way that the structures are naturally thread-safe for a single writer thread and multiple reader threads (not for multiple writers, because every now and then a writer has to resize the internal store, and that's harder to make thread-safe). This is the case for Hashtable (IIRC it's also the case for Dictionary<TKey, TValue> for some choices of TKey and TValue, but that behaviour is not guaranteed, and it's most certainly not true for some other choices).

Once you've got a structure that is naturally thread-safe for multiple readers/single writer, making it thread-safe for multiple readers/multiple writers is a matter of locking on the writes.
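
A sketch of that final step, using Hashtable since it has the single-writer/multiple-reader property described above (the wrapper class is illustrative):

using System.Collections;

public class ReadFreeCache
{
    private readonly Hashtable _table = new Hashtable();
    private readonly object _writeLock = new object();

    // Reads take no lock at all: Hashtable is documented as safe for
    // multiple readers alongside at most one concurrent writer.
    public object Get(object key) => _table[key];

    // Writes take a lock so there is never more than one writer.
    public void Set(object key, object value)
    {
        lock (_writeLock) { _table[key] = value; }
    }

    public void Remove(object key)
    {
        lock (_writeLock) { _table.Remove(key); }
    }
}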

The end result is that you get all the benefits of thread-safety, but the costs don't affect readers at all.

Jon Hanna

The thread safety does not apply to reading. Cache is thread-safe for removing/inserting records, and multiple threads are able to read a value at the same time. If a value gets removed while threads are reading it, the ones that get in after the removal will just get a null value out of the cache (a cache miss).
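
In practice that just means reads should tolerate null, e.g. (a sketch; LoadFromSource is a hypothetical fallback loader):

using System.Web.Caching;

public static class CacheReader
{
    public static string GetOrLoad(Cache cache, string key)
    {
        var value = (string)cache[key];  // null if missing or just removed by another thread
        if (value == null)
        {
            value = LoadFromSource(key); // hypothetical loader for the underlying data
            cache.Insert(key, value);
        }
        return value;
    }

    private static string LoadFromSource(string key)
    {
        return "value-for-" + key;       // stand-in for real data access
    }
}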

lahsrah
  • That therefore means that the thread safety does apply for reading. A thread getting a cache miss in this case is no different than if the cache miss was caused by the item being removed on the same thread, or by the item never having been added. – Jon Hanna Aug 11 '12 at 18:27

"Thread safe" doesn't imply any particular strategy by which the thread safety is achieved (e.g. locking). It merely says that if multiple threads are interacting with the object, it will continue to produce its contracted behaviour - where the contract is the documented behaviour of each public or protected member.

If there's no further documentation, then it's an implementation detail how this thread safety is achieved, and it may be subject to change.

It could, currently, take an exclusive lock on every access. Or it may use lock-free techniques. It's not documented either way, but I'd be highly surprised if there was an exclusive lock taken during read operations.

Damien_The_Unbeliever
  • _I'd be highly surprised if there was an exclusive lock_... What about if there's a _shared_ lock? How does it work if 200 threads come to read? – Royi Namir Aug 10 '12 at 15:53

As far as I understand, other threads are blocked while one is writing to the object.
It doesn't make any sense to block other threads while one reads the object, because a read operation doesn't change the object's state.

alex.b

For concurrent writing, .NET 4 has a ConcurrentQueue collection which works very well. I have used it many times and have never had any issues. As for reading, I have used the List collection for parallel processing and it returns correct values. Below is a very basic example of how I have used it.

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

List<string> collection = new List<string> { "a", "b", "c" }; // sample items so the loop has work to do
ConcurrentQueue<string> concurrent = new ConcurrentQueue<string>();

// Read from the list in parallel, enqueueing into the thread-safe queue.
Parallel.ForEach(collection, new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount }, item =>
{
    concurrent.Enqueue(item);
});

To answer your question though: the collection would not block other threads when reading concurrently. If, however, you were writing to the queue concurrently, there is a high chance that some items may be kicked out and not inserted. 'lock' can be used to avoid that, but it will block other threads temporarily.

Mo Patel
  • There is no such risk of items being "kicked out" of either a `ConcurrentQueue` or `Cache`. The former would be pointless in such a case; the latter could justifiably use such a strategy, but the surprises involved would need to be well documented and considered carefully by any user. – Jon Hanna Feb 16 '14 at 22:48