24

I wonder if Linq extension methods are atomic? Or do I need to lock any IEnumerable object used across threads, before any sort of iteration?

Does declaring the variable as volatile have any affect on this?

To sum up, which of the following is the best, thread safe, operation?

1- Without any locks:

IEnumerable<T> _objs = //...
var foo = _objs.FirstOrDefault(t => // some condition

2- Including lock statements:

IEnumerable<T> _objs = //...
lock(_objs)
{
    var foo = _objs.FirstOrDefault(t => // some condition
}

3- Declaring variable as volatile:

volatile IEnumerable<T> _objs = //...
var foo = _objs.FirstOrDefault(t => // some condition
Kamyar Nazeri
  • 25,786
  • 15
  • 50
  • 87
  • They're not thread safe. See http://stackoverflow.com/questions/9995266/how-to-create-a-thread-safe-generic-list – stuartd Jun 19 '12 at 15:04

3 Answers3

26

The interface IEnumerable<T> is not thread safe. See the documentation on http://msdn.microsoft.com/en-us/library/s793z9y2.aspx, which states:

An enumerator remains valid as long as the collection remains unchanged. If changes are made to the collection, such as adding, modifying, or deleting elements, the enumerator is irrecoverably invalidated and its behavior is undefined.

The enumerator does not have exclusive access to the collection; therefore, enumerating through a collection is intrinsically not a thread-safe procedure. To guarantee thread safety during enumeration, you can lock the collection during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.

Linq does not change any of this.

Locking can obviously be used to synchronize access to objects. You must lock the object everywhere you access it though, not just when iterating over it.

Declaring the collection as volatile will have no positive effect. It only results in a memory barrier before a read and after a write of the reference to the collection. It does not synchronize collection reading or writing.

Kris Vandermotten
  • 10,111
  • 38
  • 49
10

In short, they are not thread safe as mentioned above.

However, that does not mean you must lock before "every sort of iteration."

You need to synchronize all operations that change the collection (add, modify, or remove elements) with other operations that (add, modify, remove elements, or read elements).

If you are only performing read operations on the collection concurrently, no locking is needed. (so running LINQ commands like Average, Contains, ElementAtOrDefault all together would be fine)

If the elements in the collection are of machine word length, such as Int on most 32-bit computers, then changing the value of that element is already performed atomically. In this case don't add or remove elements from the collection without locking, but modifying values may be okay, if you can deal with some non-determinism in your design.

Lastly, you can consider fine-grained locking on individual elements or sections of the collection, rather than locking the entire collection.

Xantix
  • 3,321
  • 1
  • 14
  • 28
-1

Here's an example proving that the IEnumerable extension methods are not thread-safe. On my machine, the throw new Exception("BOOM"); line always hits within a few seconds.

Hopefully, I've documented the code well enough to explain how to trigger a threading issue.

You can run this code in linqpad to see for yourself.

async Task Main()
{
    // The theory is that it will take a longer time to query a lot of items
    // so there should be a better chance that we'll trigger the problem. 
    var listSize = 999999;
    
    // Specifies how many tasks to spin up. This doesn't necessarily mean
    // that it'll spin up the same number of threads, as we're using the thread
    // pool to manage that stuff. 
    var taskCount = 9999;

    // We need a list of things to query, but the example here is a bit contrived. 
    // I'm only calling it `ages` to have a somewhat meaningful variable name. 
    // This is a distinct list of ints, so, ideally, a filter like:
    // `ages.Where(p => p == 4` should only return one result. 
    // As we'll see below, that's not always the case. 
    var ages = Enumerable
        .Range(0, listSize)
        .ToList();
    
    // We'll use `rand` to find a random age in the list. 
    var rand = new Random();
    
    // We need a reference object to prove that `.Where(...)` below isn't thread safe. 
    // Each thread is going to modify this shared `person` property in parallel. 
    var person = new Person();
    
    // Start a bunch of tasks that we'll wait on later. This will run as parallel
    // as your machine will allow. 
    var tasks = Enumerable
        .Range(0, taskCount)
        .Select(p => Task.Run(() =>
        {
            // Pick a random age from the list. 
            var age = ages[rand.Next(0, listSize)];
            
            // These next two lines are where the problem exists. 
            // We've got multiple threads changing `person.Age` and querying on `person.Age` 
            // at the same time. As one thread is looping through the `ages` collection
            // looking for the `person.Age` value that we're setting here, some other
            // thread is going to modify `person.Age`. And every so often, that will
            // cause the `.Where(...)` clause to find multiple values. 
            person.Age = age;
            var count = ages.Where(a => a == person.Age).Count();

            // Throw an exception if the `.Where(...)` filter returned more than one age. 
            if (count > 1) {
                throw new Exception("BOOM");
            }
        }));
        
    await Task.WhenAll(tasks);
    
    Console.WriteLine("Done");
}

class Person {
    public int Age { get; set; }
}
Alex Dresko
  • 5,179
  • 3
  • 37
  • 57
  • 2
    This doesn't prove that the `IEnumerable` extension methods are not thread-safe. Your program would have the same behavior even if the `Where` operator was thread-safe. You don't mutate the collection (the `ages`), you mutate an independent object (the `person`). This is not a supported scenario even for concurrent collections, like the `ConcurrentDictionary`. The thread-safety guarantees of the concurrent collections are limited to the integrity of their internal state, not the state of the objects they contain, or the state of other independent objects. – Theodor Zoulias Jun 25 '21 at 15:51
  • @TheodorZoulias Do you have documentation to back this up? I'm not saying you're wrong... but, from my perspective, there's a thread safety issue here whether it's related to the underlying collection or not. What would you classify the problem in my code sample as? This proof of concept was the result of an unexpected bug we found in our code, and I wanted to post this for others to benefit from. – Alex Dresko Jun 25 '21 at 19:51
  • As an example you can check out the documentation of the [`ConcurrentDictionary.AddOrUpdate`](https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentdictionary-2.addorupdate) method: *"If you call `AddOrUpdate` simultaneously on different threads, `addValueFactory` may be called multiple times."* So you could create a similar program which would prove that the `ConcurrentDictionary` is not thread-safe. But we know that it is, so the proof would be bogus. – Theodor Zoulias Jun 26 '21 at 01:09