10

While the answer to this question is excellent, it implies that you should surround calls to List.ToArray() in a lock for concurrency. this blog post also implies that it could fail catastrophically (but rarely). I typically use ToArray rather than a lock in when enumerating Lists or other collections in order to avoid the "Collection Modified, Enumeration may not complete" exception. This answer and the blog post have called that assumption into question.

The documentation for List.ToArray() does not list any exceptions, so I have always assumed that it will always complete (albeit maybe with stale data) and that while it is not thread safe from a data consistency point of view, it is thread safe from a code execution point of view - in other words, it won't throw an exception and calling it won't corrupt the internal data structure of the underlying collection.

If this assumption isn't correct, then while it has never caused a problem it could be a timebomb in a high availability application. What is the definitive answer?

Community
  • 1
  • 1
Joe Enzminger
  • 11,110
  • 3
  • 50
  • 75
  • List.Add also doesn't throw an exception if anther threads modifies the list at the same time. It still isn't thread-safe. It just checks that you're not modifying and enumerating it at the same time in the same thread. What makes you believe that a method that is not documented to be thread-safe could be thread-safe? (Assuming you're talking about List.ToArray or Enumerable.ToArray. ConcurrentBag.ToArray is thread-safe, as is enumerating a ConcurrentBag without ToArray.) – dtb Feb 18 '13 at 22:02
  • I narrowed the scope of the question to focus on List and List since these are the ones I am most concerned about. I have seen this technique used in a number of popular open source frameworks, so I don't think I'm the only one making the assumption about the "safety" of the technique. I'm not asking if they are thread safe ( by definition they are not), but should I launch into finding and fixing any instances of "unsafe" calls to ToArray()? – Joe Enzminger Feb 18 '13 at 22:15
  • Are you sure these open source frameworks use ToArray rather than a lock, or are they actually using ToArray to modify a list while they are enumerating it in the same thread? – dtb Feb 18 '13 at 22:22

5 Answers5

7

You will not find documentation about possible exceptions of ToArray method for one simple reason. This is an extension method that has many 'overloads'. They all have same method signature, but implementation is different for different collection types, e.g. List<T> and HashSet<T>.

However, we can make a safe assumption for most of the code that .NET framework BCL does not perform any locking for performance reasons. I've also checked very specifically implementation of ToList for List<T>.

public T[] ToArray()
{
    T[] array = new T[this._size];
    Array.Copy(this._items, 0, array, 0, this._size);
    return array;
}

As you might have imagined, it's quite simple code which ends up executing in mscorlib. For this specific implementation, you can also see exceptions which could occur in MSDN page for Array.Copy method. It boils down to an exception which is thrown if rank of the list changes right after destination array was just allocated.

Having in mind that List<T> is trivial example, you can imagine that chances for exception rise on structures which require more complicated code in order to store in an array. Implementation for Queue<T> is a candidate which is more likely to fail:

public T[] ToArray()
{
    T[] array = new T[this._size];
    if (this._size == 0)
    {
        return array;
    }
    if (this._head < this._tail)
    {
        Array.Copy(this._array, this._head, array, 0, this._size);
    }
    else
    {
        Array.Copy(this._array, this._head, array, 0, this._array.Length - this._head);
        Array.Copy(this._array, 0, array, this._array.Length - this._head, this._tail);
    }
    return array;
}
Nikola Radosavljević
  • 6,871
  • 32
  • 44
5

When thread-safety is not explicitly guaranteed by the docs or by principle you cannot assume it. If you do assume it you risk putting a class of bugs into production that is undebuggable and has the potential to cost you a lot of productivity/availability/money. Are you willing to take that risk?

You can never test something to be threadsafe. You can never be sure. You cannot be sure that a future version behaves the same way.

Do it the right way and lock.

Btw, these remarks were for List.ToArray which is one of the more safe versions of ToArray. I understand why one would mistakenly think it can be used concurrently with writes to the list. Of course IEnumerable.ToArray cannot possibly be threadssafe because that is a property of the underlying sequence.

usr
  • 168,620
  • 35
  • 240
  • 369
  • The -1 is because the answer is both too wordy and too generalized. Sometimes the right thing to do is lock, but sometimes the right thing to do is choose a thread or concurrent data structure/algorithm. Plus, there *are* tests you can perform on code that is supposed to be thread-safe in an effort to verify it. – Sam Harwell Feb 18 '13 at 22:54
  • @280Z28 I respect your criticism. The answer was meant to be quite general because the underlying issue is the same for a whole class of questions. I'm just suspicious of your last statement: what tests would you perform to assure that ToArray is 100% safe? Anything less is a bomb shipping into production. Did you ever do operations work? I *hate* getting 10 error mails per day from a non-deterministic bug. – usr Feb 18 '13 at 23:03
  • I'm actually quite sure you have a good grasp of the situation, but the particular presentation may prove misleading for other developers who are not as experienced. Your statement about testing is not restricted to testing `ToArray`, but instead gives the impression that "testing to be threadsafe" is not possible in general. – Sam Harwell Feb 19 '13 at 02:35
3

ToArray is NOT threadsafe, and this code proves it!

Consider this rather ridiculous code:

        List<int> l = new List<int>();

        for (int i = 1; i < 100; i++)
        {
            l.Add(i);
            l.Add(i * 2);
            l.Add(i * i);
        }

        Thread th = new Thread(new ThreadStart(() =>
        {
            int t=0;
            while (true)
            {
                //Thread.Sleep(200);

                switch (t)
                {
                    case 0:
                        l.Add(t);
                        t = 1;
                        break;
                    case 1:
                        l.RemoveAt(t);
                        t = 0;
                        break;
                }
            }
        }));

        th.Start();

        try
        {
            while (true)
            {
                Array ai = l.ToArray();

                //foreach (object o in ai)
                //{
                //    String str = o.ToString();
                //}
            }
        }
        catch (System.Exception ex)
        {
            String str = ex.ToString();                 
        }

    }

This code will fail in a very short run, because of the l.Add(t) line. Because the ToArray is NOT threadsafe, it will allocate the array to the current size of l, then we will add an element to l (in the other thread), and then it will try to copy the current size of l to ai and fail because l has too many elements. ToArray throws an ArgumentException.

David Hope
  • 2,216
  • 16
  • 32
1

First of all you will have to make it clear that the callsite must be in a thread-safe region. Most regions in your code will not be threadsafe regions and will assume a single thread of execution at any given time (for most application code). For (a very rough estimate) 99% of all application code this question makes no real sense.

Secondly you will have to make it clear "what" the enumeration function really is, as this will vary on the type of enumeration you are running through - are you talking about the normal linq extension to Enumerations?

Thirdly the link you provide to the ToArray code and the lock statement around it is nonsense at best: Without showing that the callsite also locks on the same collection, it does not guarantee thread safety at al.

And so on.

Casper Leon Nielsen
  • 2,528
  • 1
  • 28
  • 37
1

It seems you are confusing two things:

  • List<T> does not support being modified while it is enumerated. When enumerating a list, the enumerator checks if the list has been modified after each iteration. Calling List<T>.ToArray before enumerating the list solves this problem, as you're enumerating a snapshot of the list then and not the list itself.

  • List<T> is not a thread-safe collection. All of the above assumes access from the same thread. Accessing a list from two threads always requires a lock. List<T>.ToArray is not thread-safe and doesn't help here.

dtb
  • 213,145
  • 36
  • 401
  • 431