5

One day I was trying to get a better understanding of threading concepts, so I wrote a couple of test programs. One of them was:

using System;
using System.Threading.Tasks;
class Program
{
    static volatile int a = 0;

    static void Main(string[] args)
    {
        Task[] tasks = new Task[4];

        for (int h = 0; h < 20; h++)
        {
            a = 0;
            for (int i = 0; i < tasks.Length; i++)
            {
                tasks[i] = new Task(() => DoStuff());
                tasks[i].Start();
            }
            Task.WaitAll(tasks);
            Console.WriteLine(a);
        }
        Console.ReadKey();
    }

    static void DoStuff()
    {
        for (int i = 0; i < 500000; i++) 
        {
            a++;
        }
    }
}

I hoped I would be able to see outputs less than 2000000. The model in my imagination was the following: multiple threads read variable a at the same time, so all the local copies of a are the same; each thread increments its copy, the writes happen, and one or more increments are "lost" this way.

The output, however, goes against this reasoning. One sample output (from a Core i5 machine):

2000000
1497903
1026329
2000000
1281604
1395634
1417712
1397300
1396031
1285850
1092027
1068205
1091915
1300493
1357077
1133384
1485279
1290272
1048169
704754

If my reasoning were true, I would see 2000000 occasionally and sometimes numbers a bit less. But what I see is 2000000 occasionally and numbers way less than 2000000. This indicates that what happens behind the scenes is not just a couple of "increment losses" but that something more is going on. Could somebody explain the situation to me?

Edit: When I was writing this test program I was fully aware of how I could make it thread safe, and I was expecting to see numbers less than 2000000. Let me explain why I was surprised by the output. First, let's assume that the reasoning above is correct. Second assumption (this very well can be the source of my confusion): if the conflicts happen (and they do), then these conflicts are random, and I expect a somewhat normal distribution for these random event occurrences. In this case the first line of the output says: out of 500000 experiments, the random event never occurred. The second line says: the random event occurred at least 167365 times. The difference between 0 and 167365 is just too big (almost impossible with a normal distribution). So the case boils down to the following: one of the two assumptions (the "increment loss" model or the "somewhat normally distributed parallel conflicts" model) is incorrect. Which one is it, and why?

Hari
  • Keep in mind that Tasks are not threads, they are a higher-level abstraction. See http://stackoverflow.com/questions/4130194/what-is-the-difference-between-task-and-thread. – fsimonazzi Nov 13 '12 at 13:04
  • And you should really use `Task.Factory.StartNew(DoStuff)` inside the first inner loop :p – Louis Kottmann Nov 13 '12 at 13:05
  • @Baboon If you want to get *really* pedantic, then you would use `Task.Run` which was introduced in .NET 4.5. – casperOne Nov 13 '12 at 13:07
  • When I run your program I consistently see numbers between approx. 1.1 and 1.8 million. It would be very unlikely that you could do two million increments without hitting a race condition... Can you reproduce your numbers? – Jan Nov 13 '12 at 13:07
  • @Jan The numbers are impossible to reproduce between two singular runs, let alone runs on different machines; it all depends on when the processor gets hold of the variable, which is indeterminate every time (depending on what else is going on). The point is, [values are being overwritten because of the increment operator (`++`)](http://stackoverflow.com/a/13361165/50776). – casperOne Nov 13 '12 at 13:09
  • @Jan Yes I can; on approximately every 3rd run I get 2 million one or more times – Hari Nov 13 '12 at 13:09
  • @casperOne it's not about being pedantic, it's about making things more readable by using built-in stuff instead of reinventing the wheel. – Louis Kottmann Nov 13 '12 at 13:09
  • @Baboon I don't disagree, I was just having a bit of fun. I use your recommendation in .NET 4.0 and `Task.Run` in .NET 4.5 by default. There's little reason to use the constructor of a `Task` at all. – casperOne Nov 13 '12 at 13:13
  • @casperOne alright then ;) I didn't know about Task.Run, but then again I've never touched .NET 4.5 – Louis Kottmann Nov 13 '12 at 13:13
  • @Baboon Give it a try, you'll love it, and the [`async`/`await`](http://msdn.microsoft.com/en-us/library/vstudio/hh191443.aspx) keywords make *everything* around `Task` instances *so* sweet. – casperOne Nov 13 '12 at 13:14
  • @casperOne yes I know, but the OP was expecting the race condition, and I was just wondering how any machine gets 2 million increments done without hitting the race condition – Jan Nov 13 '12 at 13:17
  • @Hari What kind of machine are you using? Amount of physical and OS-level cores? – flindeberg Nov 13 '12 at 13:23

3 Answers

8

The behavior stems from the fact that you are using the volatile keyword while not locking access to the variable a when you use the increment operator (++). (You still get a random distribution when not using volatile, but using volatile changes the nature of that distribution, as explored below.)

When using the increment operator, it's the equivalent of:

a = a + 1;

In this case, you're actually doing three operations, not one:

  1. Read the value of a
  2. Add 1 to the value of a
  3. Assign the result of step 2 back to a

While the volatile keyword serializes access, in the above case it serializes access to each of the three separate operations, not to them collectively as an atomic unit of work.

Because you're performing three operations when incrementing instead of one, you have additions that are being dropped.
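
In rough C# terms, each a++ behaves like this three-step sequence (a conceptual sketch of the effective operations, not the literal compiled code):

int temp = a;    // 1. read the current value of a
temp = temp + 1; // 2. add 1 to that value
a = temp;        // 3. write the result back to a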

Consider this:

Time    Thread 1                 Thread 2
----    --------                 --------
   0    read a (1)               read a (1)
   1    evaluate a + 1 (2)       evaluate a + 1 (2)
   2    write result to a (3)    write result to a (3)

Or even this:

Time    a    Thread 1               Thread 2           Thread 3
----    -    --------               --------           --------
   0    1    read a                                    read a
   1    1    evaluate a + 1 (2)
   2    2    write back to a
   3    2                           read a
   4    2                           evaluate a + 1 (3)
   5    3                           write back to a
   6    3                                              evaluate a + 1 (2)
   7    2                                              write back to a

Note in particular times 5-7: thread 2 has written a value back to a, but because thread 3 is holding an old, stale value, it overwrites the result that the previous threads have written, essentially wiping out any trace of those increments.

As you can see, as you add more threads, you have a greater potential to mix up the order in which the operations are being performed.

volatile will prevent you from corrupting the value of a due to two writes happening at the same time, or a corrupt read of a due to a write happening during a read, but it doesn't do anything to handle making the operations atomic in this case (since you're performing three operations).

In this case, volatile ensures that the distribution of the value of a is between 0 and 2,000,000 (four threads * 500,000 iterations per thread) because of this serialization of access to a. Without volatile, you run the risk of a being anything as you can run into corruption of the value a when reads and/or writes happen at the same time.

Because you haven't synchronized access to a for the entire increment operation, the results are unpredictable, as you have writes that are being overwritten (as seen in the previous example).

What's going on in your case?

For your specific case you have many writes that are being overwritten, not just a few; since you have four threads each looping 500,000 times (two million increments in total), theoretically all of the writes could be overwritten (expand the second example to four threads and then add a few million rows for the loop iterations).

While that extreme isn't really probable, you also shouldn't expect that only a handful of writes will be dropped; losing a tremendous amount of them is entirely possible.

Additionally, Task is an abstraction. In reality (assuming you are using the default scheduler), it uses the ThreadPool class to get threads to process your requests. The ThreadPool is ultimately shared with other operations (some internal to the CLR, even in this case), and on top of that it does things like work stealing and running work on the current thread, and at some point it drops down to the operating system to get a thread to perform the work.

Because of this, you can't assume that the overwritten increments will follow a nice, random distribution; there's always going to be a lot more going on that will throw whatever order you expect out the window. The order of processing is undefined, and the allocation of work will never be evenly distributed.

If you want to ensure that additions won't be overwritten, then you should use the Interlocked.Increment method in the DoStuff method, like so:

for (int i = 0; i < 500000; i++)
{
    // Atomically reads, increments and writes a back in one operation.
    Interlocked.Increment(ref a);
}

This will ensure that all writes will take place, and your output will be 2000000 twenty times (as per your loop).

It also removes the need for the volatile keyword, since the operation you need is now atomic.
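
For completeness, here is a minimal sketch of how the whole program could look with that change applied. It mirrors the code in the question; Task.Factory.StartNew is used as suggested in the comments, but the original new Task(...).Start() pattern works the same way:

using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static int a = 0; // plain int: volatile is no longer needed

    static void Main(string[] args)
    {
        Task[] tasks = new Task[4];

        for (int h = 0; h < 20; h++)
        {
            a = 0;
            for (int i = 0; i < tasks.Length; i++)
            {
                tasks[i] = Task.Factory.StartNew(DoStuff);
            }
            Task.WaitAll(tasks);
            Console.WriteLine(a); // prints 2000000 every time
        }
        Console.ReadKey();
    }

    static void DoStuff()
    {
        for (int i = 0; i < 500000; i++)
        {
            Interlocked.Increment(ref a); // atomic read-modify-write
        }
    }
}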

The volatile keyword is good when the operation that you need to make atomic is limited to a single read or write.

If you have to do anything more than a read or a write, then the volatile keyword is too granular; you need a coarser locking mechanism.

In this case, it's Interlocked.Increment, but if you have to do more than that, then the lock statement will more than likely be what you rely on.
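
For example, if the shared update involved more than a single increment (the fields and the Record method below are purely illustrative), the lock statement is the straightforward tool:

private static readonly object sync = new object();
private static long total;
private static int count;

static void Record(int value)
{
    lock (sync)
    {
        // Both updates happen as one unit; no other thread taking the
        // same lock can interleave between them.
        total += value;
        count++;
    }
}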

casperOne
  • Unfortunately ref parameters don't retain the volatile capabilities, making his test bogus. – Louis Kottmann Nov 13 '12 at 13:08
  • 1
    @Baboon The need for volatile goes away when you add a proper lock, I'm adding it now, thanks. – casperOne Nov 13 '12 at 13:09
  • 1
    The point of the question was not about thread safety. It was rather about what happens in a specific thread unsafe case. Please see my edit – Hari Nov 13 '12 at 13:25
  • @Hari See the "What's going on in your case" section which addresses your specific case. While it's not about thread safety, you need to understand that you have more going on here than you think (namely, the behavior of the increment operator and how much `volatile` serializes access too, and how they can give you unexpected results when combined). Everything else is about *why* this is the case, as well as how to avoid it (if you wish). – casperOne Nov 13 '12 at 13:30
  • @Hari Updated with more to your specific point about why a random distribution won't be seen; there are just too many things going on when you get a `Task` to ensure an even allocation of work (and by extension, an even distribution of overwrites). – casperOne Nov 13 '12 at 13:35
  • @Hari Do you want more information on how you could ensure an even distribution of work while parallelizing (which still doesn't ensure that you won't get extremes on one end of the range)? You can do it using custom partitioning, but it might be overkill. If you want it though, I'll whip it up. – casperOne Nov 13 '12 at 13:52
  • @casperOne That would be interesting however it is outside the scope of this question. If you ever write a blog post about that don't forget to inform me. – Hari Nov 13 '12 at 14:11
  • @casperOne - Your answer seems to start by saying the problem stems from using volatile and not locking. That doesn't seem correct since the problem still exists with or without the use of volatile. – Rob P. Nov 13 '12 at 14:27
  • @RobP. Sorry, I'll update, but the point is, `volatile` doesn't guarantee the consistency that one might expect when performing multiple operations; without `volatile` you still get random results, but potentially are unbounded *outside* of two million, because you could have corruption of reads and writes (which you won't get with `volatile`). I'll update the answer. – casperOne Nov 13 '12 at 14:35
  • @RobP. Updated the middle paragraphs to reflect this. – casperOne Nov 13 '12 at 14:41
0

I don't think anything else is happening - it's just happening a lot. If you add locking or some other synchronization technique (see: Best thread-safe way to increment an integer up to 65535) you'll reliably get the full 2,000,000 increments.

Each task is calling DoStuff() as you'd expect.

private static object locker = new object(); // one lock object shared by all tasks

static void DoStuff()
{
    for (int i = 0; i < 500000; i++)
    {
        lock (locker)
        {
            a++;
        }
    }
}
Rob P.
  • "It's just happening *a lot*" is rather vague and not really a good explanation of what's going on, and using a `lock` even on values up to 65535 is overkill when you have `Interlocked.Increment`; the CLR will upsize everything to a 32 bit integer *anyways* when performing operations, so you might as well use an `int`. – casperOne Nov 13 '12 at 13:06
  • OP's question indicates he's aware of threading issues when incrementing a value from multiple threads. He said he was expecting 'less loss' and asked if 'something else' was happening. He understands that it's not atomic. lock is just an example, the link I included also demonstrates Interlocked.Increment. – Rob P. Nov 13 '12 at 14:19
0

Try increasing the amounts; the timespan is simply too short to draw any conclusions from. Remember that normal IO is in the range of milliseconds, and just one blocking IO operation in this case would render the results useless.

Something along the lines of this is better (or why not int.MaxValue?):

static void DoStuff()
{
    for (int i = 0; i < 50000000; i++) // 50 000 000
        a++;
}

My results ("correct" being 400 000 000):

63838940
60811151
70716761
62101690
61798372
64849158
68786233
67849788
69044365
68621685
86184950
77382352
74374061
58356697
70683366
71841576
62955710
70824563
63564392
71135381

Not really a normal distribution but we are getting there. Bear in mind that this is roughly 35% of the correct amount.

I can explain my results by the fact that I am running on 2 physical cores, seen as 4 due to hyper-threading. This means that if it is optimal to do an "HT switch" during the actual addition, at least 50% of the additions will be "removed" (if I remember the implementation of hyper-threading correctly, it modifies one thread's data in the ALU while loading/saving the other thread's data). The remaining ~15% of the loss comes from the program actually running on 2 cores in parallel.

My recommendations

  • post your hardware
  • increase the loop count
  • vary the TaskCount (see the sketch after this list)
  • hardware matters!
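
For example, a quick way to experiment with both knobs (the names taskCount and iterations are just illustrative; a is still the shared, deliberately unsynchronized counter from the question):

int taskCount = Environment.ProcessorCount; // try 2, 4, 8, ...
int iterations = 50000000;

Task[] tasks = new Task[taskCount];
for (int i = 0; i < tasks.Length; i++)
{
    tasks[i] = Task.Factory.StartNew(() =>
    {
        for (int j = 0; j < iterations; j++)
            a++; // still intentionally unsynchronized
    });
}
Task.WaitAll(tasks);
Console.WriteLine(a);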
flindeberg
  • You can't *ever* depend on a normal distribution, given the fact that `Task` will employ work stealing and the like. You're *never* going to be able to rely on an even distribution of work, unless you handle the partitioning yourself. – casperOne Nov 13 '12 at 13:46
  • @casperOne Never is quite a powerful word, and work stealing actually has nothing to do with the resulting machine instructions, and theoretically the distribution of values will be normal over a greater timespan. My bet is still on hardware, not software, for the greatest part of the deviation. – flindeberg Nov 13 '12 at 13:58
  • 1
    Your answer inspired me to do the same test in C++. The results are totally different from the .NET Task Parallel Library version included in my question. The C++ output looks like what I expected to see in the beginning (it resembles a somewhat normal distribution). So @casperOne is right and the main factor in the situation was some higher-level .NET abstraction. – Hari Nov 13 '12 at 14:56