5

One day I was trying to get a better understanding of threading concepts, so I wrote a couple of test programs. One of them was:

using System;
using System.Threading.Tasks;
class Program
{
    static volatile int a = 0;

    static void Main(string[] args)
    {
        Task[] tasks = new Task[4];

        for (int h = 0; h < 20; h++)
        {
            a = 0;
            for (int i = 0; i < tasks.Length; i++)
            {
                tasks[i] = new Task(() => DoStuff());
                tasks[i].Start();
            }
            Task.WaitAll(tasks);
            Console.WriteLine(a);
        }
        Console.ReadKey();
    }

    static void DoStuff()
    {
        for (int i = 0; i < 500000; i++) 
        {
            a++;
        }
    }
}

I hoped I would be able to see outputs less than 2000000. The model in my imagination was the following: multiple threads read variable a at the same time, so all the local copies of a are the same; each thread increments its copy, the writes happen, and one or more increments are "lost" this way.

The output, however, goes against this reasoning. One sample output (from a Core i5 machine):

2000000
1497903
1026329
2000000
1281604
1395634
1417712
1397300
1396031
1285850
1092027
1068205
1091915
1300493
1357077
1133384
1485279
1290272
1048169
704754

If my reasoning were true, I would see 2000000 occasionally and sometimes numbers a bit less. But what I see is 2000000 occasionally and numbers way less than 2000000. This indicates that what happens behind the scenes is not just a couple of "increment losses" but that something more is going on. Could somebody explain the situation to me?

Edit: When I was writing this test program I was fully aware of how I could make it thread safe, and I was expecting to see numbers less than 2000000. Let me explain why I was surprised by the output. First, let's assume that the reasoning above is correct. Second assumption (this very well can be the source of my confusion): if the conflicts happen (and they do), then these conflicts are random, and I expect a somewhat normal distribution for these random event occurrences. In this case the first line of the output says: out of 500000 experiments, the random event never occurred. The second line says: the random event occurred at least 167365 times. The difference between 0 and 167365 is just too big (almost impossible with a normal distribution). So the case boils down to the following: one of the two assumptions (the "increment loss" model or the "somewhat normally distributed parallel conflicts" model) is incorrect. Which one is it, and why?

Hari
  • Keep in mind that Tasks are not threads, they are a higher-level abstraction. See http://stackoverflow.com/questions/4130194/what-is-the-difference-between-task-and-thread. – fsimonazzi Nov 13 '12 at 13:04
  • And you should really use `Task.Factory.StartNew(DoStuff)` inside the first inner loop :p – Louis Kottmann Nov 13 '12 at 13:05
  • @Baboon If you want to get *really* pedantic, then you would use `Task.Run` which was introduced in .NET 4.5. – casperOne Nov 13 '12 at 13:07
  • When I run your program I consistently see numbers between approx. 1.1 and 1.8 million. It would be very unlikely that you could do two million increments without hitting a race condition... Can you reproduce your numbers? – Jan Nov 13 '12 at 13:07
  • @Jan The numbers are impossible to reproduce between two singular runs, let alone runs on different machines; it all depends on when the processor gets hold of the variable, which is indeterminate every time (depending on what else is going on). The point is, [values are being overwritten because of the increment operator (`++`)](http://stackoverflow.com/a/13361165/50776). – casperOne Nov 13 '12 at 13:09
  • @Jan Yes I can; on approximately every 3rd run I get 2 million one or more times – Hari Nov 13 '12 at 13:09
  • @casperOne it's not about being pedantic, it's about making things more readable by using built-in stuff instead of reinventing the wheel. – Louis Kottmann Nov 13 '12 at 13:09
  • @Baboon I don't disagree, I was just having a bit of fun. I use your recommendation in .NET 4.0 and `Task.Run` in .NET 4.5 by default. There's little reason to use the constructor of a `Task` at all. – casperOne Nov 13 '12 at 13:13
  • @casperOne alright then ;) I didn't know about Task.Run, but then again I've never touched .NET 4.5 – Louis Kottmann Nov 13 '12 at 13:13
  • @Baboon Give it a try, you'll love it, and the [`async`/`await`](http://msdn.microsoft.com/en-us/library/vstudio/hh191443.aspx) keywords make *everything* around `Task` instances *so* sweet. – casperOne Nov 13 '12 at 13:14
  • @casperOne yes I know, but the OP was expecting the race condition, and I was just wondering how any machine gets 2 million increments done without hitting the race condition – Jan Nov 13 '12 at 13:17
  • @Hari What kind of machine are you using? Amount of physical and OS-level cores? – flindeberg Nov 13 '12 at 13:23

3 Answers

8

The behavior stems from the fact that you are using the volatile keyword while not locking access to the variable a when you use the increment operator (++). (You still get a random distribution when not using volatile, but using volatile changes the nature of that distribution, as explored below.)

When using the increment operator, it's the equivalent of:

a = a + 1;

In this case, you're actually doing three operations, not one:

  1. Read the value of a
  2. Add 1 to the value of a
  3. Assign the result of step 2 back to a

While the volatile keyword serializes access, in the above case it serializes access to each of the three separate operations, not to them collectively as an atomic unit of work.

Because you're performing three operations when incrementing instead of one, you have additions that are being dropped.
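
In rough C# terms, each a++ behaves like this three-step sequence (a conceptual sketch of the effective operations, not the literal compiled code):

int temp = a;    // 1. read the current value of a
temp = temp + 1; // 2. add 1 to that value
a = temp;        // 3. write the result back to a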

Consider this:

Time    Thread 1                 Thread 2
----    --------                 --------
   0    read a (1)               read a (1)
   1    evaluate a + 1 (2)       evaluate a + 1 (2)
   2    write result to a (3)    write result to a (3)

Or even this:

Time    a    Thread 1               Thread 2           Thread 3
----    -    --------               --------           --------
   0    1    read a                                    read a
   1    1    evaluate a + 1 (2)
   2    2    write back to a
   3    2                           read a
   4    2                           evaluate a + 1 (3)
   5    3                           write back to a
   6    3                                              evaluate a + 1 (2)
   7    2                                              write back to a

Note in particular times 5-7: thread 2 has written a value back to a, but because thread 3 is holding an old, stale value, it overwrites the result that the previous threads have written, essentially wiping out any trace of those increments.

As you can see, as you add more threads, you have a greater potential to mix up the order in which the operations are being performed.

volatile will prevent you from corrupting the value of a due to two writes happening at the same time, or a corrupt read of a due to a write happening during a read, but it doesn't do anything to handle making the operations atomic in this case (since you're performing three operations).

In this case, volatile ensures that the distribution of the value of a is between 0 and 2,000,000 (four threads * 500,000 iterations per thread) because of this serialization of access to a. Without volatile, you run the risk of a being anything as you can run into corruption of the value a when reads and/or writes happen at the same time.

Because you haven't synchronized access to a for the entire increment operation, the results are unpredictable, as you have writes that are being overwritten (as seen in the previous example).

What's going on in your case?

For your specific case you have many writes that are being overwritten, not just a few; since you have four threads each looping 500,000 times (two million increments in total), theoretically all of the writes could be overwritten (expand the second example to four threads and then add a few million rows for the loop iterations).

While that extreme isn't really probable, you also shouldn't expect that only a handful of writes will be dropped; losing a tremendous amount of them is entirely possible.

Additionally, Task is an abstraction. In reality (assuming you are using the default scheduler), it uses the ThreadPool class to get threads to process your requests. The ThreadPool is ultimately shared with other operations (some internal to the CLR, even in this case), and on top of that it does things like work stealing and running work on the current thread, and at some point it drops down to the operating system to get a thread to perform the work.

Because of this, you can't assume that the overwritten increments will follow a nice, random distribution; there's always going to be a lot more going on that will throw whatever order you expect out the window. The order of processing is undefined, and the allocation of work will never be evenly distributed.

If you want to ensure that additions won't be overwritten, then you should use the Interlocked.Increment method in the DoStuff method, like so:

for (int i = 0; i < 500000; i++)
{
    // Atomically reads, increments and writes a back in one operation.
    Interlocked.Increment(ref a);
}

This will ensure that all writes will take place, and your output will be 2000000 twenty times (as per your loop).

It also removes the need for the volatile keyword, since the operation you need is now atomic.
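
For completeness, here is a minimal sketch of how the whole program could look with that change applied. It mirrors the code in the question; Task.Factory.StartNew is used as suggested in the comments, but the original new Task(...).Start() pattern works the same way:

using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static int a = 0; // plain int: volatile is no longer needed

    static void Main(string[] args)
    {
        Task[] tasks = new Task[4];

        for (int h = 0; h < 20; h++)
        {
            a = 0;
            for (int i = 0; i < tasks.Length; i++)
            {
                tasks[i] = Task.Factory.StartNew(DoStuff);
            }
            Task.WaitAll(tasks);
            Console.WriteLine(a); // prints 2000000 every time
        }
        Console.ReadKey();
    }

    static void DoStuff()
    {
        for (int i = 0; i < 500000; i++)
        {
            Interlocked.Increment(ref a); // atomic read-modify-write
        }
    }
}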

The volatile keyword is good when the operation that you need to make atomic is limited to a single read or write.

If you have to do anything more than a read or a write, then the volatile keyword is too granular; you need a coarser locking mechanism.

In this case, it's Interlocked.Increment, but if you have to do more than that, then the lock statement will more than likely be what you rely on.
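
For example, if the shared update involved more than a single increment (the fields and the Record method below are purely illustrative), the lock statement is the straightforward tool:

private static readonly object sync = new object();
private static long total;
private static int count;

static void Record(int value)
{
    lock (sync)
    {
        // Both updates happen as one unit; no other thread taking the
        // same lock can interleave between them.
        total += value;
        count++;
    }
}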

casperOne
  • Unfortunately ref parameters don't retain the volatile capabilities, making his test bogus. – Louis Kottmann Nov 13 '12 at 13:08
  • 1
    @Baboon The need for volatile goes away when you add a proper lock, I'm adding it now, thanks. – casperOne Nov 13 '12 at 13:09
  • 1
    The point of the question was not about thread safety. It was rather about what happens in a specific thread unsafe case. Please see my edit – Hari Nov 13 '12 at 13:25
  • @Hari See the "What's going on in your case" section which addresses your specific case. While it's not about thread safety, you need to understand that you have more going on here than you think (namely, the behavior of the increment operator and how much `volatile` serializes access too, and how they can give you unexpected results when combined). Everything else is about *why* this is the case, as well as how to avoid it (if you wish). – casperOne Nov 13 '12 at 13:30
  • @Hari Updated with more to your specific point about why a random distribution won't be seen; there are just too many things going on when you get a `Task` to ensure an even allocation of work (and by extension, an even distribution of overwrites). – casperOne Nov 13 '12 at 13:35
  • @Hari Do you want more information on how you could ensure an even distribution of work while parallelizing (which still doesn't ensure that you won't get extremes on one end of the range)? You can do it using custom partitioning, but it might be overkill. If you want it though, I'll whip it up. – casperOne Nov 13 '12 at 13:52
  • @casperOne That would be interesting however it is outside the scope of this question. If you ever write a blog post about that don't forget to inform me. – Hari Nov 13 '12 at 14:11
  • @casperOne - Your answer seems to start by saying the problem stems from using volatile and not locking. That doesn't seem correct since the problem still exists with or without the use of volatile. – Rob P. Nov 13 '12 at 14:27
  • @RobP. Sorry, I'll update, but the point is, `volatile` doesn't guarantee the consistency that one might expect when performing multiple operations; without `volatile` you still get random results, but potentially are unbounded *outside* of two million, because you could have corruption of reads and writes (which you won't get with `volatile`). I'll update the answer. – casperOne Nov 13 '12 at 14:35
  • @RobP. Updated the middle paragraphs to reflect this. – casperOne Nov 13 '12 at 14:41
0

I don't think anything else is happening - it's just happening a lot. If you add locking or some other synchronization technique (see: Best thread-safe way to increment an integer up to 65535) you'll reliably get the full 2,000,000 increments.

Each task is calling DoStuff() as you'd expect.

private static object locker = new object(); // one lock object shared by all tasks

static void DoStuff()
{
    for (int i = 0; i < 500000; i++)
    {
        lock (locker)
        {
            a++;
        }
    }
}
Rob P.
  • "It's just happening *a lot*" is rather vague and not really a good explanation of what's going on, and using a `lock` even on values up to 65535 is overkill when you have `Interlocked.Increment`; the CLR will upsize everything to a 32 bit integer *anyways* when performing operations, so you might as well use an `int`. – casperOne Nov 13 '12 at 13:06
  • OP's question indicates he's aware of threading issues when incrementing a value from multiple threads. He said he was expecting 'less loss' and asked if 'something else' was happening. He understands that it's not atomic. lock is just an example, the link I included also demonstrates Interlocked.Increment. – Rob P. Nov 13 '12 at 14:19
0

Try increasing the amounts; the timespan is simply too short to draw any conclusions from. Remember that normal IO is in the range of milliseconds, and just one blocking IO operation in this case would render the results useless.

Something along the lines of this is better (or why not int.MaxValue?):

static void DoStuff()
{
    for (int i = 0; i < 50000000; i++) // 50 000 000
        a++;
}

My results ("correct" being 400 000 000):

63838940
60811151
70716761
62101690
61798372
64849158
68786233
67849788
69044365
68621685
86184950
77382352
74374061
58356697
70683366
71841576
62955710
70824563
63564392
71135381

Not really a normal distribution but we are getting there. Bear in mind that this is roughly 35% of the correct amount.

I can explain my results by the fact that I am running on 2 physical cores, seen as 4 due to hyper-threading. This means that if it is optimal to do an "HT switch" during the actual addition, at least 50% of the additions will be "removed" (if I remember the implementation of hyper-threading correctly, it modifies one thread's data in the ALU while loading/saving the other thread's data). The remaining ~15% of the loss comes from the program actually running on 2 cores in parallel.

My recommendations

  • post your hardware
  • increase the loop count
  • vary the TaskCount (see the sketch after this list)
  • hardware matters!
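
For example, a quick way to experiment with both knobs (the names taskCount and iterations are just illustrative; a is still the shared, deliberately unsynchronized counter from the question):

int taskCount = Environment.ProcessorCount; // try 2, 4, 8, ...
int iterations = 50000000;

Task[] tasks = new Task[taskCount];
for (int i = 0; i < tasks.Length; i++)
{
    tasks[i] = Task.Factory.StartNew(() =>
    {
        for (int j = 0; j < iterations; j++)
            a++; // still intentionally unsynchronized
    });
}
Task.WaitAll(tasks);
Console.WriteLine(a);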
flindeberg
  • You can't *ever* depend on a normal distribution, given the fact that `Task` will employ work stealing and the like. You're *never* going to be able to rely on an even distribution of work, unless you handle the partitioning yourself. – casperOne Nov 13 '12 at 13:46
  • @casperOne Never is quite a powerful word, and work stealing actually has nothing to do with the resulting machine instructions, and theoretically the distribution of values will be normal over a greater timespan. My bet is still on hardware, not software, for the greatest part of the deviation. – flindeberg Nov 13 '12 at 13:58
  • 1
    Your answer inspired me to do the same test in C++. The results are totally different from the .NET Task Parallel Library version included in my question. The C++ output looks like what I expected to see in the beginning (it resembles a somewhat normal distribution). So @casperOne is right and the main factor in the situation was some higher-level .NET abstraction. – Hari Nov 13 '12 at 14:56