38

I'm just looking in to the new .NET 4.0 features. With that, I'm attempting a simple calculation using Parallel.For and a normal for(x;x;x) loop.

However, I'm getting different results about 50% of the time.

long sum = 0;

Parallel.For(1, 10000, y =>
    {
        sum += y;
    }
);

Console.WriteLine(sum.ToString());

sum = 0;

for (int y = 1; y < 10000; y++)
{
   sum += y;
}
Console.WriteLine(sum.ToString());

My guess is that the threads are trying to update "sum" at the same time.
Is there an obvious way around it?

Alex Bagnolini
  • 21,990
  • 3
  • 41
  • 41
Inisheer
  • 20,376
  • 9
  • 50
  • 82
  • 4
    Concurrent programming comes in two parts, 1) executing on a separate thread, and 2) synchronizing\communicating across threads. parallel extensions enable 1), however 2) must be explicitly addressed by developer, and when you say `sum += y;` you are effectively having each thread say "add me to sum!" all at once. you need to synchronize their communication in regards to the shared resource `sum` – johnny g May 05 '10 at 14:45
  • 24
    my fears come true... parallel extensions let people write parallel code without understanding theory, including race conditions (as here) – Andrey May 05 '10 at 14:46
  • 29
    @Andrey - Yes, that's why some of us actually try to learn it first (ie. by posting questions on SO). – Inisheer May 05 '10 at 14:47
  • 1
    @TSS i would recommend to read some articles about multithreading and try to program it in lower level, using threads. then move to parallel extensions, because it is just a wrapper and it hides a lot of things from you. – Andrey May 05 '10 at 14:51
  • 1
    I don't know why Msft introduced parallel extensions in the first place... it is just going to encourage people to parallel stuff that shouldn't/doesn't need to be run in parallel. – DigitalZebra May 05 '10 at 14:54
  • 1
    @TSS @Andrey: I don't think you have to learn a lot of low level thread programming but you do have to understand the issues. The TPL does a lot to help with creating and managing threads by abstracting them as a Task type. It doesn't do much to help with shared data. You have to understand these issues even when using tasks. – Ade Miller May 05 '10 at 14:57
  • 3
    @Polaris878 - Same with LINQ, not sure why MS introduced that either. We have loops! /sarcasm – Inisheer May 05 '10 at 15:00
  • 2
    @TSS, I made no reference to LINQ... What I'm pointing out is that the overhead of something like this is much higher than running this in a single thread. People are going to start throwing Parallel.For anywhere in their code they have large loops... what I'm saying is that this is going to encourage people to needlessly introduce multithreading into their programs. – DigitalZebra May 05 '10 at 15:03
  • @Polaris878 Because there is a fundamental change in the hardware platform. I'd suggest looking at the Manycore Shift white paper and http://www.microsoft.com/downloads/details.aspx?FamilyID=633F9F08-AAD9-46C4-8CAE-B204472838E1&displaylang=en Herb Sutter's End of the free Lunch article http://www.gotw.ca/publications/concurrency-ddj.htm – Ade Miller May 05 '10 at 15:04
  • @Ade, right but how often are you going to write a freaking loop that is long enough to warrant multithreading?? Even with multiple cores, multithreading works best with IO bound or very heavily CPU bound tasks... and unsurprisingly, most code is NOT heavily IO or CPU bound. My point is that this is going to encourage n00bies to throw multithreading at every problem. – DigitalZebra May 05 '10 at 15:08
  • @Polaris, just pickin', and this was just a trivial example. @AdeMiller: exactly why I think this is important. I could get away with a couple simple threads in an application over the years, but now with 6 core processors, why not create faster and scalable applications? – Inisheer May 05 '10 at 15:08
  • 4
    @Polaris: We are moving towards a world where computers have hundreds of cores. For this world, it makes sense to make programs multithreaded. Besides, why are you hating on a technology that makes programming something difficult much easier? – BlueRaja - Danny Pflughoeft May 05 '10 at 15:20
  • 3
    @BlueRaja, my point wasn't that this feature isn't ALWAYS needless, certainly there will be some very useful cases for this. I'm simply saying that in MOST CASES this feature is completely useless and will actually SLOW your code down. It doesn't matter how many freaking cores you have, there is going to be overhead in making those cores communicate with eachother. So, it is ideal to make those cores only talk to eachother when they absolutely have to. – DigitalZebra May 05 '10 at 15:24
  • 2
    @BlueRaja parallel extensions only pretend to get much easier. – Andrey May 05 '10 at 15:29
  • @BlueRaja on GPUs hundreds of cores are today's reality. – Andrey May 05 '10 at 15:30
  • @TSS: I think the worry is if people write code without understanding it, and only asking questions when the result is different from what they expected. The problem with this methodology is that you might write code that appears to work fine when you test it on your single core dev machine on small amounts of data, but fails horribly when you deploy it to the multicore production server and large data sets. However +1 for the question. – Mark Byers Jul 31 '10 at 12:17

7 Answers7

71

You can't do this. sum is being shared across you parallel threads. You need to make sure that the sum variable is only being accessed by one thread at a time:

// DON'T DO THIS!
Parallel.For(0, data.Count, i =>
{
    Interlocked.Add(ref sum, data[i]);
});

BUT... This is an anti-pattern because you've effectively serialised the loop because each thread will lock on the Interlocked.Add.

What you need to do is add sub totals and merge them at the end like this:

Parallel.For<int>(0, result.Count, () => 0, (i, loop, subtotal) =>
    {
        subtotal += result[i];
        return subtotal;
    },
    (x) => Interlocked.Add(ref sum, x)
);

You can find further discussion of this on MSDN: http://msdn.microsoft.com/en-us/library/dd460703.aspx

PLUG: You can find more on this in Chapter 2 on A Guide to Parallel Programming

The following is also definitely worth a read...

Patterns for Parallel Programming: Understanding and Applying Parallel Patterns with the .NET Framework 4 - Stephen Toub

Alex Bagnolini
  • 21,990
  • 3
  • 41
  • 41
Ade Miller
  • 13,575
  • 1
  • 42
  • 75
18

sum += y; is actually sum = sum + y;. You are getting incorrect results because of the following race condition:

  1. Thread1 reads sum
  2. Thread2 reads sum
  3. Thread1 calculates sum+y1, and stores the result in sum
  4. Thread2 calculates sum+y2, and stores the result in sum

sum is now equal to sum+y2, instead of sum+y1+y2.

BlueRaja - Danny Pflughoeft
  • 84,206
  • 33
  • 197
  • 283
5

Your surmise is correct.

When you write sum += y, the runtime does the following:

  1. Read the field onto the stack
  2. Add y to the stack
  3. Write the result back to the field

If two threads read the field at the same time, the change made by the first thread will be overwritten by the second thread.

You need to use Interlocked.Add, which performs the addition as a single atomic operation.

SLaks
  • 868,454
  • 176
  • 1,908
  • 1,964
  • 5
    See below. The nieve way to use interlocked add simply serialized your loop. – Ade Miller May 05 '10 at 14:45
  • I would add that best way it so use local variable and after cycle add them to single global, using Interlocked.Add of course. – Andrey May 05 '10 at 14:46
  • 1
    There's example of this in my answer below. – Ade Miller May 05 '10 at 14:49
  • Joe Duffy's book has a good discussion on what actually happens under the hood on about page 21. http://www.bluebytesoftware.com/books/winconc/winconc_book_resources.html – Ade Miller May 05 '10 at 15:55
  • you loose the benefit of multi threading this if you take it back to an atomic operation. Ade Miller has a great answer to this question. – Jon May 06 '10 at 12:39
  • Also, don't talk about "reading the field onto the stack". That's almost definitely *not* happening (for code like this, it's almost guaranteed it will actually be in a register). Whether or not that happens is an implementation detail. While IL uses the stack for pretty much everything, the x86 code generated from IL most certainly doesn't :) – Luaan Apr 13 '15 at 11:28
  • @Luaan: I'm describing the operations on the IL level. – SLaks Apr 13 '15 at 14:16
4

Incrementing a long isn't an atomic operation.

Eric Mickelsen
  • 10,309
  • 2
  • 30
  • 41
4

I think it's important to distinguish that this loop is not capable of being partitioned for parallelism, because as has been mentioned above each iteration of the loop is dependent on the prior. The parallel for is designed for explicitly parallel tasks, such as pixel scaling etc. because each iteration of the loop cannot have data dependencies outside its iteration.

Parallel.For(0, input.length, x =>
{
    output[x] = input[x] * scalingFactor;
});

The above an example of code that allows for easy partitioning for parallelism. However a word of warning, parallelism comes with a cost, even the loop I used as an example above is far far too simple to bother with a parallel for because the set up time takes longer than the time saved via parallelism.

Mgetz
  • 5,108
  • 2
  • 33
  • 51
  • You can partition it for parallelism, you just have to think about it in a different way, aggregation. – Ade Miller May 16 '10 at 15:52
  • true... MPI_AllGather() would be a good example, however some cursory research on MSDN shows that you would have to turn to MPI# to get that functionality... as it doesn't seem to be included. You could however code your own. – Mgetz May 23 '10 at 17:52
3

An important point no-one seems to have mentioned: For data-parallel operations (such as the OP's), it is often better (in terms of both efficiency and simplicity) to use PLINQ instead of the Parallel class. The OP's code is actually trivial to parallelize:

long sum = Enumerable.Range(1, 10000).AsParallel().Sum();

The above snippet uses the ParallelEnumerable.Sum method, although one could also use Aggregate for more general scenarios. Refer to the Parallel Loops chapter for an explanation of these approaches.

Douglas
  • 53,759
  • 13
  • 140
  • 188
-1

if there are two parameters in this code. For example

long sum1 = 0;
long sum2 = 0;

Parallel.For(1, 10000, y =>
    {
        sum1 += y;
        sum2=sum1*y;
    }
);

what will we do ? i am guessing that have to use array !

Chris Shain
  • 50,833
  • 6
  • 93
  • 125
user347918
  • 161
  • 1
  • 2
  • 11