5

I have a bunch of data rows, and I want to use Parallel.ForEach to compute some value on each row like this...

class DataRow
{
    public double A { get; internal set; }
    public double B { get; internal set; }
    public double C { get; internal set; }

    public DataRow()
    {
        A = double.NaN;
        B = double.NaN;
        C = double.NaN;
    }
}

class Program
{
    static void ParallelForEachToyExample()
    {
        var rnd = new Random();
        var df = new List<DataRow>();

        for (int i = 0; i < 10000000; i++)
        {
            var dr = new DataRow {A = rnd.NextDouble()};
            df.Add(dr);
        }

        // Ever Needed? (I)
        //Thread.MemoryBarrier();

        // Parallel For Each (II)
        Parallel.ForEach(df, dr =>
        {
            dr.B = 2.0 * dr.A;
        });

        // Ever Needed? (III)
        //Thread.MemoryBarrier();

        // Parallel For Each 2 (IV)
        Parallel.ForEach(df, dr =>
        {
            dr.C = 2.0 * dr.B;
        });
    }
}

(In this example there's no need to parallelize, and if there were, it could all go inside a single Parallel.ForEach. But this is meant to be a simplified version of some code where it does make sense to set it up like this.)

Is it possible for the reads to be re-ordered here so that I end up with a data row where B != 2A or C != 2B?

Say the first Parallel.ForEach (II) assigns worker thread 42 to work on data row 0. And the second Parallel.ForEach (IV) assigns worker thread 43 to work on data row 0 (as soon as the first Parallel.ForEach finishes). Is there a chance that the read of dr.B for row 0 on thread 43 returns double.NaN since it hasn't seen the write from thread 42 yet?

And if so, does inserting a memory barrier at III help at all? Would this force the updates from the first Parallel.ForEach to be visible to all threads before the second Parallel.ForEach starts?

Michael Covelli
    In short.. I don't think you need the explicit memory barriers.. An educated guess would be that implementation of Parallel.ForEach has some kind of synchronization for ending the loop / before call to `ForEach` returns – Vikas Gupta May 15 '15 at 03:36
  • Given a better picture of your actual code, I might be able to give you a better answer other than "No, don't worry about it." :) – jdphenix May 15 '15 at 04:13
  • Maybe the reason for the separation is a little clearer if I say that the calculation in each row of the second parallel loop (IV) depends on some value that can only be known after the first loop (II) finishes. Say we need the median of the values of dr.B across all rows before we can compute the value of dr.C for each row. – Michael Covelli May 15 '15 at 12:35

2 Answers

5

The work started by a Parallel.ForEach() call will be complete before it returns. Internally, ForEach() partitions the iterations into Tasks and calls Wait() on them before returning. As a result, you do not need to synchronize access between ForEach() calls.

You do need to keep thread safety in mind within a single ForEach() call when you use overloads that expose loop state, aggregate results across tasks, and so on. For example, in this trivial example which sums up 1 ≤ x ≤ 100, the Action passed as localFinally to Parallel.For() has to handle its own synchronization:

var total = 0;

Parallel.For(0, 101, () => 0,  // <-- localInit
(i, state, localTotal) => { // <-- body
  localTotal += i;
  return localTotal;
}, localTotal => { // <-- localFinally
  Interlocked.Add(ref total, localTotal); // Note the use of an `Interlocked` static method
});

// Work of previous `For()` call is guaranteed to be done here

Console.WriteLine(total);

In your example, it is not necessary to insert a memory barrier between the ForEach() calls. Specifically, loop IV can depend on the results of loop II being complete, because Parallel.ForEach() has effectively inserted III for you.

Snippet sourced from: Parallel Framework and avoiding false sharing
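To make the dependence between the two passes concrete, the comments mention needing the median of all the B values before C can be computed. Here is a minimal self-contained sketch of that two-pass pattern (using plain arrays instead of the DataRow class to keep the snippet short, and a fixed seed so it is repeatable) with no explicit barrier between the passes:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

int n = 1000;
var rnd = new Random(42);
var a = Enumerable.Range(0, n).Select(_ => rnd.NextDouble()).ToArray();
var b = new double[n];
var c = new double[n];

// Pass II: every write to b[] is complete before ForEach() returns.
Parallel.ForEach(Enumerable.Range(0, n), i => b[i] = 2.0 * a[i]);

// Between the passes we can safely read every b, e.g. to take the median.
var sorted = b.OrderBy(x => x).ToArray();
double medianB = sorted[n / 2];

// Pass IV: depends on b and medianB; no explicit barrier needed in between.
Parallel.ForEach(Enumerable.Range(0, n), i => c[i] = b[i] / medianB);

Console.WriteLine(c.All(x => x >= 0)); // True
```

Each worker in pass IV reads b values written by (possibly different) workers in pass II, but since ForEach() does not return until all of its tasks have completed, the intermediate median computation and the second pass both see fully published writes.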

jdphenix
  • Thanks. When I look through the code of Parallel.ForEach a few levels down, it looks like "private static ParallelLoopResult ForWorker" is doing most of the work. It's a little hard for me to follow, but it looks like there's a call to "rootTask.Wait();" that waits for all the worker threads to finish before proceeding. But even though my main thread is waiting for the workers to finish, that doesn't guarantee that worker threads spread out among all the other processors will necessarily see the most recent writes when they go to read in values, does it? – Michael Covelli May 15 '15 at 12:39
  • That's correct, and I'll edit my answer to perhaps be a bit more clear. Tasks spawned by the *same* `ForEach()` call will need to be aware of concurrency issues - the usual spot to be concerned about is in the `Action` you pass to `localFinally`. However, different calls of `ForEach()` can safely depend upon the results of prior `ForEach()` calls safely. – jdphenix May 15 '15 at 18:47
  • I guess my question is related to this... http://stackoverflow.com/questions/6581848/memory-barrier-generators. I just want to make sure that the end of Parallel.ForEach falls into one of those buckets. So that it has its own MemoryBarrier (effectively) and guarantees that everything is fully written before the next Parallel.ForEach starts. – Michael Covelli May 15 '15 at 19:09
  • Related yes - that's more of a "what generates a memory barrier" and yours is a "do I need one between `Parallel.ForEach()` calls". – jdphenix May 15 '15 at 19:13
  • I guess if this is correct... http://www.albahari.com/threading/part4.aspx then "Anything that relies on signaling, such as starting or waiting on a Task" implicitly generates a full fence. And the last thing in the body of the function "private static ParallelLoopResult ForWorker" that does most of the work in Parallel.ForEach is a call to "rootTask.Wait();" (before catch and finally blocks). So it seems like this call generates the same full fence as the MemoryBarrier. So it's not needed. – Michael Covelli May 15 '15 at 19:17
  • http://www.albahari.com/threading/part5.aspx#_The_Parallel_Class *may be* a more interesting read given the topic. – jdphenix May 15 '15 at 19:21
  • But what about the MemoryBarrier at I. What is it about the start of Parallel.ForEach that causes a memory barrier to already be there? – Michael Covelli May 15 '15 at 19:51
-1

Since more than one thread will be accessing the same variable `dr.B`, you will need to make sure your C# code is thread-safe.

Try using `lock` around each operation: https://msdn.microsoft.com/en-us/library/c5kehkcz.aspx

e.g.

private Object thisLock1 = new Object();
...
lock(thisLock1)
{
    dr.C = 2.0 * dr.B;
}

...
lock(thisLock1)
{
    dr.B = 2.0 * dr.A;
}

However, doing this will defeat the point of the parallel processing, since each thread has to wait until the previous one is done.

Make sure to read the potential pitfall with parallel processing: https://msdn.microsoft.com/en-us/library/dd997403%28v=vs.110%29.aspx
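When the goal is an aggregate across rows rather than a lock around every row, the thread-local overload of Parallel.ForEach keeps synchronization down to one operation per worker. A minimal sketch (the CAS loop is one way to do an atomic add on a double, since Interlocked.Add has no double overload):

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var values = Enumerable.Range(1, 100).Select(i => (double)i).ToList();
double total = 0;

Parallel.ForEach(values,
    () => 0.0,                            // localInit: per-task subtotal
    (v, state, subtotal) => subtotal + v, // body: touches no shared state
    subtotal =>                           // localFinally: one synchronized add per task
    {
        // Interlocked.Add has no double overload, so use a CAS loop.
        double seen, updated;
        do
        {
            seen = total;
            updated = seen + subtotal;
        } while (Interlocked.CompareExchange(ref total, updated, seen) != seen);
    });

Console.WriteLine(total); // 5050
```

Each worker accumulates its own subtotal with no contention; only the final merge per worker needs an atomic operation, so the parallelism is preserved.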

Carl Prothman
  • In the OP's specific example using `Parallel.ForEach()`, each `ForEach()` call already handles synchronization, specifically ensuring any parallel operations spawned by the call are completed before it returns. – jdphenix May 15 '15 at 04:03
  • @jdphenix - can you provide a reference please (for my education)? Note Microsoft MSDN shows: How to: Write a Parallel.ForEach Loop That Has Thread-Local Variables https://msdn.microsoft.com/en-us/library/dd460703%28v=vs.110%29.aspx that uses (finalResult) => Interlocked.Add(ref total, finalResult) – Carl Prothman May 15 '15 at 04:46
  • It is correct to state that an individual `ForEach()` does need to consider thread safety, and as such `ForEach()` provides overloads that allow specification of a thread local and finalizer as you've linked. As far as the `Wait()` calls internal to `ForEach()`, I had to look over the reference source to confirm that. – jdphenix May 15 '15 at 04:52
  • I could lock on each data row. I agree that 99% of the time that's the right thing to do to avoid reasoning about low lock code. But here, I do have a *huge* number of data rows that lend themselves well to processing in parallel. Adding the locking slows my real-world code considerably. And I really don't *need* to guarantee mutual exclusion here. Each worker thread will only operate on one data row at a time by design. What I really need to do is make sure that if thead A works on row i in the first loop and thread B in the second that B can see all of A's writes before proceeding. – Michael Covelli May 15 '15 at 12:48