
Does a memory barrier ensure data coherence across threads when there is no locking and no concurrent access to the same data, except from the parent thread?

Here is my scenario:

  • the Main thread launches several child threads
  • the child threads wait for the state to be ready
  • the Main thread starts working on a global set of data and notifies the child threads when the state is ready. Then the Main thread waits for the children to finish processing.
  • each child thread works on a different subset of the global set of data (either the object instances are entirely different, or the threads work on different fields/parts of the object's memory) ==> each memory address is read/written by at most ONE child thread
  • when a child thread finishes its core work, it notifies the Main thread. The child thread keeps running on specific data of its own.
  • when all child threads finish their core work, the Main thread resumes its work with the global set of data modified by the child threads

I would like to ensure that :

  • i. each child thread sees a fresh/clean state when doing its core work (that is to say, all the writes performed by the Main thread before starting the child thread are seen by the child thread)
  • ii. upon resuming, the Main thread sees a fresh/clean state (that is to say, all the writes performed by the child threads during their core work are seen by the Main thread)

Here is the code:

    public class State
    {
        public string Data1 { get; set; }
        public string Data2 { get; set; }

        public int[] SomeArray { get; set; }
    }

    static void Main(string[] args)
    {
        var finishedCounter = 0L;
        var ready = 0L;
        var state = new State();

        var thread1 = new Thread(o =>
        {
            while (Interlocked.Read(ref ready) != 1)
            {
                //waiting for initialization
            }
            var input = (State)o;
            //Assert.AreEqual("Initial Data1 From MainThread", input.Data1);
            Thread.MemoryBarrier();//TMB 1.1: ensure reads see the values written by the Main thread

            //Core work
            Assert.AreEqual("Initial Data1 From MainThread", input.Data1);
            input.Data1 = "Modified by Thread 1";
            input.SomeArray[1] = 11;

            Thread.MemoryBarrier();//TMB 1.2: ensure all writes above are visible to the Main thread

            Interlocked.Increment(ref finishedCounter);
            while (true)
            {
                //Non-core work (using data specific to this thread)
            }
        });
        var thread2 = new Thread(o =>
        {
            while (Interlocked.Read(ref ready) != 1)
            {
                //waiting for initialization
            }
            var input = (State)o;
            //Assert.AreEqual("Initial Data2 From MainThread", input.Data2);
            Thread.MemoryBarrier();//TMB 2.1: ensure reads see the values written by the Main thread

            //Core work
            Assert.AreEqual("Initial Data2 From MainThread", input.Data2);
            input.Data2 = "Modified by Thread 2";
            input.SomeArray[2] = 22;

            Thread.MemoryBarrier();//TMB 2.2: ensure all writes above are visible to the Main thread
            Interlocked.Increment(ref finishedCounter);
            while (true)
            {
                //Non-core work (using data specific to this thread)
            }
        });

        thread1.Start(state);
        thread2.Start(state);

        state.Data1 = "Initial Data1 From MainThread";
        state.Data2 = "Initial Data2 From MainThread";
        state.SomeArray = new[] { 0, -1, -2 };
        Thread.MemoryBarrier();//TMB 0.1: ensure all writes above are visible to the child threads

        Interlocked.Increment(ref ready);//the child threads will proceed

        while (Interlocked.Read(ref finishedCounter) != 2)//wait for the child threads to finish their core work
        {
        }

        //Assert.AreEqual("Modified by Thread 1", state.Data1);
        //Assert.AreEqual("Modified by Thread 2", state.Data2);

        Thread.MemoryBarrier();//TMB 0.2: ensure reads see the values written by the child threads

        Assert.AreEqual("Modified by Thread 1", state.Data1);
        Assert.AreEqual("Modified by Thread 2", state.Data2);
        Assert.AreEqual(0, state.SomeArray[0]);
        Assert.AreEqual(11, state.SomeArray[1]);
        Assert.AreEqual(22, state.SomeArray[2]);

        Console.WriteLine("Done");
        Console.ReadLine();
    }

The questions are:

  1. Are the asserts in the code above always true?
  2. Can the commented-out asserts in the code above be false?
  3. Do the six Thread.MemoryBarrier calls ensure points i. and ii. above?
  4. Are the comments on the TMBs relevant?
  5. Is it architecture dependent? (I am working on both x86/x64 CPUs.)
Toto
  • This doesn't answer your actual questions, but to a first approximation, if you have to ask, you shouldn't be using such techniques at all in favor of `lock` and pre-cooked collections like `ConcurrentDictionary`, because even the experts often have a hard time agreeing and actually figuring out the correct answers in the light of the complicated semantics, different CPU architectures (I notice you mention x86 but not ARM), and runtime differences. For multithreading it really pays off to first get code that's obviously correct, and further optimize only if a proven bottleneck occurs. – Jeroen Mostert Jun 17 '20 at 14:37
  • [Codereview](https://codereview.stackexchange.com/) would be a better place to ask such questions. An even better idea would be to seek help at [architecting](https://softwareengineering.stackexchange.com/) first (before you have written a single line of code). – Sinatr Jun 17 '20 at 14:39
  • Other things to look into: `Parallel.For[Each]`, Parallel LINQ (`collection.AsParallel()`), [TPL Dataflow](https://learn.microsoft.com/dotnet/standard/parallel-programming/dataflow-task-parallel-library) and [SIMD types](https://learn.microsoft.com/dotnet/standard/simd), depending on your scenario. All of these offer safer and occasionally even better performing ways of processing data in parallel, as opposed to trying to do very fine-grained lock-free access (where you can easily lose any gains you're trying to get; memory barriers aren't free either). – Jeroen Mostert Jun 17 '20 at 14:45
  • Hardware memory models are cache coherent; you don't need explicit flushing to see data across threads. You just need barriers for ordering. Doing `thread1.Start(state);` *after* setting `state` in the main thread should be sufficient ordering. Also exiting the thread + having the main thread `join` should again be sufficient ordering, like release/acquire sync. – Peter Cordes Jun 17 '20 at 15:06

2 Answers


(editor's note: this is answering the original version of the question, where thread exit / thread.Join provided synchronization, instead of Interlocked operations. I believe the answer is broadly the same for the new version)


From "Are arrays threadsafe?":

I believe that if each thread only works on a separate part of the array, all will be well. If you're going to share data (i. e. communicate it between threads) then you'll need some sort of memory barrier to avoid memory model issues.

...

Jon Skeet

From this I believe the answers would be:

  1. Yes
  2. Probably not. I believe that Thread.Join would have to issue at least a memory barrier, but I cannot find this documented.
  3. I do not think that memory barriers are necessary to ensure consistent behavior in this case, since no concurrent threads read or write the same data.
  4. No, not if the memory barriers are unnecessary.
  5. I'm fairly confident that x86 and x64 have the same behavior; if the memory model had been changed, it would have caused all kinds of issues when porting old code. ARM uses a weaker memory model, but as long as you adhere to the .NET memory model it should be fine.

In general I would advocate a slightly paranoid approach; code might change, and concurrency issues can be difficult to detect. Some rules of thumb:

  • Prefer immutability when possible
  • Use existing constructs (Parallel.For, the concurrent collections, Task.Run) when possible
  • Use Interlocked when reading or writing shared fields
  • Use lock if the other methods are not applicable, but keep the critical section as small as possible, preferably running a bounded amount of code
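The rules of thumb above can be sketched in a few lines. This is a hypothetical example (the names and the max-tracking task are mine, not from the question): an effectively immutable input, Interlocked for a shared counter, and a small lock-protected critical section.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class RulesOfThumbSketch
{
    static long _processed;                        // shared field: only touched via Interlocked
    static readonly object _gate = new object();   // guards _max
    static int _max = int.MinValue;

    static void Main()
    {
        // Effectively immutable input: nobody writes to it after creation.
        int[] input = Enumerable.Range(0, 1000).ToArray();

        // Existing construct instead of hand-rolled threads and spin-waits.
        Parallel.ForEach(input, value =>
        {
            Interlocked.Increment(ref _processed); // shared counter: no lock needed

            lock (_gate)                           // small, bounded critical section
            {
                if (value > _max) _max = value;
            }
        });

        Console.WriteLine($"{Interlocked.Read(ref _processed)} {_max}");
    }
}
```

Parallel.ForEach joins all work before returning, so the final reads of `_processed` and `_max` are safely ordered after the parallel writes without any explicit `Thread.MemoryBarrier()`.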
JonasH
  • I changed the code in order not to rely on the Start()/Join() methods from the thread because 1/ the jobs can actually be done by the ThreadPool or 2/ we can reuse a given thread for additional core work on a different part of the dataset – Toto Jun 17 '20 at 15:46
  • @Toto Have you changed your question to make the majority of JonasH's answer pointless? – Peilonrayz Jun 17 '20 at 16:34
  • @Peilonrayz: yes, they have, but it was a silly fairly trivial question before that. – Peter Cordes Jun 17 '20 at 16:50
  • @PeterCordes Wow, I've heard legends of these chameleon questions. But I never thought I'd witness such toxicity without even a crumb of appreciation. – Peilonrayz Jun 17 '20 at 17:32
  • @Peilonrayz: I didn't mean to defend the OP here; it's very poor behaviour on their part. The only saving grace is that the answers to both cases are basically the same (if I've read their code correctly): manual barriers weren't needed. I've upvoted this answer since everything it says is correct. – Peter Cordes Jun 17 '20 at 17:54
  • @Peilonrayz: as I explained in the comment, I changed the code so as not to rely on a "hidden" MemoryBarrier. It looks like the new code just relies on another "hidden" MemoryBarrier. My global question is: is a memory barrier needed (either manually called or implicit)? The fact that an implicit call "fixes" it does not really interest me. As for the code change, since I needed to change it, what is the correct behavior to adopt in that case: should I close this question and open another with a slight code change? Add a disclaimer more visible than my comment somewhere? Other? – Toto Jun 17 '20 at 18:03
  • @Toto: IMO, editing was the right approach, even though it partly invalidated an answer. If the original question had been more interesting and/or useful to keep around, then a new post would have been appropriate. But yes, the TL:DR is that many things you can do between threads create synchronization. And you don't always need a full barrier (draining the store buffer), usually just release / acquire, so you should definitely try to avoid stand-alone `Thread.MemoryBarrier()` whenever possible. – Peter Cordes Jun 18 '20 at 02:31

(This is answering the updated version of the question, with worker threads and nasty spin-waits instead of exit / thread.Join)


  2. Can the commented asserts on the code above be false?

No, and your Thread.MemoryBarrier() calls all appear to be redundant and are just slowing you down. (I didn't read your code super carefully, but I think you have threads waiting to see the result of an Interlocked operation from another thread before they read/write other data.)

Interlocked.Increment(ref ready); is a release/acquire operation (in fact a full memory barrier, like Thread.MemoryBarrier()). The new value of `ready` won't be visible until all earlier stores are also visible.

Interlocked.Read is an acquire operation: later loads can only have been loaded after the value you get from it. https://preshing.com/20120913/acquire-and-release-semantics/

This gives you release/acquire synchronization without needing any explicit / stand-alone memory barriers. (Does Interlocked.CompareExchange use a memory barrier? cites the ECMA-335 standard, so we know this is a portable guarantee, not an x86 implementation detail.)
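To make the pairing concrete, here is a minimal sketch of that release/acquire synchronization (the names `_payload` and `_ready` are illustrative, not from the question). The writer's Interlocked.Increment acts as a full barrier, so the payload write is published before the flag reads as 1; the reader's Interlocked.Read has acquire semantics, so its later read of the payload cannot be reordered before the flag check.

```csharp
using System;
using System.Threading;

class ReleaseAcquireSketch
{
    static long _ready;
    static string _payload;

    static void Main()
    {
        var reader = new Thread(() =>
        {
            // Acquire side: spin until the flag is published.
            while (Interlocked.Read(ref _ready) != 1) { /* spin */ }

            // No explicit Thread.MemoryBarrier() needed here:
            // the acquire above orders this load after the flag check.
            Console.WriteLine(_payload);
        });
        reader.Start();

        _payload = "hello";                 // 1. write the data
        Interlocked.Increment(ref _ready);  // 2. release: publishes the write above

        reader.Join();
    }
}
```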


All hardware memory models that C++ / C# threads run across are cache coherent; you don't need explicit flushing to see data across threads. You just need to make sure that the compiler isn't keeping the value in a register and not storing or loading at all.

In the hypothetical case where explicit flushing was required, Interlocked would be doing that for you to preserve the language's memory model.

Peter Cordes