Understanding the Weak memory model

Question

Assume we have two threads, working with two variables A and B in memory:

Thread 1       Thread 2
========       ========
1) A = 1       3) B = 1
2) Print(B)    4) Print(A)

I know in a Sequential consistent (strong) model you would get 1 -> 2 -> 3-> 4 executed in order. x86 is TSO which is close to a Strong model (but not as strong as one).

I don't understand what the Week model is? Does a weak model just pick random instructions and execute? i.e things like 4 -> 2 -> 3 -> 1 would be possible?

I have 2 more questions regarding this topic:

What is the difference between Out-of-order execution done by a CPU to make use of instruction cycles that would otherwise be wasted, and memory reordering due to memory model are they the same thing? or memory reordering just deals with Load/Store instructions?
Is memory model a concern only when dealing with multiple threads? Why is it not an issue in single threaded programs?

Re, "...reordering _due to_ memory model..." I have never heard of any memory model that _requires_ instructions to be reordered. But many memory models _permit_ it. That's kind of the whole point of memory models: To be permissive enough so that it can be efficiently implemented on different hardware platforms, and to be described/specified in enough detail so that programmers can anticipate and deal with the consequences. — Solomon Slow, Aug 24 '20 at 21:19
Re, "I know in a Sequential consistent (strong) model you would get 1 -> 2 -> 3-> 4 executed in order." That sounds as if you are saying that strong sequential consistency requires thread 1 to run to completion before thread 2 can start. That doesn't sound right at all. I'm a little out of my area of expertise here, but in the absence of any synchronization, I don't see how it makes sense guarantee any temporal relationship between operations performed by different threads. The entire point of threads, is that they can be allowed to run independently of each other most of the time. — Solomon Slow, Aug 24 '20 at 21:22

mevets · Answer 1 · 2020-08-24T22:15:02.907

Sequential consistency does not tell you that it will execute 1,2,3,4 at all.

Sequential consistency tells you that if CPU0 is executing 1,2 and CPU1 is executing 3,4; that the CPUs will execute the blocks in that order, and no side effect (memory store) of 2 will be perceivable before those of 1; and no side effect of 4 will be perceivable before 3.

If earlier A=B=0, then:

Thread 1       Thread 2
========       ========
1) A = 1       3) B = 1
2) Print(A,B)  4) Print(A,B)

All sequential concurrency tells us is that the possible outputs are:

Thread 1 { 1, 0 }, { 1, 1}
Thread 2 { 0, 1 }, { 1, 1}.

If we extend it to an initial state of A=B=C=D=0

Thread 1       Thread 2
========       ========
A = 1          D = 1
C = 1          B = 1
Print(A,B,C,D) Print(A,B,C,D)

Thread1 valid outputs:

1: {1, 0, 1, 0}       -- no effects from thread2 seen
2: {1, 0, 1, 1}       -- update of D visible; not B
3: {1, 1, 1, 0}       -- update of B visible; not D
4: {1, 1, 1, 1}       -- update of B and D visible.

Thread2 valid outputs:

5: {0, 1, 0, 1}       -- no effects from thread1 seen
6: {0, 1, 1, 1}       -- update of C visible; not A
7: {1, 1, 0, 1}       -- update of A visible; not C
8: {1, 1, 1, 1}       -- update of A and C visible.

In sequential consistency, 1,2,4 : 5,6,8 are possible. In weaker consistencies, 1,2,3,4 : 5,6,7,8 are possible. Note that in neither case would the thread fail to see its own updates in order; but the outputs 3,7 result from the threads seeing the other threads updates out of order.

If you require a specific ordering to be maintained, inserting a barrier instruction[1] is the preferred approach. When the cpu encounters a barrier, it affects the either pre-fetched (read barrier), store queue (write barrier) or both (rw barrier).

When there are two memory writes: A = 1; C = 1; you can install write barriers as membar w; store A; store C. This ensures that all stores before the store to A will be seen before either the store to A or C; but enforces no ordering between A and C.

You can install them as store A; membar w; store C which ensure that the store of A will be seen before C; and store A; store C; membar w ensures that A and C will be seen before any subsequent stores.

So which barrier or barrier combination is right for your case?

[1] more modern architectures incorporate barriers into the load and store instructions themselves; so you might have a store.sc A; store C;. The advantage here is to limit the scope of the store barrier so that the store unit only has to serialize these stores, rather than suffer the latency of the entire queue.

Thanks for the explanation. So how is a Memory fence used here to make sure `3` doesn't happen in a weakly modelled machine? — Dan, Aug 24 '20 at 21:47
Also, is "Out of Order execution" the same as "memory reordering"? is memory reordering a subset of OO execution as it's only done on Load/Store instructions? — Dan, Aug 24 '20 at 22:16
@Dan: See http://preshing.com/20120710/memory-barriers-are-like-source-control-operations/ for more about barriers. No, memory reordering is separate from out-of-order *execution*. Related: [Does an x86 CPU reorder instructions?](https://stackoverflow.com/q/50307693) explains how you can get memory reordering without OoO exec. A store buffer will create StoreLoad reordering even on a simple in-order pipeline even without L1d cache doing hit-under-miss or whatever. Also [How is load->store reordering possible with in-order commit?](https://stackoverflow.com/q/52215031). — Peter Cordes, Aug 24 '20 at 22:22
@Dan: Also related: [what is a store buffer?](https://stackoverflow.com/q/11105827) / [Size of store buffers on Intel hardware? What exactly is a store buffer?](https://stackoverflow.com/q/54876208) — Peter Cordes, Aug 24 '20 at 22:27
In therms of Visibility, is it possible that a value is updated in order, but it's not visible to another thread yet (i.e cache is not updated yet), even in SC scenario would this be possible? — Dan, Aug 25 '20 at 15:36
Yes; SC is met if the order of updates is uniform; it implies nothing about the latency of updates. — mevets, Aug 25 '20 at 15:39

Understanding the Weak memory model

1 Answers1