Sequential consistency does not tell you that the statements will execute in the order 1, 2, 3, 4 at all.
What it does tell you is that if CPU0 is executing 1, 2 and CPU1 is executing 3, 4, each CPU will execute its own block in that order. No side effect (memory store) of 2 will be observable before those of 1, and no side effect of 4 will be observable before those of 3. All CPUs also observe the same single interleaving of the two blocks.
If initially A = B = 0, then:
Thread 1          Thread 2
========          ========
1) A = 1          3) B = 1
2) Print(A,B)     4) Print(A,B)
All sequential consistency tells us is that the possible outputs for each thread are:
Thread 1: {1, 0}, {1, 1}
Thread 2: {0, 1}, {1, 1}
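These lists can be checked by brute force. Under sequential consistency every run behaves like one interleaving of the two threads' steps that keeps each thread's own order, so a sketch only has to try every such interleaving. The function name `enumerate_sc_outcomes` and the step encoding are my own, and each Print is modelled as one indivisible step.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// What one thread printed: the values of (A, B).
using Pair = std::pair<int, int>;
// One whole run: {thread 1's print, thread 2's print}.
using Outcome = std::pair<Pair, Pair>;

// Enumerate every sequentially consistent execution of:
//   Thread 1: step 0 = "A = 1", step 1 = "Print(A,B)"
//   Thread 2: step 0 = "B = 1", step 1 = "Print(A,B)"
std::set<Outcome> enumerate_sc_outcomes() {
    std::set<Outcome> outcomes;
    // A schedule lists which thread runs its next step. Starting from the
    // sorted sequence, next_permutation visits each distinct interleaving once.
    std::vector<int> schedule = {1, 1, 2, 2};
    do {
        int A = 0, B = 0;              // initial state A = B = 0
        int next_step[3] = {0, 0, 0};  // next step index for threads 1 and 2
        Pair printed[3];               // printed[1], printed[2] are used
        for (int t : schedule) {
            if (next_step[t] == 0) {
                if (t == 1) A = 1; else B = 1;  // the thread's store
            } else {
                printed[t] = {A, B};            // the thread's Print
            }
            ++next_step[t];
        }
        outcomes.insert({printed[1], printed[2]});
    } while (std::next_permutation(schedule.begin(), schedule.end()));
    return outcomes;
}
```

It finds three joint outcomes. Per thread, the printed values match the lists above. It also shows that sequential consistency rules out thread 1 printing {1, 0} while thread 2 prints {0, 1}: in any single interleaving, at least one thread sees the other's store.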
If we extend the example to four variables, with initial state A = B = C = D = 0:
Thread 1          Thread 2
========          ========
A = 1             D = 1
C = 1             B = 1
Print(A,B,C,D)    Print(A,B,C,D)
Thread 1 valid outputs (printed as {A, B, C, D}):
1: {1, 0, 1, 0} -- no effects from thread 2 seen
2: {1, 0, 1, 1} -- update of D visible; not B
3: {1, 1, 1, 0} -- update of B visible; not D
4: {1, 1, 1, 1} -- updates of B and D visible
Thread 2 valid outputs:
5: {0, 1, 0, 1} -- no effects from thread 1 seen
6: {1, 1, 0, 1} -- update of A visible; not C
7: {0, 1, 1, 1} -- update of C visible; not A
8: {1, 1, 1, 1} -- updates of A and C visible
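Under sequential consistency, outputs 1, 2, 4 (thread 1) and 5, 6, 8 (thread 2) are possible, treating each Print as reading all four variables at a single instant.
Under weaker models that let a CPU's stores become visible to other CPUs out of program order, all of 1, 2, 3, 4 and 5, 6, 7, 8 are possible.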
Note that in neither case does a thread fail to see its own updates in order; outputs 3 and 7 arise only because each thread observes the other thread's updates out of order.
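The same brute-force idea confirms which four-variable outputs sequential consistency allows. This sketch assumes, as the reasoning above does, that Print(A,B,C,D) reads all four variables in one indivisible step; `enumerate_sc_prints` and `PerThread` are names I made up for the illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <set>
#include <vector>

using Snapshot = std::array<int, 4>;  // printed values of {A, B, C, D}

// Every snapshot each thread can print under sequential consistency.
struct PerThread {
    std::set<Snapshot> thread1, thread2;
};

PerThread enumerate_sc_prints() {
    enum { A, B, C, D };  // indices into a Snapshot
    // Program order of the stores: thread 1 does A then C, thread 2 does D then B.
    const int stores[3][2] = {{0, 0}, {A, C}, {D, B}};  // row 0 unused
    PerThread result;
    // Three steps per thread: two stores, then one atomic Print.
    std::vector<int> schedule = {1, 1, 1, 2, 2, 2};
    do {
        Snapshot mem = {0, 0, 0, 0};   // initial state A = B = C = D = 0
        int next_step[3] = {0, 0, 0};
        for (int t : schedule) {
            int step = next_step[t]++;
            if (step < 2) {
                mem[stores[t][step]] = 1;      // one of the thread's stores
            } else if (t == 1) {
                result.thread1.insert(mem);    // thread 1's Print
            } else {
                result.thread2.insert(mem);    // thread 2's Print
            }
        }
    } while (std::next_permutation(schedule.begin(), schedule.end()));
    return result;
}
```

Each thread ends up with three possible snapshots. The missing ones are {1, 1, 1, 0} for thread 1 and {0, 1, 1, 1} for thread 2, the outputs in which the other thread's later store is visible without its earlier one. The atomic-Print assumption matters: if thread 2's loads of A and C were modelled as separate steps, it could read A = 0, then thread 1 could store A and C, and thread 2 would then read C = 1, all within a single sequentially consistent interleaving.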
If you require a specific ordering to be maintained, inserting a barrier instruction[1] is the preferred approach. When the CPU encounters a barrier, it constrains the prefetched loads (read barrier), the store queue (write barrier), or both (read/write barrier).
When there are two memory writes (A = 1; C = 1;), you can place write barriers as

    membar w; store A; store C

This ensures that all stores issued before the barrier will be seen before either the store to A or the store to C, but enforces no ordering between A and C. You can instead place them as

    store A; membar w; store C

which ensures that the store to A will be seen before the store to C, and

    store A; store C; membar w

ensures that the stores to A and C will be seen before any subsequent stores.
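In portable C++ the middle placement (store A; membar w; store C) can be approximated with `std::atomic_thread_fence`. This is a sketch under my own mapping: `membar w` is treated as a release fence, which also orders earlier loads and so is somewhat stronger than a pure store-store barrier, and C++ additionally requires a matching acquire fence on the reading side. The globals `A` and `C` and the functions `writer`, `reader_sees_a_whenever_c` and `run_fence_once` are illustrative names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Illustrative shared variables standing in for A and C in the text.
std::atomic<int> A{0}, C{0};

// Writer: "store A; membar w; store C", using a release fence as the barrier.
void writer() {
    A.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    C.store(1, std::memory_order_relaxed);
}

// Reader: if it sees the store to C, it must also see the store to A.
// The acquire fence is the reader-side half of the fence pairing.
bool reader_sees_a_whenever_c() {
    if (C.load(std::memory_order_relaxed) == 1) {
        std::atomic_thread_fence(std::memory_order_acquire);
        return A.load(std::memory_order_relaxed) == 1;
    }
    return true;  // store to C not visible yet: nothing to check
}

// Run writer and reader once from a fresh state; false means a violation.
bool run_fence_once() {
    A.store(0);
    C.store(0);
    bool ok = true;
    std::thread r([&ok] { ok = reader_sees_a_whenever_c(); });
    std::thread w(writer);
    w.join();
    r.join();
    return ok;
}
```

`run_fence_once()` should never return false. A loop of passing runs is only evidence, though; the guarantee comes from the fence pairing, not from testing.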
So which barrier or barrier combination is right for your case?
[1] More modern architectures incorporate barriers into the load and store instructions themselves (ARMv8's load-acquire and store-release instructions are one example), so you might have store.sc A; store C;. The advantage is that the barrier's scope is limited: the store unit only has to serialize these stores, rather than suffer the latency of draining the entire store queue.
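C++ has no `store.sc`; the closest portable analogue I know of is attaching the ordering to a single store with `std::memory_order_release`, paired with an acquire load on the reader. On AArch64, compilers typically emit STLR for such a release store. In this sketch `data` plays the role of A and `flag` the role of C; the names `producer`, `consumer` and `run_release_once` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared variables: "data" plays A, "flag" plays C.
std::atomic<int> data{0}, flag{0};

void producer() {
    data.store(1, std::memory_order_relaxed);
    // The ordering is attached to this one store instead of a separate fence:
    // a thread that reads flag == 1 with an acquire load also sees data == 1.
    flag.store(1, std::memory_order_release);
}

int consumer() {
    // Spin until the flag is visible; the acquire load pairs with the release store.
    while (flag.load(std::memory_order_acquire) == 0) {
    }
    return data.load(std::memory_order_relaxed);  // guaranteed to read 1
}

// Run producer and consumer once from a fresh state; returns what consumer read.
int run_release_once() {
    data.store(0);
    flag.store(0);
    int seen = -1;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

Unlike the fence version, only the store to `flag` carries ordering here, which matches the footnote's point about limiting the barrier's scope to the stores that need it.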