2

Cppreference gives the following example about memory_order_relaxed:

Atomic operations tagged memory_order_relaxed are not synchronization operations, they do not order memory. They only guarantee atomicity and modification order consistency.

It then explains that, with x and y initially zero, this example code

// Thread 1:
r1 = y.load(memory_order_relaxed); // A
x.store(r1, memory_order_relaxed); // B

// Thread 2:
r2 = x.load(memory_order_relaxed); // C 
y.store(42, memory_order_relaxed); // D

is allowed to produce r1 == r2 == 42 because:

  1. Although A is sequenced-before B within thread 1 and C is sequenced-before D in thread 2,
  2. Nothing prevents D from appearing before A in the modification order of y, and B from appearing before C in the modification order of x.

Now my question is: if A and B can't be reordered within thread 1 and, similarly, C and D within thread 2 (since each of those is sequenced-before within its thread), aren't points 1 and 2 in contradiction? In other words, with no reordering (as point 1 seems to require), how is the scenario in point 2, visualized below, even possible?

T1 ........... T2

.............. D(y)

A(y)

B(x)

.............. C(x)

Because in this case C would not be sequenced-before D within thread 2, as point 1 demands.

Leo Heinsaar
  • I would recommend to look here: http://stackoverflow.com/questions/6319146/c11-introduced-a-standardized-memory-model-what-does-it-mean-and-how-is-it-g, especially the answer about the analogy with the special theory of relativity. My understanding of this is that with relaxed atomic, there is nothing as a global time, therefore you cannot simply visualize the actions in some global-time chart. Particularly, each thread can have, at a particular time, a different view of memory. – Daniel Langr Feb 26 '16 at 11:52

4 Answers


with no reordering (as point 1 seems to require)

Point 1 does not mean "no reordering". It means sequencing of events within a thread of execution. The compiler will issue the CPU instruction for A before B and the CPU instruction for C before D (although even that may be subverted by the as-if rule), but the CPU has no obligation to execute them in that order, caches/write buffers/invalidation queues have no obligation to propagate them in that order, and memory has no obligation to be uniform.

(individual architectures may offer those guarantees though)

Cubbi
  • So, _sequenced-before_ applies to just the order of corresponding instructions issued _by the compiler_, which the CPU _can_ reorder in a relaxed memory model? – Leo Heinsaar Feb 26 '16 at 16:15
  • The CPU can reorder everything, but fence instructions (and data dependencies as between A and B) limit what it can do. – Cubbi Feb 26 '16 at 16:47
  • The cpu can't reorder *everything*, there are rules per architecture (and sometimes different mem/operation types within the same architecture) saying what can be reordered. – Leeor Mar 15 '16 at 22:49
  • @Leeor that's what I meant by individual architectures offering those (ordering) guarantees – Cubbi Mar 15 '16 at 22:55
  • What is a std::something_fence() way to stop CPU reordering in same thread? – huseyin tugrul buyukisik Jan 13 '18 at 17:33

Your interpretation of the text is wrong. Let's break this down:

Atomic operations tagged memory_order_relaxed are not synchronization operations, they do not order memory

This means that these operations make no guarantees regarding the order of events. As explained prior to that statement in the original text, multi-threaded processors are allowed to reorder operations within a single thread. This can affect the write, the read, or both. Additionally, the compiler is allowed to do the same thing at compile time (mostly for optimization purposes). To see how this relates to the example, suppose we don't use atomic types at all, but instead use a primitive type that is atomic by design (an 8-bit value, say). Let's rewrite the example:

// Somewhere...
uint8_t y, x;

// Thread 1:
uint8_t r1 = y; // A
x = r1; // B

// Thread 2:
uint8_t r2 = x; // C 
y = 42; // D

Considering that both the compiler and the CPU are allowed to reorder operations in each thread, it's easy to see how r1 == r2 == 42 is possible.


The next part of the statement is:

They only guarantee atomicity and modification order consistency.

This means the only guarantee is that each operation is atomic, that is, it is impossible for an operation to be observed "midway through". What this means is that if x is an atomic<someComplexType>, it's impossible for one thread to observe x as having a value in between states.


It should already be clear where that can be useful, but let's examine a specific example (for demonstration purposes only; this is not how you'd want to code):

class SomeComplexType {
  public:
    int size;
    int *values;
};

// Shared between the threads: atomic<SomeComplexType> x;

// Thread 1:
SomeComplexType r = x.load(memory_order_relaxed);
if(r.size > 3)
  r.values[2] = 123;

// Thread 2:
SomeComplexType a, b;
a.size = 10; a.values = new int[10];
b.size = 0; b.values = NULL;
x.store(a, memory_order_relaxed);
x.store(b, memory_order_relaxed);

What the atomic type does for us is guarantee that r in thread 1 is not an object in between states; specifically, that its size and values members are in sync.

Amit
  • Does not the clause _"A is sequenced-before B within thread 1 and C is sequenced-before D in thread 2"_ mean that **operations within threads are not reordered**? – Daniel Langr Feb 26 '16 at 13:35
  • @DanielLangr within threads - yes, but the CPU and caches both reorder memory accesses further – Cubbi Feb 26 '16 at 14:25
  • @Cubbi _If A is sequenced before B, then evaluation of A will be complete before evaluation of B begins._ So, what does this rule from cppreference.com exactly mean? Does it allow CPU reordering, i.e., at runtime, may B be evaluated before A? – Daniel Langr Feb 26 '16 at 14:31
  • 2
    @DanielLangr yes, that's the point of relaxed memory model. I'll post it as answer since that seems to be the crucial point here. – Cubbi Feb 26 '16 at 14:34
  • @Cubbi Thanks, I've found an explanation here: http://stackoverflow.com/questions/25287484/does-sequenced-before-relation-in-c11-prevent-compiler-cpu-reordering – Daniel Langr Feb 26 '16 at 14:35

According to the STR analogy from this post: C++11 introduced a standardized memory model. What does it mean? And how is it going to affect C++ programming?, I've created a visualization of what can happen here (as I understand it) as follows:

[Figure: space-time diagram of the two threads' views of memory, in the style of the special-relativity analogy]

Thread 1 first sees y=42, then performs r1=y, and after that x=r1. Thread 2 first sees x=r1 being already 42, then performs r2=x, and after that y=42.

Lines represent "views" of memory by individual threads. These lines/views cannot cross for a particular thread. But, with relaxed atomics, lines/views of one thread can cross those of other threads.

EDIT:

I guess this is the same as with the following program:

atomic<int> x{0}, y{0};

// thread 1:
x.store(1, memory_order_relaxed);
cout << x.load(memory_order_relaxed) << y.load(memory_order_relaxed);

// thread 2:
y.store(1, memory_order_relaxed);
cout << x.load(memory_order_relaxed) << y.load(memory_order_relaxed);

which can produce 01 and 10 on the output (such an output could not happen with SC atomic operations).

Daniel Langr

Looking exclusively at the C++ memory model (not talking about compiler or hardware reordering), the only execution that leads to r1=r2=42 is:

[Figure: execution graph with sb (sequenced-before) and rf (read-from) edges; the loop is highlighted in green]

Here I replaced r1 with a and r2 with b. As usual, sb stands for sequenced-before and is simply the intra-thread ordering (the order in which the instructions appear in the source code). The rf edges are read-from edges and mean that the read/load on one end reads the value written/stored on the other end.

The loop, involving both sb and rf edges, as highlighted in green, is necessary for the outcome: y is written in one thread, which is read in the other thread into a and from there written to x, which is read in the former thread again into b (which is sequenced-before the write to y).

There are two reasons why a graph constructed like this could be impossible: a causality violation, or an rf edge that reads a hidden side effect. The latter is impossible here because we write only once to each variable, so clearly one write cannot be hidden (overwritten) by another write.

To answer the causality question, we apply the following rule: a loop is disallowed (impossible) when it involves a single memory location and every sb edge in the loop points in the same direction (the direction of the rf edges is irrelevant in that case); or when the loop involves more than one variable, all edges (sb AND rf) point in the same direction, and AT MOST one of the variables has one or more rf edges between different threads that are not release/acquire.

In this case the loop exists, two variables are involved (one rf edge for x and one rf edge for y), all edges are in the same direction, but TWO variables have a relaxed/relaxed rf edge (namely x and y). Therefore there is no causality violation and this is an execution that is consistent with the C++ memory model.

Carlo Wood