
I'm trying to understand how Ordering::SeqCst works. I have a few code examples where this ordering is mandatory for obtaining a consistent result. In the first example we just want to increment a counter variable:

use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread::spawn;

let a: &'static _ = Box::leak(Box::new(AtomicBool::new(false)));
let b: &'static _ = Box::leak(Box::new(AtomicBool::new(false)));
let counter: &'static _ = Box::leak(Box::new(AtomicUsize::new(0)));

let _thread_a = spawn(move || a.store(true, Ordering::Release));
let _thread_b = spawn(move || b.store(true, Ordering::Release));

let thread_1 = spawn(move || {
    while !a.load(Ordering::Acquire) {} // prevents reordering of everything after it

    if b.load(Ordering::Relaxed) { // no need for Acquire due to the previous restriction
        counter.fetch_add(1, Ordering::Relaxed);
    }
});
let thread_2 = spawn(move || {
    while !b.load(Ordering::Acquire) {} // prevents reordering of everything after it

    if a.load(Ordering::Relaxed) { // no need for Acquire due to the previous restriction
        counter.fetch_add(1, Ordering::Relaxed);
    }
});

thread_1.join().unwrap();
thread_2.join().unwrap();
println!("{}", counter.load(Ordering::Relaxed));

Possible values of counter in this example are 1 or 2, depending on thread scheduling. But surprisingly 0 is also possible, and I don't understand how.

If thread_1 has started and only a was set to true by _thread_a, counter will be left untouched after thread_1 exits.

If thread_2 starts after thread_1, counter will be incremented once, because thread_1 has finished (so we know that a is already true), and thread_2 just has to wait for b to become true.

Or if thread_2 runs first and b was already set to true, counter will be incremented only once too.

There is also the possibility that _thread_a and _thread_b both run before thread_1 and thread_2, and both of them will increment counter. So that's why 1 and 2 are valid possible outcomes for counter. But as I said before, 0 is also a possible result, unless I replace all loads and stores for a and b with Ordering::SeqCst:

let _thread_a = spawn(move || a.store(true, Ordering::SeqCst));
let _thread_b = spawn(move || b.store(true, Ordering::SeqCst));

let thread_1 = spawn(move || {
    while !a.load(Ordering::SeqCst) {}

    if b.load(Ordering::SeqCst) {
        counter.fetch_add(1, Ordering::Relaxed);
    }
});
let thread_2 = spawn(move || {
    while !b.load(Ordering::SeqCst) {}

    if a.load(Ordering::SeqCst) {
        counter.fetch_add(1, Ordering::Relaxed);
    }
});

thread_1.join().unwrap();
thread_2.join().unwrap();
println!("{}", counter.load(Ordering::SeqCst));

Now 0 isn't possible, but I don't know why.

The second example was taken from here:

use std::sync::atomic::{AtomicBool, Ordering::SeqCst};
use std::thread;

static A: AtomicBool = AtomicBool::new(false);
static B: AtomicBool = AtomicBool::new(false);

static mut S: String = String::new();

fn main() {
    let a = thread::spawn(|| {
        A.store(true, SeqCst);
        if !B.load(SeqCst) {
            unsafe { S.push('!') };
        }
    });

    let b = thread::spawn(|| {
        B.store(true, SeqCst);
        if !A.load(SeqCst) {
            unsafe { S.push('!') };
        }
    });

    a.join().unwrap();
    b.join().unwrap();
}

Threads a and b could start at the same time and modify A and B, so that neither of them modifies S. Or one of them could start before the other and modify S, leaving the other thread with S unmodified. If I understood correctly, there is no possibility of S being modified in parallel by both threads? The only reason Ordering::SeqCst is useful here is to prevent reordering. But what if I replace all the orderings like this:

let a = thread::spawn(|| {
    A.store(true, Release); // nothing can be placed before
    if !B.load(Acquire) { // nothing can be reordered after
        unsafe { S.push('!') };
    }
});
    
let b = thread::spawn(|| {
    B.store(true, Release); // nothing can be placed before
    if !A.load(Acquire) { // nothing can be reordered after
        unsafe { S.push('!') };
    }
});

Isn't this the same as the original?

Also, the Rust docs refer to the C++ docs on ordering, where Ordering::SeqCst is described as:

Atomic operations tagged memory_order_seq_cst not only order memory the same way as release/acquire ordering (everything that happened-before a store in one thread becomes a visible side effect in the thread that did a load), but also establish a single total modification order of all atomic operations that are so tagged.

What is the single total modification order, on a concrete example?

Chayim Friedman
zexed640

2 Answers


Although Chayim's answer is correct, I think it's still very difficult to wrap one's head around the concept. A useful mental model to remember is that reordering of operations may happen whenever it is not forbidden and the operations access different parts of memory, from the perspective of the current thread. In the rustonomicon two types of reordering are described:

Compiler Reordering - if performance is improved while the single-threaded result stays the same, why not reorder.

Hardware Reordering - this topic is more subtle, and the rustonomicon's reasoning about caches causing reordering is, strictly speaking, incorrect (thanks a lot to @peter-cordes for pointing that out). CPU caches are always coherent (the MESI protocol guarantees invalidation of other copies before a write is committed to a cache line). Out-of-order execution is one reason reordering may happen in a CPU core, but it is not the only one. Here are two more examples of what may cause reordering in the CPU, and here is a detailed explanation of one of the examples (with still a lot of discussion afterwards).

And here is excellent documentation (Howells, McKenney, Deacon, Zijlstra) touching on both hardware and compiler reordering, which claims to still be incomplete.

Either hardware or compiler reordering can explain the 0 outcome in the first example:

let a: &'static _ = Box::leak(Box::new(AtomicBool::new(false)));
let b: &'static _ = Box::leak(Box::new(AtomicBool::new(false)));
let counter: &'static _ = Box::leak(Box::new(AtomicUsize::new(0)));

1) let _thread_a = spawn(move || a.store(true, Ordering::Release));
2) let _thread_b = spawn(move || b.store(true, Ordering::Release));

let thread_1 = spawn(move || {
    3) while !a.load(Ordering::Acquire) {} // prevents reordering of everything after it

    4) if b.load(Ordering::Relaxed) { // no need for Acquire due to the previous restriction
        counter.fetch_add(1, Ordering::Relaxed);
    }
});
let thread_2 = spawn(move || {
    5) while !b.load(Ordering::Acquire) {} // prevents reordering of everything after it

    6) if a.load(Ordering::Relaxed) { // no need for Acquire due to the previous restriction
        counter.fetch_add(1, Ordering::Relaxed);
    }
});

When thread_2 is running on its core, it has a, b and operations 5) and 6) at hand. 6) is Relaxed, and 5) and 6) access different memory and don't affect each other, so there is nothing that prevents 6) from happening before 5).

As for the second example:

let a = thread::spawn(|| {
    1) A.store(true, Release); // nothing can be placed before
    2) if !B.load(Acquire) { // nothing can be reordered after
        unsafe { S.push('!') };
    }
});
    
let b = thread::spawn(|| {
    3) B.store(true, Release); // nothing can be placed before
    4) if !A.load(Acquire) { // nothing can be reordered after
        unsafe { S.push('!') };
    }
});

There is this similar question discussing it.

If we look at this paper (it is for C++, but as you rightly noticed, Rust uses the same model), it describes the memory guarantees for Release and Acquire as:

  • atomic operations on the same object may never be reordered [CSC12, 1.10.19, p. 14],

  • (non-)atomic write operations that are sequenced before a release operation A may not be reordered after A,

  • (non-)atomic load operations that are sequenced after an acquire operation A may not be reordered before A.

So in principle it is not forbidden to reorder 1) <-> 2) or 3) <-> 4).
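To make the SeqCst guarantee observable, here is a runnable sketch of the second example with the `static mut String` swapped for an `AtomicUsize` counter (my substitution, purely to keep the demo safe). Under SeqCst, the later of the two stores in the single total order precedes at least one of the loads, so at most one thread can see the other's flag still false:

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering::SeqCst};
use std::thread;

static A: AtomicBool = AtomicBool::new(false);
static B: AtomicBool = AtomicBool::new(false);
static PUSHES: AtomicUsize = AtomicUsize::new(0); // stands in for S.push('!')

fn main() {
    let a = thread::spawn(|| {
        A.store(true, SeqCst);   // 1) cannot be reordered after 2) under SeqCst
        if !B.load(SeqCst) {     // 2)
            PUSHES.fetch_add(1, SeqCst);
        }
    });
    let b = thread::spawn(|| {
        B.store(true, SeqCst);   // 3) cannot be reordered after 4) under SeqCst
        if !A.load(SeqCst) {     // 4)
            PUSHES.fetch_add(1, SeqCst);
        }
    });
    a.join().unwrap();
    b.join().unwrap();

    // At most one load can return false, so at most one "push" happens.
    assert!(PUSHES.load(SeqCst) <= 1);
}
```

With Release/Acquire instead of SeqCst, both loads are in principle allowed to return false, which is exactly the reordering of 1) <-> 2) and 3) <-> 4) described above.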

Nikolay Zakirov
  • Unfortunately https://doc.rust-lang.org/nomicon/atomics.html repeats a common misconception that the cache is responsible for memory reordering. Cache is coherent, using MESI to invalidate other copies *before* committing a write to a cache line. In real CPUs, memory reordering is a local effect of how the pipeline accesses L1d cache: stores only commit from the store buffer after retirement and after getting exclusive ownership of the cache line, while loads happen as early as possible so the data will be ready. https://preshing.com/20120710/memory-barriers-are-like-source-control-operations/ – Peter Cordes Feb 07 '23 at 16:10
  • Also see https://software.rajivprab.com/2018/04/29/myths-programmers-believe-about-cpu-caches/ and [How does memory reordering help processors and compilers?](https://stackoverflow.com/q/37725497). Re: microarchitectural details, see [an explanation of what a store buffer does](https://stackoverflow.com/questions/64141366/can-a-speculatively-executed-cpu-branch-contain-opcodes-that-access-ram). Fun fact: x86's memory model is program order with a store buffer + store forwarding. So only StoreLoad reordering is allowed, plus the effects of store-forwarding. – Peter Cordes Feb 07 '23 at 16:11
  • Thanks Peter for pointing out the mistake and sharing these great sources of information. – Nikolay Zakirov Feb 08 '23 at 02:37
  • @NikolayZakirov **atomic operations on the same object may never be reordered** - if I understood correctly, a `store` operation followed by a `load` operation on the same object cannot be reordered, regardless of which ordering is applied on both sides? For example, in your second example, in the first thread, we could replace the load of `B` with a load of `A` (even if it's not logically correct) and be sure that these 2 operations won't be reordered, even if we applied `Relaxed` ordering on one or both sides or left `Release`/`Acquire`? – zexed640 Mar 17 '23 at 09:50

You seem to misunderstand the basics of reordering.

We do not care which thread started first: it is possible for one thread to observe a result implying that thread A started first, while another thread observes a result implying that thread B started first.

In your first snippet, for example, it is possible for thread_1 to see a == true && b == false, yet thread_2 can at the same time see a == false && b == true. As long as there is no happens-before relationship that forces otherwise.

The only thing we care about (putting aside the global modification order) is the happens-before relationship. If point B has established a happens-before relationship with point A, then everything before point A (inclusive) will be observable after point B (inclusive). But in snippet 1 we only establish a happens-before relationship with _thread_a for thread_1 and with _thread_b for thread_2, leaving our relationship with the other thread, the one that sets the other variable, undefined; therefore the threads can observe opposite values of the variables. In snippet 2, we may not establish a happens-before relationship at all: we may establish one between the other thread's store and our load, but our own store is not included in it, and therefore the other thread may not see it.

With SeqCst, however, all operations take place in the global modification order. This is a virtual list of all SeqCst operations, where each element happens-after all elements before it and happens-before all elements after it. This can be used to establish a happens-before relationship between atomic operations on different variables, like in your examples.

Chayim Friedman
  • If I understood correctly, in the second snippet there is no happens-before because the loads and stores of both `A` and `B` are placed in different threads? And to achieve happens-before, the load and store on the same atomic must be placed in the very same thread? – zexed640 Mar 12 '23 at 07:58
  • @zexed640 I'm not sure I understand what you mean, but if I understand you correctly, you are claiming that the only way to create a happens-before relationship is to be on the same thread. If it is what you meant, then no. This is one of the ways to create a happens-before relationship, but there are others, e.g. synchronizing with an atomic load/store. – Chayim Friedman Mar 12 '23 at 09:33
  • I forgot to clarify that the store and load must be at least `Release`/`Acquire` or stronger to achieve happens-before, besides being in the same thread. If that is so, then that's why there is no guarantee of seeing a consistent result of the write to `A` or `B` in the second sample. But if we replace the current orderings in the second sample with `SeqCst`, can we be sure that the write to `A` in the first thread will be seen in the second thread? – zexed640 Mar 12 '23 at 11:50
  • @zexed640 Sorry, I don't understand what you're asking. – Chayim Friedman Mar 12 '23 at 11:52
  • In your reply you said - `we may establish a happens-before relationship between the other thread's store and our load, but our store is not included inside and therefore the other thread may not see it`. Is the reason why the other thread may not see the result of the write that the store and load on the same atomic are placed in different threads? – zexed640 Mar 12 '23 at 12:11
  • @zexed640 No. Suppose we established a happens before relationship between the store of B in thread B and the load of B in thread A. That means that the load, and everything after it, will happen after the store and everything before it. But since the store to A in thread A is before its load of B, it is not inside this happens-before and therefore can take place after the load of A from the point of view of thread B and therefore after the load of A in thread B. Same goes in the opposite direction, for the load of A in thread B - it is not in the happens-before, and can be before B's store. – Chayim Friedman Mar 12 '23 at 12:28