
I tried to boil this down to a simple example for the sake of clarity. I have an atomic flag of sorts that indicates that one thing has just completed and another has not yet started. Both of those things involve storing data in a buffer. I'm trying to figure out specifically how Rust's Release ordering works in order to understand how to do this. Consider this "very oversimplified" example:

use std::sync::atomic::{AtomicU32, Ordering};

fn main() {
    let mut a = 0;
    let b = AtomicU32::new(0); // `store` takes `&self`, so no `mut` needed
    let mut c = 0;

    // stuff happens

    a = 10;                         // must not be moved after the store to b
    b.store(11, Ordering::Release);
    c = 11;                         // must not be moved before the store to b
}

In particular, to maintain a type invariant it is imperative that the atomic store to b happens after the store to a and before the store to c, but neither of those variables nor their store operations can be atomic in reality (yes, in the example they could be, but that is a simplification for visualization). I would like to avoid a mutex if I can (I don't want to detract from the question with the reasons why).

When I read up on Release ordering, the documentation strongly indicates that the assignment to a would have to occur before the store to b:

When coupled with a store, all previous operations become ordered before any load of this value with Acquire (or stronger) ordering. In particular, all previous writes become visible to all threads that perform an Acquire (or stronger) load of this value. Notice that using this ordering for an operation that combines loads and stores leads to a Relaxed load operation! This ordering is only applicable for operations that can perform a store. Corresponds to memory_order_release in C++20.
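
For context, here is a minimal sketch of the cross-thread pairing that quoted passage describes (the names are made up, and unlike my real code the payload here is itself atomic):

use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

static DATA: AtomicU32 = AtomicU32::new(0);
static READY: AtomicU32 = AtomicU32::new(0);

fn main() {
    let writer = thread::spawn(|| {
        DATA.store(42, Ordering::Relaxed);  // a "previous write"
        READY.store(1, Ordering::Release);  // publish it
    });
    let reader = thread::spawn(|| {
        if READY.load(Ordering::Acquire) == 1 {
            // The Acquire load synchronized with the Release store,
            // so the earlier write to DATA is guaranteed to be visible here.
            assert_eq!(DATA.load(Ordering::Relaxed), 42);
        }
    });
    writer.join().unwrap();
    reader.join().unwrap();
}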

However, the documentation makes no guarantee that the assignment to variable c could not be moved before the store to variable b. Almost everything I have read says that stores/loads before the atomic operation are guaranteed to happen before it, but makes no guarantee about moving operations in the other direction across the boundary.

Am I correct in worrying that the assignment to variable c could be moved before the store to b if Release ordering is used?

I looked at questions such as Which std::sync::atomic::Ordering to use? and other similar Stack Overflow questions, but as far as I can see they don't cover whether c can be moved before b when Release is used.

  • So... The store to `b` is a release with respect to `a`, sure, but it is an *acquire* with respect to `c`, right? Isn't `AcqRel` the logical thing to use? (I don't know what I'm talking about, but it sounds right) – trent Mar 15 '20 at 00:07
  • Yes, you're correct to worry. In practice in C++ std::atomic, `atomic_thread_fence(std::memory_order_release)` before the store to `c` would effectively give it release semantics. That might just be an implementation detail, and I don't know Rust, but it is how you can trick a C++ compiler into making correct code for a SeqLock using plain or `volatile` data (not `atomic<>`) - [Implementing 64 bit atomic counter with 32 bit atomics](https://stackoverflow.com/q/54611003) also uses the same trick for load-load ordering. – Peter Cordes Mar 15 '20 at 00:32
  • @trentcl I'm not sure if it is an Acquire or not, to be honest. I am not super experienced with this type of stuff. I do know that the Rust `store()` operation will not allow AcqRel as a possible ordering. – Jere Mar 15 '20 at 02:23
  • @PeterCordes I think that might work based on this discussion: https://preshing.com/20130922/acquire-and-release-fences/ . That discussion suggests that a release fence is a stronger contract and would not allow c to be moved before b. Rust's documentation directly says its orderings mimic the C++ orderings, so if that is the case, that might be the answer to my problem. I'm not super fluent in Rust yet, though, so I might also give some time for any Rust developers to comment just in case I am misreading something. – Jere Mar 15 '20 at 02:27
  • The question is whether non-atomic accesses always have to fully respect barriers at compile time. If so, then yes it's fine because, as Preshing explains, C++ atomic fences are bidirectional barriers (unlike release *operations* on a specific atomic object). It can be hard to keep straight what's an implementation detail of real compilers (e.g. it's definitely safe in current GCC and clang/LLVM), vs. what the pure language standard guarantees. (Basically nothing outside of synchronizes-with creating happens-before relationships.) There was even a GCC bug, so my memory is extra confused. – Peter Cordes Mar 15 '20 at 02:37
  • @PeterCordes that's a good point. Another reason to see if a seasoned Rust programmer can confirm whether that is the case. I'm not familiar enough with Rust to know myself. – Jere Mar 15 '20 at 02:56
  • The number of people in the world who *really, truly* understand memory orderings is probably (spitballing here) on the order of a few hundred to a thousand. The intersection of that set with the set of Rust users on Stack Overflow is much smaller (and possibly even nonexistent). You may want to try translating this question into C or C++ and asking it again; it would get more exposure to the kind of people who are likely to know the answer. You might also try users.rust-lang.org; some experts that post there do not come here (and vice versa). – trent Mar 15 '20 at 10:08
  • I think you're missing what's going on in the other thread (the one that will try to observe the values of `b` and `c`). How does it load `b`? Acquire-release semantics are only meaningful when paired. – trent Mar 15 '20 at 10:21
  • @trentcl I use a b.load(Ordering::Acquire) on the other end. – Jere Mar 15 '20 at 16:20
  • @PeterCordes your first response coupled with the note in your 2nd response is enough to answer my initial question. Do you want to formulate it into an answer to accept or would you prefer me to create the answer and reference you in it? Or do I need to update the question to facilitate a better answer? – Jere Mar 15 '20 at 16:23
  • It also matters how you access `a` and `c` in the other thread. It may be best to write a small, self-contained example program and ask "Is this program *guaranteed* to have such and such behavior?" rather than asking about the order in which stores "really" happen (in reality, there often is no globally consistent ordering of events... if that's what you need, I think `SeqCst` is the only way to guarantee it). – trent Mar 15 '20 at 16:41
  • @trentcl: re: global total order: Depends what you mean by "often". In practice on real systems, AFAIK only PowerPC creates IRIW reordering in practice, where threads can disagree about the order in which 2 independent stores were done. [Will two atomic writes to different locations in different threads always be seen in the same order by other threads?](//stackoverflow.com/a/50679223). A few other HW memory models allow it on paper, like ARM before ARMv8, but never did it in practice. But if you mean whether most software is written to actually guarantee it on *every* platform, maybe. – Peter Cordes Mar 15 '20 at 17:53
  • @Jere: as trentcl says, if you want to know about formal language guarantees instead of just happening to get the right asm for non-atomic operations, Rust is probably like C++, but IDK. How are other threads going to depend on this fact without doing anything unsafe? Or does Rust not have UB, making it stronger than C++ / closer to asm for real hardware? (Coherent shared memory and no HW race detection for example are things that ISO C++ on purpose avoids assuming.) – Peter Cordes Mar 15 '20 at 18:00
  • @trentcl I am not allowed to go into super detail here, but a and c are not referenced in any other threads. This is mostly single threaded (another thread monitors b for an unrelated issue), but I have to ensure the order of those events within that single thread in order to avoid erroneous memory access in the presence of a panic!() (something similar but not exactly like an exception). If either the compiler or the processor reorders the flag setting before a or after c, then undefined behavior can occur during the cleanup of a panic. – Jere Mar 15 '20 at 19:57
  • @PeterCordes Rust has two sides: safe and unsafe. While using the safe subset, you are guaranteed to avoid UB both in single threads and multiple. The operations surrounding the variable b are all using safe Rust, so there will be no UB there. However, I am using "unsafe" in a single-threaded context on both a and c as they are not shared across any threads. There is no unsafe code shared across variables (it is all self contained). My main concern is to make sure a panic (sort of like an exception) happening coupled with reordered instructions does not lead to UB (single threaded). – Jere Mar 15 '20 at 20:05
  • @PeterCordes ***continued*** Think of b as a system state variable that has two jobs: 1) allow the outside world to loosely poll the system state and 2) ensure that when the object is destructed that it does the proper amount of cleanup (how much and what type of cleanup is based on the value of b). If b gets reordered before a or after c the cleanup code will assume the wrong state of the overall object and potentially access uninitialized memory. I am trying to enforce the order of a, b, and c to avoid this. B just happens to be atomic because of its other job, – Jere Mar 15 '20 at 20:14
  • @PeterCordes ***continued*** I was hoping that the atomic operations (or potentially fences if not) would provide that, but I wasn't sure about if the "Release" ordering blocked one way or two ways and if only one way how to ensure the order of the 3 statements. I was less using them for thread safety and more hoping to leverage their side effects to ensure instruction sequence. – Jere Mar 15 '20 at 20:16
  • @PeterCordes but I think you answered my original question. here is my interpretation: The Release ordering does not prevent the CPU from reordering c before b, but will prevent a after b. An additional Release fence between b and c would be needed. Noting however, it may not prevent the compiler from reordering, just the CPU. Is that a correct interpretation of your comments? If so, let me know how you want an answer to be generated (would you prefer me to make the answer and self solve or a different method). – Jere Mar 15 '20 at 20:21
  • If a and c are not referenced in other threads, my understanding is that `Release` ordering has no effect; in other words, the ordering may as well be `Relaxed` (note this is not the same as non-atomic). The order of events observed by a *single* thread will always be self-consistent, so if that thread panics, it still can't observe the store to `c` before the store to `b`. Memory orderings allow you to specify when *different* threads acting on the same memory can agree on an ordering. – trent Mar 15 '20 at 21:19
  • I will take this opportunity to *once again* complain about Stack Overflow's magic "move this conversation to chat" link, which never appears when appropriate (such as this conversation) and only shows up for conversations that have already died down. – trent Mar 15 '20 at 21:21

1 Answer


In answer to my own question: yes, I should be worried that the assignment to c could be reordered before the store to b, because Release ordering only prevents a from being moved past b. By placing a fence with Release ordering between the assignments to b and c, I can additionally prevent c from being reordered before b (since the fence prevents b from being moved after c, which amounts to the same thing).
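
A minimal sketch of where that fence would go in the original example, assuming (as discussed in the comments) that the fence's ordering also applies to the surrounding non-atomic writes:

use std::sync::atomic::{fence, AtomicU32, Ordering};

fn main() {
    let mut a = 0;
    let b = AtomicU32::new(0);
    let mut c = 0;

    // stuff happens

    a = 10;                          // Release keeps this before the store to b
    b.store(11, Ordering::Release);
    fence(Ordering::Release);        // intended to keep the store to b before the write to c
    c = 11;

    println!("{} {} {}", a, b.load(Ordering::Relaxed), c);
}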

All of that applies to the CPU's store/load ordering. Whether or not the atomic store and the fence also prevent the compiler from moving those operations depends on the compiler, and its documentation should be consulted.
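
For reference (this goes beyond what was established in the comments), Rust's standard library does distinguish a full fence from a compiler-only fence, so the compiler-side concern can at least be expressed directly; a brief sketch:

use std::sync::atomic::{compiler_fence, fence, Ordering};

fn main() {
    // `fence` constrains both the compiler and the CPU; on weakly ordered
    // hardware it can compile to a barrier instruction.
    fence(Ordering::Release);

    // `compiler_fence` constrains only compiler reordering and emits no
    // barrier instruction; its documented use case is ordering with respect
    // to a signal handler on the same thread.
    compiler_fence(Ordering::Release);
}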

Jere