2
std::atomic<bool> b;

void f()
{
    // block A
    if(b.load(std::memory_order_relaxed))
    {
        // block B
    }
    // block C
}

void g()
{
    // block B
    b.store(true, std::memory_order_release);
}

Theoretically block B should only be executed if the atomic load returns true,
but is it possible that part of block B can get reordered before the load? store with release memory order guarantees that all operations on block B are a visible side effect, but does that still apply if the load is a relaxed operation?

curiousguy
  • 8,038
  • 2
  • 40
  • 58
kiroma
  • 101
  • 7
  • 3
    AFAIK only the `load` itself is guaranteed to be atomic. The rest can still be reordered etc. – Hatted Rooster Jul 01 '19 at 12:44
  • 6
    I don't see how the title of the post corresponds to its body. – Evg Jul 01 '19 at 12:45
  • Possible duplicate of [Memory model ordering and visibility?](https://stackoverflow.com/questions/7461484/memory-model-ordering-and-visibility) – Mgetz Jul 01 '19 at 14:11
  • 1
    @Evg Sorry, I'm not very good at shortly describing technical problems. Can you propose a better title? – kiroma Jul 01 '19 at 15:52
  • 1
    Well, you have no `try_lock` and no `mutex`. I guess you're asking whether you can _simulate_ those in a lockless manner using `atomic`. In which case the title corresponds fine, if slightly unclearly. – Lightness Races in Orbit Jul 01 '19 at 16:09
  • 2
    I have removed the confusing mention of mutexes in the title. If you bring it back, please add a section on the applicability of your Q to mutexes. – curiousguy Jul 01 '19 at 19:10
  • Maybe you could have had only one "block B" section of code, instead of two. – curiousguy Aug 11 '19 at 09:32

3 Answers3

3

Doing a relaxed load before trying to lock is recommended by Intel in Benefitting Power and Performance Sleep Loops:

ATTEMPT_AGAIN:
    if (!acquire_lock())
    {
        /* Spin on pause max_spin_count times before backing off to sleep */
        for(int j = 0; j < max_spin_count; ++j)
        {
            /* pause intrinsic */
            _mm_pause();
            if (read_volatile_lock()) // <--- relaxed load
            {
                if (acquire_lock())
                {
                    goto PROTECTED_CODE;
                }
            }
        }
        /* Pause loop didn't work, sleep now */
        Sleep(0);
        goto ATTEMPT_AGAIN;
    }
PROTECTED_CODE:
    get_work();
    release_lock();
    do_work();

acquire_lock uses acquire sematics so that the relaxed load doesn't get reordered past acquire_lock.

Note, however, it first tries to lock unconditionally before doing busy-wait loop with the relaxed load.

Maxim Egorushkin
  • 131,725
  • 17
  • 180
  • 271
  • 1
    Ah, so you can perform a relaxed operation to see if the atomic was written to and if so load again with acquire order to prevent reordering before? That's very helpful, thank you. – kiroma Jul 01 '19 at 15:54
3

You have two block Bs in your example. I'm talking about the one in the void f() load function.

is it possible that part of block B can get reordered before the load?

Yes. The compiler could hoist loads out of the if() body and do them before the b.load. This is likely to happen if both block B and C read the same non-atomic variable.

And there are real-life mechanisms that will create this reordering even without compile-time reordering:

Specifically, branch speculation (i.e. branch prediction + out-of-order speculative execution) will let the CPU start executing block B before the b.load() even starts.

You can't depend on "causality" or any kind of reasoning like "it would have to know the b.load() result before it can know what to execute next".

Or a compiler could potentially do if-conversion of the if() into branchless code, if there aren't any stores in block B. Then it could pretty obviously reorder with non-atomic loads, or other relaxed or acquire loads, that were in block B and C.

(Remember acq/rel are one-way barriers.)


Reasoning like this (based on what real compilers and CPUs can do) can be useful to prove that something isn't safe. But be careful of going the other way: reasoning based on "safe on the compiler I know about" doesn't always mean "safe in portable ISO C++".

Sometimes "safe on the compiler I know about" is more or less sufficient, but it's hard to separate that from "happens to work on the compile I know about", where a future compiler version or a seemingly unrelated source change could break something.

So always try to reason about memory ordering in terms of the C++ memory model, as well as in terms of how it can compile efficiently for an ISA you care about (e.g. strongly-ordered x86). Like you might see that relaxed would allow a compile-time reordering that's actually useful in your case.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
1

The principle thing you should be concerned about is access to the resource you're locking with this "mutex". Without acquire/release semantics, your thread may not see changes to that resource made by the other thread. That is, your reading from that data and the other thread's writing to it constitutes a data race without acquire/release semantics.

You should only use relaxed memory orders if all you want to do is access the atomic value itself, without any question of what else is going on in the world relative to that atomic's value.

Nicol Bolas
  • 449,505
  • 63
  • 781
  • 982
  • Relaxed semantics are fundamentally unsound as they can show a value from an unspecified "future". (Well the whole MT semantics seem unsound too, but that part is more patently unsound.) – curiousguy Jul 01 '19 at 19:12
  • 2
    @curiousguy: would you stop fearmongering about `mo_relaxed`? Yes, various compile-time optimizations can reorder a relaxed load after other things in the source. So can some theoretical runtime effects, like value-prediction which no actual CPUs do, and are unlikely to any time soon. `mo_relaxed` isn't fundamentally unsound, it's just not appropriate if you care about ordering wrt. anything else (unless reordering is bounded by nearby fences or acq / rel operations). A release-store does force a relaxed load to happen before the store. – Peter Cordes Jul 01 '19 at 23:36
  • @Nicol: In C++ terminology, "data race" has a technical meaning of a race condition on non-atomic variables. My understanding is that a garden-variety non-UB race on relaxed atomics shouldn't be called a "data race", just a "race condition". Or if the program only works for one winner but not the other, it could be a "race bug". It's not UB in C++11 for two unsynchronized threads to both use write-only stores on an atomic variable, and the value you find will depend on which happened to go last. (Relaxed, release, or seq-cst doesn't matter for that.) – Peter Cordes Jul 01 '19 at 23:40
  • 1
    @PeterCordes: But that's not what I'm talking about. Whatever "block B" does, if this is indeed intended to be some form of "mutex" (which was the OP's original phrasing before Curiousguy changed it), then that means there's some other thread which has accessed some memory, and the atomic fetch is intended to prevent simultaneous access, to communicate that the writing thread is done and the reading thread can read the value. If you only use a `relaxed` memory order, then the reads in "block B" *will* cause a data race with the other thread. That's what the mutex is supposed to prevent. – Nicol Bolas Jul 01 '19 at 23:42
  • Oops, yeah I was thinking of the atomic itself, or of other atomic data in block B (which would make sense if the producer doesn't have any mechanism for the consumer to signal that it's ready for another update, so you can't guarantee that it isn't already doing more updates after you see one flag). – Peter Cordes Jul 01 '19 at 23:51
  • @PeterCordes No, the whole C++ std makes no sense, as I explained in a now deleted Q (deleted because it raised a serious issue). You can't have "time travel" and step by step semantic and UB in one PL. It makes no sense. It isn't just relaxed that is broken, it's the whole idea of "reordering". – curiousguy Jul 02 '19 at 00:35
  • 1
    @PeterCordes: Please don't reply to curiousguy. It's best to just leave him be; he will post comments on pretty much anything in the language-lawyer tag about C++ just to promote his various bugbears about the C++ standard, and he will defend them ad nausium with no actual citations from the standard. I've learned that it's best to ignore him. Or at least, if you *have* to reply to him, please take it somewhere that I won't be pinged for it. – Nicol Bolas Jul 02 '19 at 00:47
  • 1
    @curiousguy: The way C++ defines its memory model (in terms of synchronizes-with and release sequences) is very different from how hardware ISAs define theirs (in terms of allowed reorderings for stores from the same core, or from different observers on different cores (IRIW), and stuff like that.) So it can get pretty mind-bending to keep straight in the abstract, instead of just in terms of how a specific C++ implementation maps different orderings of loads, stores, and RMW to instructions / barriers on a given ISA. – Peter Cordes Jul 02 '19 at 00:48
  • 1
    @curiousguy: single-step behaviour of C++ statements in the C++ abstract machine is *not* an observable side-effect that the as-if rule needs to preserve. That's why compiler optimization and runtime reordering is allowed to destroy it even in a program with no UB. Or to put another way, why compilers don't need to strengthen everything to seq-cst. (\@Nicol: ok, sorry for the noise. Stopping here.) If you don't understand how to use weaker memory orderings, don't use them. seq-cst is recommended by many C++ people unless you *need* the performance you can get from weakening it. – Peter Cordes Jul 02 '19 at 00:52
  • @PeterCordes no, what I mean is that nothing is safe; you can't destroy an atomic variable, or perhaps even construct it (neither have a memory order parameter). No MT program is defined because of the mixture of UB and non CS semantics. Ppl dance around the issue like the dance around every one of my Q on this site, when they don't remove them (happened twice!). – curiousguy Jul 02 '19 at 01:07
  • 3
    @curiousguy Even though I don't agree with your analysis regarding relaxed atomics, I don't think the question you keep referring to should have been removed. People have the right to ask questions, even when that question is challenging the standard and therefore considered controversial by many. I don't know why it was removed (close votes or other), but perhaps you can ask a moderator to get it back online. – LWimsey Jul 02 '19 at 02:06
  • 2
    @LWimsey: I can't state why it was deleted, but it was closed because it was incredibly broad, essentially boiling down to "explain and justify the C++ memory model." In the spec, the threading memory model consists of 5 pages, and that's without looking at the various stuff in the standard library which adds to the page count. And that doesn't include the various bizarre and inaccurate claims about the spec. Overall, the question was not worded like a search for truth but more like an attempt to spread FUD; it was very much a "when did you stop beating your wife?" kind of question. – Nicol Bolas Jul 02 '19 at 02:15
  • @NicolBolas Now the wording was bad? Why can't anyone explain how to prove any MT (even single threaded!) program to be correct: how to reason about a program. It was a simple Q. Not even the whole std would need to be covered, just anything not specified in term of SC. Your "FUD" accusation is just, well, FUD. – curiousguy Jul 02 '19 at 02:36
  • 2
    @NicolBolas I have to admit, I don't recall the actual details of that question, but it seemed to me like someone deleted it in an emotional response (which I think is a bad reason). But I agree that there are certainly cases that justify questions to be closed and/or deleted. – LWimsey Jul 02 '19 at 02:39
  • 1
    @curiousguy: like I commented earlier, reordering is bounded by synchronizes-with relationships. For a new thread, its point of creation is (I think) a point where it synchronizes with its creator. Atomic constructors don't need to be atomic *themselves* because any way of passing the address to another thread establishes a happens-before relationship where the construction is complete before anyone else can look at it. Remember that reordering is only from the POV of *other* threads; a single thread always sees its own operations in program order. It's trivial to write a simple MT with no UB – Peter Cordes Jul 02 '19 at 21:26
  • (Sorry for the extra noise, Nicol. I'm really done here after this broad statement that the C++ rules really do fit together coherently to allow reasoning about relaxed atomics and MT programs, and/or using acq/rel atomics to allow safe synchronization between different threads that read/write non-atomic variables without UB. This isn't the place to explain *why* in any detail but I couldn't let such a broad conspiracy-theory type claim stand unchallenged.) – Peter Cordes Jul 02 '19 at 21:28