```cpp
#include <atomic>

std::atomic<int> foo;

void ThreadA()
{
    foo.store(1, std::memory_order_relaxed);

    while (true) {}
}

void ThreadB()
{
    while (foo.load(std::memory_order_relaxed) == 0)
    {
    }
}
```

For atomic variables accessed with relaxed operations, is it theoretically possible that thread B will never be able to read the latest value of the `foo` variable (assuming there is no interference from other threads refreshing the cache)?

Or is there any guarantee, whether from the hardware, the operating system, or the C++ standard, that thread B can read the latest value of `foo` within a finite time?

lostyzd
  • Hm... I personally think it is possible, because if the store is reordered after the load instruction and the jump that creates the loop, the problem can occur. – Afshin Jul 10 '21 at 05:13
  • @Afshin If the `foo` value stays in a CPU register and there is no cache flush or synchronization operation, then in this case `thread B` cannot observe the modification of the `foo` variable? – lostyzd Jul 10 '21 at 06:45
  • __Relaxed ordering:__ _"...Atomic operations tagged memory_order_relaxed are __not synchronization operations__; they do not impose an order among concurrent memory accesses. They only guarantee atomicity and modification order consistency...."_ https://en.cppreference.com/w/cpp/atomic/memory_order#Relaxed_ordering – Richard Critten Jul 10 '21 at 09:48
  • 2
    "An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time." – Marc Glisse Jul 14 '21 at 20:05

3 Answers


This has been effectively guaranteed since C++11; all quotes below are from the C++20 standard. First there is [intro.races]/4, which states

All modifications to a particular atomic object M occur in some particular total order, called the modification order of M.

and then later on, in paragraphs 15 to 19

  1. If an operation A that modifies an atomic object M happens before an operation B that modifies M, then A shall be earlier than B in the modification order of M. [Note 15: This requirement is known as write-write coherence. — end note]

  2. If a value computation A of an atomic object M happens before a value computation B of M, and A takes its value from a side effect X on M, then the value computed by B shall either be the value stored by X or the value stored by a side effect Y on M, where Y follows X in the modification order of M. [Note 16: This requirement is known as read-read coherence. — end note]

  3. If a value computation A of an atomic object M happens before an operation B that modifies M, then A shall take its value from a side effect X on M, where X precedes B in the modification order of M. [Note 17: This requirement is known as read-write coherence. — end note]

  4. If a side effect X on an atomic object M happens before a value computation B of M, then the evaluation B shall take its value from X or from a side effect Y that follows X in the modification order of M. [Note 18: This requirement is known as write-read coherence. — end note]

  5. [Note 19: The four preceding coherence requirements effectively disallow compiler reordering of atomic operations to a single object, even if both operations are relaxed loads. This effectively makes the cache coherence guarantee provided by most hardware available to C++ atomic operations. — end note]

With the note in paragraph 19 summarizing it best: The four preceding coherence requirements effectively disallow compiler reordering of atomic operations to a single object, even if both operations are relaxed loads.

NathanOliver
  • 2 questions: 1- How does the compiler decide which instruction happens before another instruction in another thread? I mean, if the compiler assumes the store is after the load, then it will not reorder it based on your post, but that is still the incorrect order and the `while` will never finish. 2- This post does not say anything about reordering in the CPU pipeline, which may also happen; it is only about reordering in the compiler. – Afshin Jul 10 '21 at 05:52
  • @Afshin The C++ standard defines an abstract machine. It's the compiler's job to make your code behave as if it ran on that machine. `ThreadA` has `foo.store(1, std::memory_order_relaxed);`, and the standard guarantees that at some point that operation is going to happen, and after it does, its side effects will be visible to all other threads. – NathanOliver Jul 10 '21 at 14:04
  • I have seen that for problems similar to the OP's, people normally use `release` memory order for storing and `acquire` for loading. Here is an example: https://en.cppreference.com/w/cpp/atomic/memory_order#Release-Acquire_ordering . So are these memory orders for limiting **when** side effects become visible to other threads? – Afshin Jul 10 '21 at 14:22
  • The question seems to be about a misconception that cache isn't coherent. C++ doesn't actually require that, though. It's not about reordering on the same access, it's about not being able to do infinitely many atomic loads that never see the value. ISO C++ says: *An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a **finite** period of time [intro.progress](https://eel.is/c++draft/intro.multithread#intro.progress-18.sentence-1)*. Also a "reasonable" time for atomic stores. – Peter Cordes Mar 30 '22 at 19:18
  • To put another way, these guarantees tell you that if memory location A has values A1, A2, A3 (in that order, the *modification order* for A), then a reader that sees A2 in one load can only load A2 or A3 in future loads, not A1. But these coherency rules don't rule out seeing A2 for an indefinite number of future loads. (The other rules about "finite" and "reasonable" time are necessary for that, as mentioned in my previous comment.) Real HW and OS scheduler fairness provides much stronger performance guarantees for average and worst cases. – Peter Cordes Apr 05 '22 at 03:42
  • See also [MESI Protocol & std::atomic - Does it ensure all writes are immediately visible to other threads?](https://stackoverflow.com/q/60292095) / [Is a memory barrier required to read a value that is atomically modified?](https://stackoverflow.com/q/71718224) – Peter Cordes Apr 05 '22 at 03:45

Presuming that ThreadA() and ThreadB() are actually executed concurrently in the first place, the C++ standard does indeed guarantee, in theory, that ThreadB() will eventually read the change to the atomic and terminate properly.

In reality, things can get a bit more messy. I can think of three scenarios where ThreadB() will never see the change to the atomic:

  1. Something outside of threads A and B terminates thread B prematurely, or stalls it indefinitely. Then obviously it can't get to the point where it sees the change.

  2. Something outside of threads A and B terminates thread A prematurely, or stalls it indefinitely. Then obviously it can't get its message out, and thread B will never see it.

  3. Something outside of threads A and B disrupts communication between whatever hardware entities are running the respective threads.

Scenario 3 is, fortunately, rather hypothetical for most software, but in e.g. a supercomputer environment, I can imagine it may be worth hardening software against such scenarios if they cannot be robustly handled at the hardware level.

Scenarios 1 and 2 may actually happen even on single-core machines. For instance, the user might pause the program using a debugger, suspend one thread, unpause the others, and then walk away from the keyboard to buy a pack of cigarettes, never to return. Or maybe some anti-virus software goes haywire and decides that thread A is a security threat that needs to be terminated, but for some obscure reason does not terminate thread B.

This may sound trivial, but in a more problematic variant of scenario 2, maybe something in your program causes the operating system to run low on resources that will not be freed until thread B completes. The lack of resources could then cause the operating system to stall thread A, because it thinks it has more important tasks to throw its scarce resources at, which in turn would prevent the operating system from ever recovering.

(A similar thing might conceivably happen with scenario 1, though it is a bit more boring.)

None of these scenarios are sensitive to the memory order though.

Christoph Lipka

Under normal conditions, the operating system will make sure that each thread gets its fair share of CPU time. This means that yes, barring exceptional circumstances (in which case you would probably have other worries), thread B will eventually get a chance to read the change to the atomic variable, regardless of memory order.

It is also guaranteed by the C++ standard that thread B will read the atomic variable over and over again (as opposed to reading it once and caching the result) until the change to the atomic value has been made by thread A and propagated to whatever entity is executing thread B. Again, regardless of memory order.

Christoph Lipka