
Possible Duplicate:
Concurrency: Atomic and volatile in C++11 memory model

With the C++11 <atomic> specification, is there any guarantee of freshness? The descriptions of different memory orders only deal with reorderings (as far as I've seen).

Specifically, in this situation:

#include <atomic>

std::atomic<int> cancel_work(0);

// Thread 1 executes this function
void thread1_func() {
    // ...
    while (cancel_work.load(<some memory order>) == 0) {
        // ...do work...
    }
}

// Thread 2 executes this function
void thread2_func() {
    // ...
    cancel_work.store(1, <some memory order>);
    // ...
}

If thread 1 and thread 2 do not share any other data except cancel_work, it seems to me that no ordering guarantees are needed and std::memory_order_relaxed suffices for both the store and the load. But does this guarantee that thread 1 will ever see the update of cancel_work, instead of just repeatedly reading its local cache line without ever refreshing it from main memory? If not, what is the minimum needed to make that guarantee?

JanKanis
  • @UmNyobe The reason I didn't find that question before asking mine is that it appears to be about volatiles. The answer is a duplicate, but imo the question is not, as this one will be found by people looking for something different from the "atomic and volatile" question you mention. – JanKanis Feb 04 '13 at 14:00

3 Answers


There is nothing that will guarantee that: everything is about ordering. Even memory_order_seq_cst just guarantees that things happen in a single total order. In theory, the compiler/library/CPU could schedule every load from cancel_work at the end of the program.

There is a general statement in 29.3p13 that

Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.

But there is no specification on what constitutes a "reasonable amount of time".

So: memory_order_relaxed should be just fine, but memory_order_seq_cst may work better on some platforms, as the cache line may be reloaded sooner.

Anthony Williams
  • On real CPUs, the store buffer already flushes itself to coherent cache as quickly as it can. Other than maybe sometimes stopping compile-time reordering until after something slow, `seq_cst` doesn't make your stores visible more quickly. It just makes the current thread stall until it's visible. But yes, in a hypothetical CPU that works nothing like the usual model that C++ was designed to be efficient on, this is possible. – Peter Cordes Nov 07 '19 at 11:00
  • There's also (in current ISO C++) another non-normative "should" in [intro.progress](https://eel.is/c++draft/intro.multithread#intro.progress-18.sentence-1): *An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a **finite** period of time.* Beyond that everything is down to quality-of-implementation, which involves how HW works. – Peter Cordes Mar 30 '22 at 19:22
  • See [Does hardware memory barrier make visibility of atomic operations faster in addition to providing necessary guarantees?](https://stackoverflow.com/q/61591287) - `seq_cst` or memory barriers on real HW doesn't make data visible to other cores sooner in any significant way. – Peter Cordes Mar 30 '22 at 19:23

It appears this answer also answers my question. Well, hopefully my question will help googlers better find it.

Thread 1 "SHOULD" see the updated cancel_work within a "reasonable amount of time"; however, what exactly counts as reasonable is (apparently) not specified.

JanKanis

Calling a function [that isn't inlined by the compiler] will automatically reload any registers that hold variables that aren't immediately local. So as long as the processor running thread1_func() has had its cache contents flushed or updated based on the store, it will work.

memory_order_relaxed should ensure that the data is (at some point in the future) flushed from any other processor's caches [this is automatic on x86, but not on all types of processors; certain ARM processors, for example, require 'code-driven flushing'], but it is not guaranteed to happen BEFORE any other writes [to regular or atomic variables].

And note that memory order ONLY affects the current thread/processor. What another thread or processor does during the time of a store or load is entirely up to that thread/processor. What I mean by this is that thread1_func() in your case may still read the value 0 for some small amount of time after the value 1 has been written by the other processor/thread. All the atomic operations guarantee is that it gets EITHER the OLD value or the NEW value, never something in between [and memory_order_relaxed additionally doesn't enforce any ordering of loads/stores relative to other operations within the thread]. However, whatever memory order you use, atomics should guarantee [assuming a correct implementation] that the value is eventually updated; it is just harder to tell when in the relaxed case.

Mats Petersson