relaxed
is fine if you just need to determine the winner of the race to set the flag1, so one thread can start on the work and later threads can just continue on.
If the run_once work produces data that other threads need to be able to read, you'll need a release
store after that, to let potential readers know that the work is finished, not just started. If it was instead just something like printing or writing to a file, and other threads don't care when that finishes, then yeah you have no ordering requirements between threads beyond the modification order of done
which exists even with relaxed. An atomic RMW like test_and_set
lets you determines which thread's modification was first.
BTW, you should check read-only before even trying to test-and-set; unless run()
is only called very infrequently, like once per thread startup. For something like a static int foo = non_constant;
local var, compilers use a guard variable that's loaded (with an acquire load) to see if init is already complete. If it's not, branch to code that uses an atomic RMW to modify the guard variable, with one thread winning, the rest effectively waiting on a mutex for that thread to init.
You might want something like that if you have data that all threads should read. Or just use a static int foo = something_to_run_once()
, or some type other than int
, if you actually have some data to init.
Or perhaps use C++11 std::call_once
to solve this problem for you.
On normal systems, atomic_flag
has no advantage over and atomic_bool
. done.exchange(true)
on a bool is equivalent to test_and_set
of a flag. But atomic_bool
is more flexible in terms of the operations it supports, like plain read that isn't part of an RMW test-and-set.
C++20 does add a test()
method for atomic_flag
. ISO C++ guarantees that atomic_flag
is lock-free, but in practice so is std::atomic<bool>
on all real-world systems.
Footnote 1: why relaxed
guarantees a single winner
The memory_order
parameter only governs ordering wrt. operations on other variables by the same thread.
Does calling test_and_set by a thread force somehow synchronization of the flag with values written by other threads?
It's not a pure write, it's an atomic read-modify-write, so the result of the one that went first is guaranteed to be visible to the one that happens to be second. That's the whole point of test-and-set as a primitive building block for mutual exclusion.
If two TAS operations could both load the original value (false), and then both store true, they would be atomic. They'd have overlapped with each other.
Two atomic RMWs on the same atomic object must happen in some order, the modification-order of that object. (Because they're not read-only: an RMW includes a modification. But also includes a read so you can see what the value was immediately before the new value; that read is tied to the modification order, unlike a plain read).
Every atomic object separately has a modification-order that all threads can agree on; this is guaranteed by ISO C++. (With orders less than seq_cst
, ordering between objects can be different from source order, and not guaranteed that all threads even agree which store happened first, the IRIW problem.)
Being an atomic RMW guarantees that exactly one test_and_set
will return false in thread A or B. Same for fetch_add with multiple threads incrementing a counter: the increments have to happen in some order (i.e. serialized with each other), and whatever that order is becomes the modification-order of that atomic object.
Atomic RMWs have to work this way to not lose counts. i.e. to actually be atomic.