#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    std::vector<int> foo;
    std::atomic<int> bar{0};
    std::mutex mx;
    auto job = [&] {
        int asdf = bar.load();
        // std::lock_guard lg(mx);
        foo.emplace_back(1);
        bar.store(foo.size());
    };
    std::thread t1(job);
    std::thread t2(job);
    t1.join();
    t2.join();
}

This obviously is not guaranteed to work, but it does work with a mutex (uncomment the std::lock_guard line). But how can that be explained in terms of the formal definitions of the standard?

Consider this excerpt from cppreference:

If an atomic store in thread A is tagged memory_order_release and an atomic load in thread B from the same variable is tagged memory_order_acquire [as is the case with default atomics], all memory writes (non-atomic and relaxed atomic) that happened-before the atomic store from the point of view of thread A, become visible side-effects in thread B. That is, once the atomic load is completed, thread B is guaranteed to see everything thread A wrote to memory.

Atomic loads and stores (with the default or with the specific acquire and release memory order specified) have the mentioned acquire-release semantics. (So does a mutex's lock and unlock.)
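For reference, the defaulted operations in the example above are sequentially consistent, and sequential consistency subsumes these semantics; written out explicitly, the two calls are equivalent to:

int asdf = bar.load(std::memory_order_seq_cst);   // a seq_cst load is also an acquire operation
bar.store(foo.size(), std::memory_order_seq_cst); // a seq_cst store is also a release operation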

An interpretation of that wording could be that when Thread 2's load operation synchronizes with the store operation of Thread 1, it is guaranteed to observe all (even non-atomic) writes that happened-before the store, such as the vector modification, making this well-defined. But pretty much everyone would agree that this can lead to a segmentation fault, and surely would if the job function ran its three lines in a loop.

What standard wording explains the obvious difference in capability between the two tools, given that the wording above seems to imply that atomics synchronize in much the same way?

I know when to use mutexes and atomics, and I know that the example doesn't work because no synchronization actually happens. My question is how the definition is to be interpreted so that it doesn't contradict the way it works in reality.

JMC
  • How do you expect `foo.emplace_back(1);` would work in multiple threads without synchronization? – Slava Dec 29 '20 at 23:13
  • Running it instantly results in a segmentation fault, which is in line with the mental model of C++ that most programmers, including me, have. I admit I have trouble defining what my mental model actually is, even though I have used both atomics and mutexes for a long time with success. I simply never considered any formal definitions of the memory orders before. – JMC Dec 29 '20 at 23:14
  • @Slava I don't expect it to work. My problem is that after reading the formal definition of what atomics entail, i.e. acquire-release semantics, it seems to me like it should work according to the letter of the law. – JMC Dec 29 '20 at 23:16
  • `bar` itself is free of data races but that doesn't prevent a data race on `foo`. They are unrelated objects. – Blastfurnace Dec 29 '20 at 23:16
  • @JMC: You may find this interesting/educational : https://www.youtube.com/watch?v=ZQFzMfHIxng – engf-010 Dec 30 '20 at 00:05

3 Answers


The quoted passage means that when B loads the value that A stored, then by observing that the store happened, B can also be assured that everything A did before the store has also happened and is visible.

But this doesn't tell you anything if the store has not in fact happened yet!

The actual C++ standard says this more explicitly. (Always remember that cppreference, while a valuable resource which often quotes from or paraphrases the standard, is not the standard itself and is not authoritative.) From N4861, the final C++20 draft, we have in atomics.order p2:

An atomic operation A that performs a release operation on an atomic object M synchronizes with an atomic operation B that performs an acquire operation on M and takes its value from any side effect in the release sequence headed by A.

I would agree that if the load in your thread B returned 1, it could safely conclude that the other thread had finished its store and therefore had exited the critical section, and therefore B could safely use foo. In this case the load in B has synchronized with the store in A, since the value of the load (namely 1) came from the store (which is part of its own release sequence).

But it is entirely possible that both loads return 0, if both threads do their loads before either one does its store. The value 0 didn't come from either store, so the loads don't synchronize with the stores in that case. Your code doesn't even look at the value that was loaded, so both threads may enter the critical section together in that case.

The following code would be a safe, though inefficient, way to use an atomic to protect a critical section. It ensures that A will execute the critical section first, and B will wait until A has finished before proceeding. (Obviously if both threads wait for the other then you have a deadlock.)

#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    std::vector<int> foo;
    std::atomic<int> bar{0};
    std::mutex mx;
    auto jobA = [&] {
        foo.emplace_back(1);
        bar.store(foo.size());
    };
    auto jobB = [&] {
        while (bar.load() == 0) /* spin */ ;
        foo.emplace_back(1);
    };

    std::thread t1(jobA);
    std::thread t2(jobB);
    t1.join();
    t2.join();
}
Nate Eldredge
  • So to remove the contradiction, the interpretation should be that IF (and only if?) the acquire receives the release's value, THEN it must also observe all writes before the release. In other words, the guarantee described by the quote is conditional on the fact that the actual hardware and the scheduler have somehow decided that this acquire and this release shall synchronize, but it makes no guarantee that one HAS to synchronize with the other. While for a mutex, the synchronization itself is guaranteed simply by the fact that a lock() must wait for the unlock? – JMC Dec 30 '20 at 01:02
  • I don't think it's that complex. It's simply conditional on the store actually having taken place, from the point of view of thread B. It says that the actions of A become observable by B in a certain partial order, but it doesn't say that you can time travel into the future to observe things that A hasn't done yet. That would be absurd. If they haven't happened yet, and B wants to wait until they have, then it has to actually *wait*. – Nate Eldredge Dec 30 '20 at 01:17
  • @JMC: There certainly is no "only if" there. If in your original code, thread B loads the value 0, it gains no knowledge about what A has or hasn't done. It could be that A has not started the critical section; it could be that A is in it; it could be that A has finished the critical section but has not yet done the store; it could be that A thinks it did the store and has moved on to other work, but the store has not yet become visible to B. – Nate Eldredge Dec 30 '20 at 01:21
  • @JMC: But to your comment, yes, the IF is correct. I don't see it as a contradiction as it stands, merely as a (to me, fairly obvious) implicit assumption. The store/load merely conveys some information from one thread to another about what the storing thread has done, and which of its actions can be assumed to be visible. It's up to the programmer how to use that information. If you want to use it for synchronization, for B to wait until A is done with something, you can do that, but you have to write the code to wait. – Nate Eldredge Dec 30 '20 at 01:27
  • Thank you for the example with the spinlock. However, I think there might actually be an "only if" there because the release-operation imposes a memory barrier in such a way that operations before the release cannot be reordered after the release, if I am not mistaken. Therefore, if thread B does not observe thread A's release, can it not be assured that it can also definitely not observe what A did before that release? For example, consider this video someone else linked in the comments: https://youtu.be/ZQFzMfHIxng?t=2753 – JMC Dec 30 '20 at 01:28
  • @JMC: So it's entirely possible to implement a mutex in terms of atomics, but there will necessarily be some sort of loop involved. Loads and stores by themselves don't synchronize in that sense. – Nate Eldredge Dec 30 '20 at 01:28
  • @JMC: There is such a barrier, but your conclusion is not valid. Even if we suppose that everything happens in strict sequence and becomes globally visible immediately, imagine that A is scheduled out in between its `emplace_back()` and its `store()`. Now B's `load()` returns 0, but it might very well observe some changes to the vector. Nothing got reordered in any way. – Nate Eldredge Dec 30 '20 at 01:33
  • @JMC: Nothing in the quoted passage or the video contradicts this. Everything is just under the implicit assumption that you actually *check* the value that was loaded, to see if it matches what was to be stored, and you can only draw any conclusions if you see that it does match. I suppose you're right that people don't always say that explicitly, but I think it's to be understood, because the alternative is absurd. – Nate Eldredge Dec 30 '20 at 01:38
  • That makes sense. Would my conclusion be valid if we somehow knew for sure that A was scheduled out only after the store and B, at a later point in time, still did not observe the new value (for caching reasons, for example)? Could we then be sure that B also cannot observe what A did before the store (that it will not retrieve the fresh vector-data from the memory, but the atomic from the cache, for example?) – JMC Dec 30 '20 at 01:38
  • No, there could always be an arbitrarily long delay between the visibility of whatever A did before the store, and the visibility of the store itself, regardless of what A goes on to do after that. If you want to be sure that a thread *hasn't* done something, you need to have it acquire a load before doing it. Just as in my example, where A, while in the critical section, can be assured that B's critical section isn't visible, because A hasn't yet done the store that B is waiting to acquire. – Nate Eldredge Dec 30 '20 at 01:43

Setting aside the elephant in the room, namely that none of the C++ containers are thread-safe without employing locking of some sort (so forget about using emplace_back without implementing locking), and focusing on the question of why atomic objects alone are not sufficient:

You need more than atomic objects. You also need sequencing.

All that an atomic object gives you is that when an object changes state, any other thread will either see its old value or its new value, and it will never see any "partially old/partially new", or "intermediate" value.
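For example (a minimal sketch, not part of the original answer), with a 64-bit atomic a concurrent reader can only ever observe the old value or the new value, never a torn mixture of the two halves:

#include <atomic>
#include <cstdint>
#include <thread>

int main() {
    std::atomic<std::uint64_t> x{0};
    std::thread writer([&] { x.store(0x1111222233334444); });
    std::thread reader([&] {
        // v is either 0 or 0x1111222233334444; never half of each,
        // even on hardware where a plain 64-bit store could tear.
        std::uint64_t v = x.load();
        (void)v;
    });
    writer.join();
    reader.join();
}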

But it makes no guarantee whatsoever as to when other execution threads will "see" the atomic object's new value. At some point they (hopefully) will see the atomic object instantly flip to its new value. When? Eventually. That's all you get from atomics.

One execution thread may very well set an atomic object to a new value, but other execution threads will still have the old value cached, in some form or fashion, and will continue to see the atomic object's old value, and won't "see" the atomic object's new value until some indeterminate time passes (if ever).

Sequencing rules specify when objects' new values become visible in other execution threads. The simplest way to get both atomicity and easy-to-reason-about sequencing, in one fell swoop, is to use mutexes and condition variables, which handle all the hard details for you. You can still use atomics and, with careful logic, use acquire/release fences to implement proper sequencing. But it's very easy to get wrong, and the worst of it is that you won't know it's wrong until your code starts going off the rails due to improper sequencing, and it will be nearly impossible to accurately reproduce the faulty behavior for debugging purposes.

But for nearly all common, routine, garden-variety tasks, mutexes and condition variables are the simplest route to proper inter-thread sequencing.
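As a minimal sketch of that approach (an illustration, not code from the original answer), here the condition variable supplies the sequencing: thread b does not touch the vector until thread a has signalled, under the mutex, that it is done:

#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    std::vector<int> foo;
    std::mutex mx;
    std::condition_variable cv;
    bool done = false;

    std::thread a([&] {
        std::lock_guard<std::mutex> lg(mx); // atomicity: exclusive access to foo
        foo.emplace_back(1);
        done = true;
        cv.notify_one();                    // sequencing: wake the waiting thread
    });
    std::thread b([&] {
        std::unique_lock<std::mutex> lk(mx);
        cv.wait(lk, [&] { return done; });  // blocks until thread a has finished
        foo.emplace_back(2);                // safe: thread a's writes are visible
    });
    a.join();
    b.join();
}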

Sam Varshavchik
  • I have used mutexes, atomics, cvars etc. successfully for years and my intuition, same as anyone's I expect, agrees with your answer completely. The question is: How can this be justified with the standard and the formal definitions of the memory model? – JMC Dec 29 '20 at 23:26
  • Could it be said then that the guarantees for acquire-release semantics are only valid IF the release and acquire ops sync, but that it isn't guaranteed that they sync? In other words, thread 2 sees the results of the critical section IF it reads the resulting atomic value. In that case, where does it say that mutexes don't suffer from the same problem? (I understand that they don't, but where does it say that?) Couldn't thread 2 simply "never" see the results of thread 1's mutex-protected critical section? – JMC Dec 29 '20 at 23:40
  • No, mutex unlock guarantees that if thread 2 locks the same mutex (after thread 1 unlocked it) it will see all changes that were made by thread 1 while the mutex was locked. – Sam Varshavchik Dec 29 '20 at 23:49
  • And according to the excerpt in my question, an atomic guarantees that if thread 2 loads from the same atomic (after thread 1 stored to it) it will see all changes that were made by thread 1 before the store. "That is, once the atomic load is completed, thread B is guaranteed to see everything thread A wrote to memory." But it obviously doesn't in practice. How can that be explained? Is cppreference just completely wrong? – JMC Dec 29 '20 at 23:51
  • " Is cppreference just completely wrong?" No, your interpretation and expectation is completely wrong. – Slava Dec 30 '20 at 00:00
  • My expectation is that of everyone else. I expect a segmentation fault in my example code and that's exactly what I get. I know that atomics don't provide synchronization. – JMC Dec 30 '20 at 00:04

The idea is that when Thread 2's load operation syncs with the store operation of Thread 1, it is guaranteed to observe all (even non-atomic) writes that happened-before the store, such as the vector modification

Yes, all the writes done by foo.emplace_back(1); are guaranteed to be visible when bar.store(foo.size()); is executed. But who guarantees you that the foo.emplace_back(1); in thread 1 will see a complete, consistent state left by the foo.emplace_back(1); executed in thread 2, and vice versa? Both read and modify the internal state of std::vector, and there is no memory barrier before the code reaches the atomic store. And even if all the variables were read and modified atomically, the state of std::vector consists of multiple variables: at least a size, a capacity, and a pointer to the data. Changes to all of them must be synchronized as well, and a memory barrier is not enough for that.

To explain a little more, let's create a simplified example:

int a = 0;
int b = 0;
std::atomic<int> at{0}; // value-initialized; a default-constructed std::atomic is uninitialized before C++20

// thread 1
int foo = at.load();
a = 1;
b = 2;
at.store(foo);

// thread 2
int foo = at.load();
int tmp1 = a;
int tmp2 = b;
at.store(tmp2);

Now you have 2 problems:

  1. There is no guarantee that when tmp2's value is 2, tmp1's value will be 1, as you read a and b before the atomic store.

  2. There is no guarantee that when at.store(tmp2) is executed, either a == b == 0 or a == 1 and b == 2 holds; it could be that a == 1 but still b == 0.

Is that clear?

But:

// thread 1 
mutex.lock();
a = 1;
b = 2;
mutex.unlock();

// thread 2
mutex.lock();
int tmp1 = a;
int tmp2 = b;
mutex.unlock();

You either get tmp1 == 0 and tmp2 == 0, or tmp1 == 1 and tmp2 == 2. Do you see the difference?
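For completeness, a sketch (beyond the original answer) of how the atomic version can be made to give the same guarantee: thread 2 has to actually check the loaded value and wait for thread 1's release before it reads a and b:

// thread 1
a = 1;
b = 2;
at.store(1, std::memory_order_release);   // publish the writes

// thread 2
while (at.load(std::memory_order_acquire) != 1)
    ; // spin until the load takes its value from thread 1's store
int tmp1 = a; // guaranteed to be 1
int tmp2 = b; // guaranteed to be 2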

Slava
  • Well, the idea is that thread 1's acquire then syncs with the release of thread 2 and vice versa. Obviously that doesn't happen, but does that mean cppreference's definition is under-specified or wrong? – JMC Dec 29 '20 at 23:43
  • I do not understand where you got that idea, but it is totally wrong. `std::atomic` memory order guarantees what happens at the moment of the operation; it does not guarantee what will happen before. Imagine thread 1 executes `foo.emplace_back(1);` and stops in the middle. Then thread 2 executes `foo.emplace_back(1);`. Neither atomic operation has happened yet, but you get a race condition. It would be similar to locking the mutex after `foo.emplace_back(1);` – Slava Dec 29 '20 at 23:53
  • I get it from the excerpt in my question from cppreference. I know that what you say is true. But how should the excerpt then be interpreted to fit the reality of how everyone uses atomics? – JMC Dec 29 '20 at 23:54
  • Not everyone uses atomics this way. I do not quite understand where your expectation comes from and how you interpret what you read that way. Memory order affects what happens after atomic operations, not before; how is that not clear? – Slava Dec 29 '20 at 23:57
  • Everyone uses it in a way like you say, like they DONT synchronize because everybody knows that they DONT synchronize. Please read the cppreference excerpt in my question. My actual question is: Is cppreference just wrong or if not, how is that wording to be interpreted? – JMC Dec 30 '20 at 00:01
  • Let me give you an example, maybe you would understand. – Slava Dec 30 '20 at 00:02
  • Thank you for providing an example, but I don't see how it is relevant to my question. There is no acquire operation (in the sense of std::memory_order_acquire) in your whole example, "at" is never loaded, so it cannot be relevant to a question that is exactly about the guarantees of acquire-release operations. I think it's not clear what I am asking. I know how to use mutexes and atomics. I have written and debugged thread queues, cvar stuff and various parallel algorithms in C++ before. My question is of a language-lawyer nature. – JMC Dec 30 '20 at 00:12
  • @JMC I do not see why my example is different from yours in principle. Can you elaborate? – Slava Dec 30 '20 at 00:15
  • In my example one thread stores to the atomic, and another thread reads from the atomic. According to cppreference, storing to an atomic is a "release", and reading from an atomic is an "acquire" operation, as defined by the C++ memory model. When an acquire syncs with a release, the "acquiring" thread will see all side effects that happened before the release in the "releasing" thread. I am trying to understand what they are trying to say with that, because the way I interpret it, it would give atomics the same power as mutexes, which they don't have. – JMC Dec 30 '20 at 00:18
  • Ok, first of all, do you understand that "That is, once the atomic load is completed, thread B is guaranteed to see everything thread A wrote to memory." is not equal to "thread A is forced to finish all write operations it plans before its store"? I.e. in my example thread 1 is not forced to write both `a` and `b` as one atomic operation. It can write `a` but not `b`. Is that clear? – Slava Dec 30 '20 at 00:31
  • That is clear, but it also says "all memory writes (non-atomic and relaxed atomic) that happened-before the atomic store from the point of view of thread A, become visible side-effects in thread B." So all (even non-atomic) writes that happened-before the store (this includes the vector's emplace_back) become visible in thread B. – JMC Dec 30 '20 at 00:33
  • Yes "that happened before", what if they did not happen? What if `b = 2` haven't happened yet when thread2 reaches atomic but `a = 1` did? Do you expect thread1 accumulate all changes and commit them atomically? No that wont happen and documentation did not say that. – Slava Dec 30 '20 at 00:34
  • It says happens-before the STORE in thread 1, from the POV of thread 1, which is always true. The emplace_back() in t1 happens-before the store() in t1. happens-before, with hyphen, is a specifically defined term of the standard. Of course in practice t2 will not wait, but then what is the purpose of the wording on cppreference? – JMC Dec 30 '20 at 00:37
  • Yes, what happened before store() will be visible at load(), but who said you will not see anything when the store has not happened yet? Why do you expect that `int tmp2 = b;` will only read the new value in thread 2 if and only if `at.store(foo);` happened in thread 1? There is nothing that says that in the standard. – Slava Dec 30 '20 at 00:42
  • That should probably not be assumed. But then what is the meaning of the excerpt? – JMC Dec 30 '20 at 00:45
  • That means you cannot expect changes to both `a` and `b` to be atomic, and that creates a race condition in `std::vector`. But even worse, you cannot even expect your reads from `a` and `b` to be proper, as you need to implement logic such that, based on the value you read from the atomic, you can access `a` and `b`. – Slava Dec 30 '20 at 00:49
  • I know that they're not atomic, but I was asking about the cppreference excerpt's wording, which does not say that. I think we are completely missing each other's point. Pretend that my original question is: how did cppreference arrive at this wording of their note? What in the standard justifies it? – JMC Dec 30 '20 at 00:50
  • I suppose what cppreference forgot to mention is that a release to an atomic synchronizes-with an acquire [__if that acquire reads the value written__](https://eel.is/c++draft/atomics.order#2). You *can* get synchronization from atomics, you just have to actually *look* at what's in them to make it work (e.g. see cppreference's `atomic_flag` spinlock, and the sketch after this thread). – HTNW Dec 30 '20 at 00:54
  • "which does not say that" they cannot explicitly say what will not happen, they can only describe what would happen. You on another side derived some conclusions that are not based on what said there. They only say what happens when thread A executes `load()` after B executed `store()` they did not say anything what will happen before that. – Slava Dec 30 '20 at 00:54
  • @HTNW Thx, that link is very good and seems to actually define the "synchronizes with" relationship in a proper way. I somehow missed that when I searched for it in the standard. – JMC Dec 30 '20 at 01:20
  • @Slava I know, but they word it as "They synchronize and do X" when it should say "If it synchronizes, THEN it will do X." I know this seems like pedantry, but it is a language-lawyer type question. – JMC Dec 30 '20 at 01:22
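To make HTNW's point concrete, here is a minimal sketch of such an atomic_flag spinlock (modeled on cppreference's example, not code from this thread). The synchronization comes precisely from looking at the value: lock() spins until its read-modify-write takes the value written by the previous holder's release, so unlock() synchronizes-with the next successful lock(), and the vector accesses are well-defined:

#include <atomic>
#include <thread>
#include <vector>

struct SpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
    void lock() {
        // Spin until test_and_set observes 'false', i.e. until it takes the
        // value written by the previous holder's release-clear.
        while (flag.test_and_set(std::memory_order_acquire))
            ;
    }
    void unlock() { flag.clear(std::memory_order_release); }
};

int main() {
    std::vector<int> foo;
    SpinLock sl;
    auto job = [&] {
        sl.lock();           // synchronizes-with the previous unlock()
        foo.emplace_back(1); // safe: the previous holder's writes are visible
        sl.unlock();
    };
    std::thread t1(job), t2(job);
    t1.join();
    t2.join();
}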