
So, I was reading a lot about instruction and memory reordering and how we can prevent it, but I still have no answer to one question (probably because I'm not attentive enough). My question is: do we have the guarantee that any atomic write will store the new value of the atomic variable in main memory immediately? Let's take a look at a small example:

std::atomic<bool> x;
std::atomic<bool> y;
std::atomic<int> count;
void WritingValues()
{
   x.store(true, std::memory_order_relaxed);
   y.store(true, std::memory_order_relaxed);
}
void ReadValues()
{
   while( !y.load(std::memory_order_relaxed) );
   if( x.load(std::memory_order_relaxed) )
       ++count;
}
int main()
{
   x = false;
   y = false;
   count = 0;
   std::thread tA(WritingValues);
   std::thread tB(ReadValues);
   tA.join();
   tB.join();
   assert( count.load() != 0 );
}

So, here our assert can definitely fire, as we use std::memory_order_relaxed and do not prevent any instruction reordering (or memory reordering at compile time, which I suppose is the same thing). But if we place some compiler barrier in WritingValues to prevent instruction reordering, will everything be OK? I mean, does x.store(true, std::memory_order_relaxed) guarantee that the write of that particular atomic variable will go directly into memory, without any latency? Or does x.load(std::memory_order_relaxed) guarantee that the value will be read from memory, not from a cache holding a stale value? In other words, does this store guarantee only the atomicity of the operation, with the same memory behaviour as an ordinary non-atomic variable, or does it also influence memory behaviour?

curiousguy
IgnatiusPo
    Your question is actually meaningless, and cannot be answered. The correct concept, when it comes to C++, is "sequencing". In the same execution thread, modifying one atomic variable after another one sequences the first after the second. But none of this will determine whether a different execution thread observes changes to any of these atomic variables. To determine that, it is necessary to determine how these execution threads sequence with each other. And there are a bunch of rules for that too. – Sam Varshavchik Aug 09 '19 at 14:19
  • I'm not completely sure I understand you clearly, @SamVarshavchik. What do you mean by "modifying one atomic variable after another one sequences the first after the second"? What about instruction reordering? It can easily happen in this particular case, as far as I know. I am also a bit confused about these *bunch of rules*. One of the possible ways to do this correctly is to use memory barriers, and that's what I am talking about here. I just want to know whether there is a possibility that an atomic write will not write to memory immediately. – IgnatiusPo Aug 09 '19 at 14:30
  • All that an atomic write guarantees is that no other thread will see the object's partial contents. Other threads see either the value before or after it was changed. For `bool`s this is mostly meaningless. As far as whether other threads see the write immediately, this will forever be a mystery, because there's nothing that a thread can do to determine whether another thread has written something. That would be sequencing. I.E.: after locking a mutex a thread will see everything that another thread did before unlocking the same mutex because this is sequenced such. – Sam Varshavchik Aug 09 '19 at 16:03
  • @SamVarshavchik But what about the MESI / MOESI protocol? I thought that was developed specifically for such cache coherency issues. So a thread knows if the memory was modified, as the cache line has a special **Modified** state. Anyway, I guess we are talking about different things, or I just misunderstood you. – IgnatiusPo Aug 15 '19 at 12:23
  • Your question seems to mix CPU and C/C++. You can't do that directly: the C/C++ semantics aren't specified in terms of low-level asm and CPU guarantees (they aren't specified at all, but that's another topic). You need to insert an ABI frontier somewhere to mix high and low level. – curiousguy Aug 15 '19 at 22:36
  • @curiousguy Yes, I understand. But that is what memory barriers are there for, aren't they? I mean, on different platforms these memory barriers work in different ways (e.g. on x86, sequential consistency behaves the same as acq_rel ordering). So these barriers make the compiler guarantee certain behaviour for us. I don't know exactly what asm instructions compilers use, but that is the point: if I use memory barriers the right way, I can be sure memory is visible where I want it, so that's kind of a mix of CPU and C++, isn't it? I just want to clarify; correct me if I'm wrong. – IgnatiusPo Aug 16 '19 at 07:57
  • @IgnatiusPo In practice it may be so, but at the abstract level C/C++ is not defined in terms of CPU stuff at all. The compiler only needs to have a correspondence with the low-level CPU stuff 1) for volatile accesses and 2) at ABI frontiers, when you call another function according to an ABI. What do you mean by "seq_cst behaves as acq_rel"? – curiousguy Aug 16 '19 at 15:24
  • @curiousguy I meant that a lot of memory-related work on the x86 CPU architecture is done under very strict rules, memory ordering being one example. x86 is known to be very strict, so by default it does everything the strictest way, using the strictest asm instructions and so on. There is a difference between seq_cst and acq_rel ordering ( [reference](https://en.cppreference.com/w/cpp/atomic/memory_order) ), but not on x86. I understand that I am mixing low-level CPU stuff with C++, but eventually all these memory barriers make the compiler emit specific asm instructions. – IgnatiusPo Aug 16 '19 at 15:57
  • @curiousguy But I've got your point, thanks. I'm new here and not sure I express my thoughts clearly :) – IgnatiusPo Aug 16 '19 at 15:58
  • @IgnatiusPo Which "barrier" is needed on x86 to get acq_rel order? – curiousguy Aug 16 '19 at 16:01
  • @curiousguy When you want acq_rel memory ordering you use the std::memory_order_acq_rel **barrier** (or std::memory_order_acquire on reads and std::memory_order_release on writes of the same atomic variable), but on x86 you will get the same result as if you used the std::memory_order_seq_cst **barrier**. – IgnatiusPo Aug 16 '19 at 16:10
  • @IgnatiusPo What asm is produced by the memory_order_acq_rel barrier? – curiousguy Aug 16 '19 at 17:26
  • @curiousguy I would tell you if I knew. I am quite sure it's platform specific, and I don't know exactly what they use. I've heard that x86 is very strict from [here](https://www.youtube.com/watch?v=ZQFzMfHIxng) and [here](https://www.youtube.com/watch?v=tk5P7mt2fAw) – IgnatiusPo Aug 19 '19 at 05:31
  • The mention of "main memory" is problematic, as data doesn't have to be pushed to RAM at any time if caches are sufficiently large; you seem to be making assumptions about the operation of the memory system. Data in one cache can be copied to another cache or to main RAM when needed; this is outside the control of the program. (Unless you invalidate the cache, which is seldom necessary.) – curiousguy Aug 22 '19 at 17:22
  • @curiousguy Yeah, that's what the MESI/MOESI protocol is for – IgnatiusPo Aug 23 '19 at 06:23

2 Answers

 I mean, does x.store(true, std::memory_order_relaxed) guarantee that the write  
 of that particular atomic variable will go directly into memory,  
 without any latency?  

No, it doesn't, and in fact, given a bool and relaxed memory order, there is no 'invalid' value if you read it only once; both true and false are OK.
Relaxed memory order explicitly states that no ordering is performed. Basically, in your case it only means that after flipping from false to true, the variable will at some point become true for all other threads, but it doesn't state when that will happen. So the only thing you can be sure of here is that it won't become false again after becoming true. There is no constraint on how long it will remain false as seen from another thread.
It also guarantees that you won't see a partially written variable from another thread, but that's hardly a concern for bools.
You need to use acquire and release here. And even that won't give any guarantees about the actual memory itself, only about program behaviour; cache synchronization may do the trick without ever bouncing the data back and forth to main memory.

alexrider
  • Yes that's what memory order release and acquire are there for. Writing with release releases it to anyone reading it with acquire. – alexrider Aug 09 '19 at 14:43
  • Release only "releases" the other, previous memory operations. The one that is done by the release operation is not affected. – curiousguy Jan 20 '20 at 22:13

Since all of the load and store operations here are atomic and each compiles to a single machine instruction, the two threads never "interrupt each other" in the "middle" of a load or store.

The title of your question is "Do we have the guarantee that any atomic write will store the new value of the atomic variable in the main memory immediately?". But the very definition of an atomic instruction is that it cannot be interrupted by a context switch, hardware interrupt, or software exception -- nothing!

std::memory_order_relaxed allows for some reordering of instructions within a single function. See, for example, this question. It is almost the same as your question, but you have memory_order_relaxed in ReadValues() instead of memory_order_acquire. In that function it is possible that the spin on variable y is placed after the counter increment due to the relaxed ordering (compiler reordering etc.). In any case, the assert may fail because y may be set to true before x in WritingValues() due to the memory reordering allowed by memory_order_relaxed (referencing the answer in the similar question).

Ð..
  • Yep, I wrote the same in the question :D. I know that the assert can fire, and I noted it above. Also, I'm not asking about interrupting; that's understandable. I wasn't sure about cache coherency and memory behaviour here; I thought that an atomic variable prevents memory reordering problems with any memory order specified, but I was wrong. Now I know the answer, thanks :) – IgnatiusPo Aug 09 '19 at 14:53