
In the C++11 standard, the machine model changed from a single-threaded machine to a multi-threaded machine.

Does this mean that the typical `static int x; void func() { x = 0; while (x == 0) {} }` example of an optimized-out read will no longer happen in C++11?

EDIT: For those who don't know this example (I'm seriously astonished), please read this: https://en.wikipedia.org/wiki/Volatile_variable

EDIT2: OK, I was really expecting that everyone who knew what volatile is has seen this example.

If you use the code in the example, the read of the variable in the loop will be optimized out, making the loop endless.

The solution, of course, is to use volatile, which forces the compiler to read the variable on each access.

My question is whether this is an obsolete problem in C++11: since the machine model is multi-threaded, the compiler should assume that concurrent access to the variable may be present in the system.

ildjarn
Šimon Tóth

2 Answers


Whether it is optimized out depends entirely on compilers and what they choose to optimize away. The C++98/03 memory model does not recognize the possibility that x could change between the setting of it and the retrieval of the value.

The C++11 memory model does recognize that x could be changed. However, it doesn't care. Non-atomic access to variables (i.e., not using std::atomic or proper mutexes) yields undefined behavior. So it's perfectly fine for a C++11 compiler to assume that x never changes between the write and the reads, since undefined behavior can mean "the function never sees x change, ever."

Now, let's look at what C++11 says about volatile int x;. If you put that in there, and you have some other thread mess with x, you still have undefined behavior. Volatile does not affect threading behavior. C++11's memory model does not define reads or writes from/to x to be atomic, nor does it require the memory barriers needed for non-atomic reads/writes to be properly ordered. volatile has nothing to do with it one way or the other.

Oh, your code might work. But C++11 doesn't guarantee it.

What volatile tells the compiler is that it can't optimize memory reads from that variable. However, CPU cores have different caches, and most memory writes do not immediately go out to main memory. They get stored in that core's local cache, and may be written... eventually.

CPUs have ways to force cache lines out into memory and to synchronize memory access among different cores. These memory barriers allow two threads to communicate effectively. Merely reading from memory in one core that was written in another core isn't enough; the core that wrote the memory needs to issue a barrier, and the core that's reading it needs to have had that barrier complete before reading it to actually get the data.

volatile guarantees none of this. volatile works with "hardware, mapped memory and stuff" because the hardware that writes that memory makes sure that the cache issue is taken care of. If CPU cores had to issue a memory barrier after every write, you could basically kiss any hope of performance goodbye. So C++11 has specific language saying when constructs are required to issue a barrier.

volatile is about memory access (when to read); threading is about memory integrity (what is actually stored there).

The C++11 memory model is specific about what operations will cause writes in one thread to become visible in another. It's about memory integrity, which is not something volatile handles. And memory integrity generally requires both threads to do something.

For example, if thread A locks a mutex, does a write, and then unlocks it, the C++11 memory model only requires that write to become visible to thread B if thread B later locks it. Until it actually acquires that particular lock, it's undefined what value is there. This stuff is laid out in great detail in section 1.10 of the standard.

Let's look at the code you cite, with respect to the standard. Section 1.10, p8 speaks of the ability of certain library calls to cause a thread to "synchronize with" another thread. Most of the other paragraphs explain how synchronization (and other things) build an order of operations between threads. Of course, your code doesn't invoke any of this. There is no synchronization point, no dependency ordering, nothing.

Without such protection, without some form of synchronization or ordering, 1.10 p21 comes in:

The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.

Your program contains two conflicting actions (reading from x and writing to x). Neither is atomic, and neither is ordered by synchronization to happen before the other.

Thus, you have achieved undefined behavior.

So the only case where you get guaranteed multithreaded behavior by the C++11 memory model is if you use a proper mutex or std::atomic<int> x with the proper atomic load/store calls.

Oh, and you don't need to make x volatile too. Any time you call a (non-inline) function, that function or something it calls could modify a global variable, so the compiler cannot optimize away the read of x in the while loop. And every C++11 mechanism for synchronization requires calling a function, which just so happens to invoke a memory barrier.

Nicol Bolas
  • Hmm. The thing is that the code doesn't rely on ordering or atomicity. That's a different thing. Could you maybe post the relevant section of the standard? – Šimon Tóth Oct 14 '12 at 01:22
  • 3
    @Let_Me_Be: It does rely on ordering and atomicity. The change from the other thread must become *visible* to this one. Which means it must become ordered *before* some operation in this thread. Since `volatile` doesn't deal with ordering, there is no guarantee that it will become visible. – Nicol Bolas Oct 14 '12 at 01:42
  • Oh wow, I'm getting more and more confused :-D If the problem is with ordering, then it stands even with atomic variables, does it not? I mean, nothing guarantees you that the second thread will actually be executed (well, the scheduler in the operating system probably does, but nothing in C++). I mean, where does the undefined behavior come from in the following code: https://gist.github.com/3886984 (if I use the non-atomic (non-)volatile version)? – Šimon Tóth Oct 14 '12 at 01:57
  • Plus, doesn't this break the original meaning of volatile for non-threaded programs that deal with hardware, mapped memory and stuff? – Šimon Tóth Oct 14 '12 at 01:59
  • 9
    Ah, finally an answer that mentions visibility issues. Almost nobody believes me when I tell them "Some thread writes a new value to some volatile variable. But in C++, `volatile` alone gives no guarantees whatsoever that any other thread will ever see the updated value." – fredoverflow Feb 02 '13 at 12:11
  • Your last paragraph troubles me. You talk about calling non-inline functions and how it cannot optimize it. But this is not true at all. Any compiler doing LTO (link time optimization) can analyze and potentially inline *any function at all*. And if it looks into the function and determines that no memory access happens then it can optimize out anything it likes. – Zan Lynx Sep 02 '13 at 01:56
  • @ZanLynx: OK, let me put it another way. The C++ specification requires that memory coherency and integrity actually *works*. Therefore, any compiler that optimizes out a variable read after a memory barrier/mutex/etc is *broken*. – Nicol Bolas Sep 02 '13 at 02:12
  • @NicolBolas: Yes but it has nothing to do with function calls. A function call, even to a function in another translation unit, is not a reliable memory barrier. – Zan Lynx Sep 02 '13 at 02:28
  • 1
    @ZanLynx: My point is that the optimizer either sees: 1) A call to a function that it can't inline through, and therefore cannot assume that the variable won't be changed, or 2) an explicit memory barrier (perhaps imported from inline functions), and therefore it cannot assume that the variable won't be changed. In both cases, the optimizer cannot assume that the variable hasn't been changed. Thus, if your code is doing its memory barrier stuff, *directly or indirectly*, you don't need `volatile`. And if it isn't, `volatile` isn't helping you. – Nicol Bolas Sep 02 '13 at 02:55

Intel developer zone mentions, "Volatile: Almost Useless for Multi-Threaded Programming"

The volatile keyword is used in this signal handler example from cppreference.com:

```cpp
#include <csignal>
#include <iostream>

namespace
{
  volatile std::sig_atomic_t gSignalStatus;
}

void signal_handler(int signal)
{
  gSignalStatus = signal;
}

int main()
{
  // Install a signal handler
  std::signal(SIGINT, signal_handler);

  std::cout << "SignalValue: " << gSignalStatus << '\n';
  std::cout << "Sending signal " << SIGINT << '\n';
  std::raise(SIGINT);
  std::cout << "SignalValue: " << gSignalStatus << '\n';
}
```
JayS
  • 2
    That "Wikipedia quote" is wrong (it references MSDN, so it should say "According to MSDN"). The standard actually says "for some implementations, volatile *might* indicate that special hardware instructions are required to access the object." In any case, the standard never says that it is 'only' for "hardware access" (whatever that means). – Brandin Jan 18 '19 at 10:08