2

I just read the "Do not use volatile as a synchronization primitive" article on the CERT site and noticed that a compiler can theoretically optimize the following code so that it keeps the flag variable in a register instead of modifying the actual memory shared between different threads:

bool flag = false; // Not declaring this as volatile is wrong, but even when it is declared volatile the code is still erroneous
void test() {
  while (!flag) {
    Sleep(1000); // sleeps for 1000 milliseconds
  }
}
void Wakeup() {
  flag = true;
}
void debit(int amount) {
  test();
  account_balance -= amount; // We think it is safe to go inside the critical section
}

Am I right?

Is it true that I need to use the volatile keyword for every object in my program that is shared between different threads? Not because it does some kind of synchronization for me (I need to use mutexes or other synchronization primitives for that anyway), but simply because a compiler could otherwise optimize my code and keep all shared variables in registers, so other threads would never see the updated values?

Tim B
FrozenHeart
  • 1
    Volatile is rarely the right solution and it certainly isn't if multi threading is involved - the whole linked article tries to hammer that point home. To make it simple: Are you writing a device driver? An operating system? No? Don't use volatile. – Voo Nov 11 '15 at 16:19

3 Answers

6

It's not just about storing them in registers; there are all sorts of levels of caching between the shared main memory and the CPU. Much of that caching is per CPU core, so a change made there may not be seen by other cores for a long time (or, if other cores are modifying the same memory, those changes may be lost completely).

There are no guarantees about how that caching will behave, and even if something is true for current processors it may well not be true for older processors or for the next generation of processors. In order to write safe multithreaded code you need to do it properly. The easiest way is to use the libraries and tools provided for that purpose. Trying to do it yourself using low-level primitives like volatile is very hard and requires a lot of in-depth knowledge.
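As a minimal sketch of what "using the libraries and tools provided" could look like for the flag example from the question, assuming C++11 or later: the flag is guarded by a std::mutex and the waiting thread blocks on a std::condition_variable instead of polling. The names wake_up and run_debit_demo and the starting balance of 100 are illustrative assumptions, not part of the original code.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Shared state: every access to `flag` and `account_balance` holds `m`.
std::mutex m;
std::condition_variable cv;
bool flag = false;           // plain bool: the mutex provides visibility and ordering
int account_balance = 100;   // hypothetical starting balance

// Blocks until another thread sets the flag, then debits under the same lock.
void debit(int amount) {
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [] { return flag; });  // the predicate also covers spurious wakeups
    account_balance -= amount;
}

// Sets the flag under the lock, then wakes any waiting threads.
void wake_up() {
    {
        std::lock_guard<std::mutex> lock(m);
        flag = true;
    }
    cv.notify_all();
}

// Runs one waiting thread and wakes it; returns the final balance.
int run_debit_demo() {
    std::thread worker(debit, 30);
    wake_up();
    worker.join();
    std::lock_guard<std::mutex> lock(m);
    assert(flag);  // the worker can only have finished after the flag was set
    return account_balance;
}
```

Calling run_debit_demo() once from main should leave the balance at 70, whichever thread reaches the mutex first. No volatile is needed anywhere, because locking and unlocking the mutex already forces the compiler and the CPU to make the writes visible in the right order.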

Tim B
  • So, I was right? Should I specify the volatile keyword for every object that shares its memory between different threads then? – FrozenHeart Nov 11 '15 at 13:26
  • 2
    No, you should use the proper threading structures for your language and architecture. Throwing volatile everywhere is not the solution. – Tim B Nov 11 '15 at 13:27
  • What do you mean by "proper threading structures"? Suppose that I start several threads and synchronize all accesses to the shared variables via some kind of synchronization primitives like mutexes. Am I safe after this or should I use volatile anyway? – FrozenHeart Nov 11 '15 at 13:30
  • @Tim B: You are not exactly correct. Changes to any global memory location will be visible to other threads eventually. But, depending on the memory model, there is no guarantee about the order. – Michael Nastenko Nov 11 '15 at 13:30
  • 1
    @FrozenHeart please do follow Tim's advice. Multithreading is a very complicated issue. Use data structures like single reader single writer queues etc. Model your problem in such a way that inter-thread interaction goes through specialized data structures, that can handle concurrency by design. – enobayram Nov 11 '15 at 13:31
  • @MichaelNastenko Yeah, never was possibly the wrong word. I was referring to the case where multiple cores make changes to the same memory and as a result some are lost. – Tim B Nov 11 '15 at 13:31
  • @Tim B: Oh, I see :-) – Michael Nastenko Nov 11 '15 at 13:32
  • @Michael Nastenko Yep, but the order of statements can't change if we use synchronization primitives (for example, mutexes or critical sections) or atomic operations from C++11 with default policy. Am I wrong? – FrozenHeart Nov 11 '15 at 13:37
  • That's right. But I would recommend you to stick to locking algorithms at first. Atomic operations are too low-level and very tricky. – Michael Nastenko Nov 11 '15 at 23:24
3

It is actually very simple, but confusing at the same time. At a high level, there are two optimizing entities at play when you write C++ code: the compiler and the CPU. Within the compiler, there are two major optimization techniques with regard to variable access: omitting a variable access even though it is written in the code, and moving other instructions around that particular variable access.

In particular, the following example demonstrates those two techniques:

int k; bool flag;

void foo() {
    flag = true;
    int i = k;
    k++;
    k = i;
    flag = false;
}

In the code provided, the compiler is free to skip the first modification of flag, leaving only the final assignment to false, and to completely remove any modifications to k. If you make k volatile, you require the compiler to preserve all accesses to k: it will be incremented, and then the original value will be put back. If you make flag volatile as well, both assignments, first to true and then to false, will remain in the code. However, reordering would still be possible: the compiler may move non-volatile accesses across volatile ones, and even when both variables are volatile the CPU may reorder the stores, so the effective code might look like

void foo() {
    flag = true;
    flag = false;
    int i = k;
    k++;
    k = i;
}

This will have an unpleasant effect if another thread expects flag to indicate whether k is currently being modified.

One way to achieve the desired effect would be to define both variables as atomic. With the default memory ordering, this prevents the reordering shown above and keeps the stores to flag in place around the accesses to k. Note that, for communication between threads, atomic is in effect a volatile+: it gives everything people expect from volatile there, plus atomicity and ordering guarantees.
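A minimal sketch of the atomic variant of foo, assuming C++11 std::atomic with its default sequentially consistent ordering. The helpers k_is_being_modified and check_foo are hypothetical names added only for illustration.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> k{0};
std::atomic<bool> flag{false};

void foo() {
    flag.store(true);    // default seq_cst: not moved after the accesses to k below
    int i = k.load();
    k.fetch_add(1);      // atomic increment
    k.store(i);          // put the original value back
    flag.store(false);   // not moved before the accesses to k above
}

// Hypothetical reader-side helper: reports whether foo() is in the middle of its work.
bool k_is_being_modified() {
    return flag.load();
}

// Single-threaded sanity check: foo() leaves k unchanged and flag cleared.
void check_foo() {
    int before = k.load();
    foo();
    assert(k.load() == before);
    assert(!flag.load());
}
```

This only fixes the ordering of the individual operations. A reader that checks the flag and then reads k in two separate steps can still be interleaved with foo(), so when the flag is meant to protect a longer sequence of operations, a mutex is usually the simpler choice.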

Another thing to note is that compiler optimizations are a very powerful and desirable tool. One should not impede them just for the fun of it, so atomics should be used only where they are required.

SergeyA
  • In which cases can the compiler reorder instructions of its own accord? For example, can they be reordered when synchronization primitives like mutexes are involved, as in the following example -- http://pastie.org/10550471. Can this code print 1 instead of 0 under some conditions? – FrozenHeart Nov 11 '15 at 14:21
  • @FrozenHeart, you just opened a huge discussion. The rules for reordering are complicated because different memory models assume different levels of atomicity - but that is when you talk about atomic variables. I can't open the pastebin code from here, but if you have mutexes there, it is simple - no code can be reordered in a way that crosses a mutex lock or unlock call. – SergeyA Nov 11 '15 at 14:25
  • I started another question about it -- http://stackoverflow.com/questions/33652844/in-which-cases-compiler-can-reorder-instructions-by-its-own-will – FrozenHeart Nov 11 '15 at 14:38
1

On your particular

bool flag = false;

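example, declaring it as volatile will work in practice on most mainstream compilers and platforms, although the C++11 standard still classifies unsynchronized concurrent access to a non-atomic object as a data race, volatile or not. And it will not buy you even that all the time.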

Volatile IMPOSES on the compiler that each and every evaluation of an object (or mere C variable) is either done directly on the memory/register or preceded by retrieval from the external memory medium into internal memory/registers. In some cases the code and memory footprint can be considerably larger, but the real issue is that it's not enough.

When some time-based context-switching is going on (e.g. threads), and your volatile object/variable is aligned and fits in a CPU register, you get what you intended. Under these strict conditions, a change or evaluation is atomically done, so in a context switching scenario the other thread will be immediately "aware" of any changes.

However, if your object or big variable does not fit in a CPU register (because of its size or misalignment), a thread context switch on a volatile may still be a NO-NO... an evaluation in the concurrent thread may catch a change in mid-procedure... e.g. while a 5-member struct is being copied, the concurrent thread is invoked while the 3rd member is being changed. cabum!

The conclusion is (back to "Operating-Systems 101"): you need to identify your shared objects, choose a preemptive+blocking, non-preemptive, or other concurrent-resource access strategy, and make your evaluators/changers atomic. The access methods (change/eval) usually incorporate the make-atomic strategy, or (if the object is aligned and small) you simply declare it as volatile.
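One way to "make your evaluators/changers atomic" for an object that is too big for a single machine access is to put both access methods behind a mutex. The following is a minimal sketch under that assumption. Settings, SharedSettings, and no_torn_reads are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical 5-member struct, too large to be copied in one machine access.
struct Settings {
    int a, b, c, d, e;
};

// Both access methods take the same mutex, so a copy is never observed half-done.
class SharedSettings {
public:
    void set(const Settings& s) {
        std::lock_guard<std::mutex> lock(m_);
        value_ = s;       // the whole 5-member copy happens under the lock
    }
    Settings get() const {
        std::lock_guard<std::mutex> lock(m_);
        return value_;    // readers receive a consistent snapshot
    }
private:
    mutable std::mutex m_;
    Settings value_{};    // starts as all zeros
};

// The writer stores {n, n, n, n, n}; the reader checks that every snapshot
// it takes has five equal members, i.e. it never sees a mix of two writes.
bool no_torn_reads(int iterations) {
    SharedSettings shared;
    std::thread writer([&] {
        for (int n = 1; n <= iterations; ++n) shared.set(Settings{n, n, n, n, n});
    });
    bool consistent = true;
    for (int i = 0; i < iterations; ++i) {
        Settings s = shared.get();
        if (!(s.a == s.b && s.b == s.c && s.c == s.d && s.d == s.e)) consistent = false;
    }
    writer.join();
    assert(shared.get().e == iterations);  // the last write is visible after join
    return consistent;
}
```

With the mutex in place, no_torn_reads should return true for any iteration count. The same loop over a plain or volatile Settings with no lock would be a data race, and the reader could observe a half-written struct.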

jpinto3912