gcc optimisation effect on loops with apparently constant variable

Question

I was optimizing a c++ code in which I encountered a situation that can be simplified as follows.

Consider this code:

#include <iostream>
#include <thread>

using namespace std;

bool hit = false;

void F()
{
   this_thread::sleep_for(chrono::seconds(1));
   hit = true;
}

int main()
{
   thread t(F);

   while (!hit)
      ;

   cout << "finished" << endl;
   t.join();
   return 0;
}

This basically starts a thread which after a second will change the value of hit to true. At the same time the code enters an empty loop which will continue until the hit's value will become true. I compiled this with gcc-5.4 using -g flag and everything was fine. The code will output finished and ends. But then I compiled it with -O2 flag and this time the code got stuck in the loop infinitely.

Looking at the disassembly, the compiler had generated the following, which is the root cause of the infinite loop:

jmp 0x6ba6f3 ! 0x00000000006ba6f3

OK, so clearly, the compiler has deduced that hit's value is false and it will not change in the loop so why not assume that it is an infinite loop without considering that another thread may change its value! And this optimization mode is added in the higher level (-O2). Since I'm not exactly an optimization flag expert, can anyone tell me which of them is responsible for this result so I can turn it off? And would turning it off have any major performance cost for other pieces of code? I mean, how much this pattern of code is rare?

@Milack27 yes it solves the problem! Man, there is many things in c++ that I still don't know about! — Sinapse, Oct 27 '17 at 18:09
@Sinapse: No, actually, it doesn't. It only made the symptoms go away - _for now_. Trouble is, it's still Undefined Behavior, and if you change one line of code the optimizer may make some new decisions. — MSalters, Oct 27 '17 at 19:01
@MSalters I understand the problem generally, what volatile solves is the compiler's optimization alone! But since bool (1 byte) would not face tearing in read/write and `cache coherence` does not matter in the example above (it will only cause to loop some more than it actually needs), we are good to go JUST IN THIS case of course — Sinapse, Oct 27 '17 at 19:16
@Sinapse: The Standard is very clear on this point. It's not "tearing" or "cache coherence", the Standard doesn't talk about that. Two accesses to a single variable like this **must** be ordered, on penalty of Undefined Behavior. `volatile` does not affect ordering and does not cure the Undefined Behavior. As far as the Standard is concerned, each thread can get _its own `volatile bool hit`_ ! — MSalters, Oct 27 '17 at 20:23

score 6 · Accepted Answer · answered Oct 27 '17 at 17:59

6

This code has Undefined Behavior. You're modifying hit from one thread and reading it form another, without synchronization.

Optimizing hit to false is a valid outcome of Undefined Behavior. You can solve this by making hit a std::atomic<bool>. This makes if well-defined, and blocks the optimization.

answered Oct 27 '17 at 17:59

MSalters

173,980
10
155
350

Yes this too will solve the problem (I think volatile will do it more efficiently for my case) but OMG!! Just how the compiler knows it when you define the variable as atomic?! I mean volatile make sense since you are adding a qualifier but in case of std::atomic, how compiler will deduce that it may change in another thread?? – Sinapse Oct 27 '17 at 18:14
Guess what! There are volatile members inside the `std::atomic` class. :) Try opening the "atomic" file from g++ include folder, you can see them there. – Milack27 Oct 27 '17 at 18:28
Nice, that was worthy! – Sinapse Oct 27 '17 at 18:38
@Milack27: Implementation detail. – MSalters Oct 27 '17 at 19:00

score 2 · Answer 2 · answered Oct 27 '17 at 18:01

If you want to read/write hit from several threads at the same time then you need some kind of synchronization otherwise you'll introduce a race condition. You can either make hit an std::atomic<bool> or add a mutex that needs to be locked when accessing hit value. If you just want to wait for thread to finish its job than you can leave just thread.join() (and print "finished" after it) without introducing any additional flags.

score 1 · Answer 3 · answered Oct 27 '17 at 18:16

1

By declaring hit as volatile, you're telling the compiler that this variable can be modified by external factors at any time, so the compiler won't assume that its value won't change along your main function.

As long as there is only one thread writing to the hit variable, your code should work properly, with no race conditions involved. However, when you're dealing with multiple threads, it's always safer to use synchronization tools, like atomic objects, mutexes and semaphores, as already mentioned in the other answers here.

answered Oct 27 '17 at 18:16

Milack27

1,619
2
20
31

`volatile` [fails to provide the necessary ordering](https://stackoverflow.com/a/4558031/15416). – MSalters Oct 27 '17 at 19:06
Yes. As I said, _it's always safer to use synchronization tools_. Using `volatile` should work properly _in this case_. – Milack27 Oct 27 '17 at 19:24

gcc optimisation effect on loops with apparently constant variable

3 Answers3