Assuming that the Lock class is written properly, such that its acquire/release functions contain memory barriers of the corresponding types, then no, the second check cannot be optimized away.
Such optimizations are covered under the C++ standard's "as-if rule" [C++20 N4860 intro.abstract p1]. Abstractly, the code must execute exactly as written. But the compiler can apply transformations if it can prove that, assuming the program is otherwise free of undefined behavior, the transformation would not change the observable behavior of the program. (Or, if the program has more than one valid way to execute, then at least it must ensure that it still behaves in one of those valid ways.)
In the case of multithreaded programs, the "assuming the program is otherwise free of undefined behavior" frequently comes into play, applied to data races. A typical example is:
    #include <cassert>

    bool b;   // plain, non-atomic global

    void foo() {
        int x=0, y=0;
        if (b)      // first test of b
            x=1;
        if (b)      // second test of b
            y=1;
        assert(x == y);
    }
Here, the compiler may assume that b cannot change between the two tests, and optimize out the second one, thus turning the code into if (b) { x=1; y=1; }.
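Written out in full, the transformed function might look like the sketch below. The name foo_as_optimized and the local b_once are made up for illustration; a real compiler does this at the machine-code level rather than by rewriting source.

```cpp
#include <cassert>

bool b;   // plain, non-atomic global, as in the example above

// Hypothetical source-level picture of what the optimizer may emit:
// b is read once and both assignments hang off that single test.
void foo_as_optimized() {
    int x = 0, y = 0;
    const bool b_once = b;   // single read of b
    if (b_once) {
        x = 1;
        y = 1;
    }
    assert(x == y);          // holds on every path through this version
}
```

In this form the assertion can never fail, which is exactly why the transformation is only legal when no other thread may validly change b between the two original tests.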
You might say, "but isn't it possible that some other thread modifies b in between?" Ah, well, if that were the case, you would have a data race. The data race rule says that for any two accesses to the same non-atomic variable, where at least one of them is a write, the program must have synchronization to ensure that one of them happens-before the other [intro.races p21]. Synchronization is typically provided by mutexes, or by an acquire load of an atomic variable that reads the value written by an earlier release store.
In the program above, the compiler can see that no such synchronization is possible, because in between the two reads of b, the code contains no mutex operations and no acquire or release operations. Therefore, any concurrent write of b would inevitably be a data race, which would be UB, so the compiler can assume it won't happen. (If the compiler's assumption is violated, then something bad will probably happen - but that's not the compiler's fault, it's your fault for invoking UB.)
However, in the code you posted, there is a way for synchronization to occur. Some other thread could take the lock while you have temporarily released it, modify condition_boolean, and then release the lock. This would be free of data races. Assuming as before that the Lock class is properly implemented, your release of the lock synchronizes with the other thread's acquisition, implying that your first read of condition_boolean happens before the other thread's write. Likewise, the other thread's release synchronizes with your re-acquisition, so the other thread's write happens before your second read.
Thus all the operations on condition_boolean are totally ordered by happens-before, so no data race occurs. Moreover, since the write to condition_boolean happens before the second read, the second read must observe the new value [intro.races p13]. To provide this behavior, the compiler must actually perform the second read; it cannot optimize it out.
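Here is a small sketch of that scenario. It is not your code: std::mutex stands in for your Lock class, and the names mtx, phase, other_thread, Observed and check_twice_around_unlock are made up for the demonstration. The relaxed atomic phase exists only to force the interleaving described above; being relaxed, it adds no happens-before edge of its own, so the mutex is the only thing synchronizing the accesses to condition_boolean.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

std::mutex mtx;                  // stand-in for the question's Lock (assumption)
bool condition_boolean = false;  // plain bool, only touched while holding mtx
std::atomic<int> phase{0};       // demo-only sequencing, relaxed on purpose

struct Observed { bool first; bool second; };

// The "other thread": waits until the lock has been released once,
// then modifies condition_boolean under the lock.
void other_thread() {
    while (phase.load(std::memory_order_relaxed) != 1)
        std::this_thread::yield();
    {
        std::lock_guard<std::mutex> guard(mtx);
        condition_boolean = true;
    }
    phase.store(2, std::memory_order_relaxed);
}

// Checks condition_boolean, temporarily releases the lock, re-acquires it,
// and checks again.
Observed check_twice_around_unlock() {
    std::thread other(other_thread);

    mtx.lock();
    const bool first = condition_boolean;    // first check
    mtx.unlock();                            // temporary release

    phase.store(1, std::memory_order_relaxed);
    while (phase.load(std::memory_order_relaxed) != 2)
        std::this_thread::yield();

    mtx.lock();                              // re-acquisition
    const bool second = condition_boolean;   // second check: must be a real read
    mtx.unlock();

    other.join();
    return {first, second};
}
```

Calling check_twice_around_unlock() once should give first == false and second == true. If the compiler had reused the result of the first check, the second value would be wrong, which is why it is not allowed to do so here.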
Once again, all of this assumes that the Lock class is written correctly. If it is just a wrapper around std::shared_mutex or something like that, then everything should be fine, since the standard defines that std::mutex and friends provide the appropriate acquire and release semantics. If it's something you wrote yourself using std::atomics, then it's on you to ensure that you used a correct algorithm and appropriate std::memory_order to provide synchronization between your unlock and lock routines. If you wrote it without using std::atomic, then almost certainly it is wrong, causing a data race all by itself, and your program is broken regardless of what you do with condition_boolean. (And no, volatile does not substitute for std::atomic.)
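For a rough idea of what "appropriate std::memory_order" means in a hand-written lock, here is a minimal spinlock sketch built on std::atomic_flag. The class name SpinLock and the helper locked_increments are hypothetical, and I am only guessing that your Lock has acquire/release members along these lines; a production lock would also want backoff, fairness and so on.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical minimal lock; not the asker's actual Lock class.
class SpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void acquire() {
        // Acquire ordering: once we see the flag clear, everything the
        // previous holder wrote before its release() is visible to us.
        while (flag.test_and_set(std::memory_order_acquire))
            std::this_thread::yield();
    }
    void release() {
        // Release ordering: publishes our writes to the next acquirer.
        flag.clear(std::memory_order_release);
    }
};

// Two threads bump a plain int that is protected only by the spinlock.
int locked_increments(int per_thread) {
    SpinLock lock;
    int counter = 0;   // non-atomic on purpose
    auto work = [&] {
        for (int i = 0; i < per_thread; ++i) {
            lock.acquire();
            ++counter;
            lock.release();
        }
    };
    std::thread a(work), b(work);
    a.join();
    b.join();
    return counter;
}
```

Because each release() synchronizes with the next successful acquire(), the increments of counter are ordered by happens-before and locked_increments(n) returns 2 * n. Replace the flag with a plain or volatile bool and the same code has a data race.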
Having said all that, the compiler is still free to break any of these rules if it can prove that doing so makes no difference to an otherwise UB-free program. As a simple example, it may be possible to specify via a compiler option that the program will run single-threaded. In this case, the compiler can demote all atomic objects to normal ones, and delete all mutex operations and memory barriers. It could also then delete the second check of condition_boolean. But of course, in such a case, it would be perfectly correct to do so, because condition_boolean in fact cannot change.