Please refer to section 41.2.2, Instruction Reordering, of "TC++PL" (The C++ Programming Language), 4th edition, by B. Stroustrup, which I transcribe below:
To gain performance, compilers, optimizers, and hardware reorder instructions. Consider:
    // thread 1:
    int x;
    bool x_init;

    void init()
    {
        x = initialize();   // no use of x_init in initialize()
        x_init = true;
        // ...
    }
For this piece of code there is no stated reason to assign to x before assigning to x_init. The optimizer (or the hardware instruction scheduler) may decide to speed up the program by executing x_init = true first. We probably meant for x_init to indicate whether x had been initialized by initializer() or not. However, we did not say that, so the hardware, the compiler, and the optimizer do not know that.
Add another thread to the program:
    // thread 2:
    extern int x;
    extern bool x_init;

    void f2()
    {
        int y;
        while (!x_init)   // if necessary, wait for initialization to complete
            this_thread::sleep_for(milliseconds{10});
        y = x;
        // ...
    }
Now we have a problem: thread 2 may never wait and thus will assign an uninitialized x to y. Even if thread 1 did not set x_init and x in “the wrong order,” we still may have a problem. In thread 2, there are no assignments to x_init, so an optimizer may decide to lift the evaluation of !x_init out of the loop, so that thread 2 either never sleeps or sleeps forever.
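For contrast, here is a minimal sketch (not from the book) of the usual remedy: making x_init a std::atomic<bool> and publishing x with a release store that thread 2 reads with an acquire load. The release store keeps the write to x from being moved after it, and the atomic load cannot be assumed unchanged between loop iterations, so it is not hoisted. The body of initialize() (returning 42), the change of f2() to return y, and the run_demo() harness are hypothetical additions for illustration only.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the book's initialize(); the value is arbitrary.
int initialize() { return 42; }

int x;                            // plain data, published through x_init
std::atomic<bool> x_init{false};  // atomic flag instead of a plain bool

void init()  // thread 1
{
    x = initialize();
    // Release: the write to x above cannot be reordered after this store.
    x_init.store(true, std::memory_order_release);
}

int f2()  // thread 2; hypothetically returns y so the result can be checked
{
    using namespace std::chrono;
    // Acquire: every iteration performs a real load of x_init, and once it
    // observes true, the write to x made before the release store is visible.
    while (!x_init.load(std::memory_order_acquire))
        std::this_thread::sleep_for(milliseconds{10});
    int y = x;
    return y;
}

// Hypothetical harness: runs both threads and returns the y seen by thread 2.
int run_demo()
{
    int y = 0;
    std::thread t2([&y] { y = f2(); });
    std::thread t1(init);
    t1.join();
    t2.join();
    return y;
}
```

Whichever thread happens to be scheduled first, run_demo() returns 42: thread 2 keeps re-checking the flag, and the acquire/release pair guarantees it never reads x before initialize() has stored it.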
- Does the Standard allow the reordering in thread 1? (A quote from the Standard would be welcome.) Why would that speed up the program?
- Both answers in this discussion on SO seem to indicate that no such optimization occurs when there are global variables in the code, such as x_init above.
What does the author mean by "to lift the evaluation of !x_init out of the loop"? Is it something like the following?
    if (!x_init)
        while (true)
            this_thread::sleep_for(milliseconds{10});
    y = x;
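"Lifting" (hoisting) usually refers to loop-invariant code motion: a value the compiler believes cannot change inside the loop is computed once, before the loop. A sketch of that transformation applied to the loop in f2(), with the hypothetical helper names f2_as_written and f2_hoisted and with x and x_init passed by reference so the functions can be called directly:

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// The loop as written in f2(), with a plain (non-atomic) bool.
int f2_as_written(const bool& x_init, const int& x)
{
    using namespace std::chrono;
    while (!x_init)  // in the source, x_init is re-read on every iteration
        std::this_thread::sleep_for(milliseconds{10});
    return x;
}

// The hoisted form. The as-if rule allows it only if the compiler can tell
// that nothing in this thread (including the call to sleep_for) writes
// x_init; a write from another thread would be a data race, i.e. undefined
// behavior, so the compiler need not account for it.
int f2_hoisted(const bool& x_init, const int& x)
{
    using namespace std::chrono;
    const bool ready = x_init;  // single load, lifted out of the loop
    if (!ready)
        for (;;)  // never sees a later store: sleeps forever
            std::this_thread::sleep_for(milliseconds{10});
    return x;
}
```

With the flag already true, both versions return x immediately. They differ only when x_init is false on entry and is set later by another thread, which is exactly the "never sleeps or sleeps forever" outcome described in the book.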