Before C++11, the language standard did not address multi-threading at all, so it was not possible to write portable (standard-conforming) multi-threaded C++ programs. One had to rely on third-party libraries, and thread safety at the code level could be provided only by the internals of those libraries, which in turn used the corresponding platform features, while compilers compiled the code just as if it were single-threaded.
Since C++11, according to the standard:
- two expression evaluations conflict if one of them modifies a memory location and the other one reads or modifies the same memory location;
- two actions are potentially concurrent if
-- they are performed by different threads, or
-- they are unsequenced, at least one is performed by a signal handler, and they are not both performed by the same signal handler invocation;
- the execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described in the standard ([intro.races] paragraph 22 in C++20: https://timsong-cpp.github.io/cppwp/n4868/intro.races#22);
- any such data race results in undefined behavior.
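To make the definitions above concrete, here is a minimal sketch (the function name `count_with_atomic` and its parameters are hypothetical, chosen only for illustration). Several threads modify the same memory location, so their accesses conflict and are potentially concurrent, but because every access is atomic there is no data race. If `counter` were a plain `long` incremented with `++counter`, the same program would contain a data race and its behaviour would be undefined.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: `threads` threads each increment a shared counter
// `increments` times and the final value is returned.
long count_with_atomic(int threads, int increments) {
    // All conflicting accesses go through std::atomic, so none of them
    // is a "non-atomic" action in the sense of the data race definition.
    std::atomic<long> counter{0};

    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, increments] {
            for (int i = 0; i < increments; ++i) {
                // Relaxed ordering is enough here: only the counter itself
                // is shared, no other data is published through it.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    // join() makes everything done by each thread happen before the
    // final read below.
    for (auto& th : pool) {
        th.join();
    }
    return counter.load();
}
```

Calling, for example, `count_with_atomic(4, 10000)` returns 40000 on every run, which a racy non-atomic version would not be guaranteed to do.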
An atomic operation is indivisible with regard to any other atomic operation that involves the same object.
One operation happens before another when the memory writes made by the first are visible to the reads performed by the second.
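As a sketch of how a happens-before relation can be established with the C++11 facilities (the function name `publish_and_read` and the variable names are hypothetical), the example below writes a plain `int` in one thread and reads it in another. The release store and the acquire load on the atomic flag order the two non-atomic accesses, so they do not form a data race.

```cpp
#include <atomic>
#include <thread>

// Hypothetical example: `payload` is an ordinary int, `ready` is the
// atomic flag used to publish it from the producer to the consumer.
int publish_and_read() {
    int payload = 0;
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                  // plain, non-atomic write
        ready.store(true, std::memory_order_release);  // publishes the write above
    });

    std::thread consumer([&] {
        // Spin until the acquire load observes the release store.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        // The write of 42 happens before this read, so it is visible here.
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Under these assumptions `publish_and_read()` returns 42. Replacing `ready` with a plain `bool` would remove the happens-before relation and turn both the flag accesses and the payload accesses into data races.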
According to the language standard, undefined behaviour is simply behaviour for which the standard imposes no requirements.
Some people wrongly believe that undefined behaviour concerns only what happens at run time and has nothing to do with compilation. In fact, the standard uses the notion of undefined behaviour to regulate compilation as well, so in cases of undefined behaviour nothing specific can be expected from either compilation or, consequently, execution.
The language standard does not forbid compilers from diagnosing undefined behaviour.
The standard explicitly states that in the case of undefined behaviour, besides ignoring the situation with unpredictable results, an implementation is permitted to behave in a manner documented for the environment (including the compiler's documentation), that is, to do literally anything as long as it is documented, during either translation or execution, and to terminate translation or execution (https://timsong-cpp.github.io/cppwp/n4868/intro.defs#defns.undefined).
So a compiler is even permitted to generate senseless code in cases of undefined behaviour.
A data race is not the situation in which conflicting accesses to an object actually occur at the same time, but the situation in which code that has even potentially (depending on the environment) conflicting accesses to an object is executed. Reasoning otherwise at the language level is impossible, because the hardware may delay a memory write caused by an operation for an unspecified time within the concurrent code (and note, besides that, both the compiler and the hardware may, within certain restrictions, spread operations across the concurrent code).
As for code that causes undefined behaviour only for some inputs (so it may or may not occur in a given execution):
- on the one hand, the as-if rule (https://en.cppreference.com/w/cpp/language/as_if) permits compilers to generate code that works correctly only for the inputs that do not cause undefined behaviour (for instance, code that issues a diagnostic message when an input causing undefined behaviour occurs; issuing diagnostic messages is explicitly noted in the standard as part of permissible undefined behaviour);
- on the other hand, in practice a compiler often generates code as if such input would never occur; see examples of such behaviour at https://en.cppreference.com/w/cpp/language/ub
Note that, in contrast to potential data races (I use the word potential here because of the note marked with * below), the cases in the examples from the link are quite easy to detect at compile time.
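As an illustration of code whose behaviour is undefined only for some inputs, consider signed integer addition, which is undefined on overflow. The sketch below (the helper name `checked_add` is hypothetical) tests for overflow before performing the addition. A check made after the fact, such as `a + b < a`, relies on the overflow having already happened, so a compiler that assumes such input never occurs may legitimately remove it.

```cpp
#include <limits>

// Hypothetical helper: stores a + b in `out` and returns true when the sum
// is representable; returns false, without evaluating a + b, when the
// addition would overflow.
bool checked_add(int a, int b, int& out) {
    const int max = std::numeric_limits<int>::max();
    const int min = std::numeric_limits<int>::min();

    // Both comparisons are computed without overflow themselves:
    // max - b is safe for b > 0, and min - b is safe for b < 0.
    if ((b > 0 && a > max - b) || (b < 0 && a < min - b)) {
        return false;
    }
    out = a + b;  // defined: the sum is known to fit in int
    return true;
}
```

With this shape, every input takes a path with defined behaviour, so there is nothing for the compiler to assume away.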
If a compiler could easily detect a data race, a reasonable compiler would simply terminate compilation rather than compile anything, but:
On the one hand, [*] it is practically impossible to conclude that a data race is guaranteed to happen at run time, simply because at run time all instances of the concurrent code except one may fail to start for environmental reasons. This makes any multi-threaded code a priori potentially single-threaded, and so potentially free of data races altogether (though in many cases that would break the semantics of the program, this is not the compiler's concern).
On the other hand, a compiler is permitted to inject code so that a data race is handled at run time (note: not only for something sensible such as issuing a diagnostic message, but in any manner, even a harmful one, as long as it is documented). However, besides the fact that such injections would add debatable overhead (even when used for something reasonable):
- some potential data races may be undetectable at all because translation units are compiled separately;
- some potential data races may or may not exist in a specific execution depending on run-time input data, which would make the injected code monstrous if it were to be correct;
- detecting data races, even when possible, may be too complex and too expensive because of complex code constructs and program logic.
So, at present, it is normal for compilers not to even try to detect data races.
Besides data races themselves, code in which data races are possible and which is compiled as if it were single-threaded has the following problems:
- under the as-if rule (https://en.cppreference.com/w/cpp/language/as_if), a variable may be eliminated if it appears to the compiler that this makes no difference, and compilers do not take multi-threading into account unless the specific multi-threading facilities of the language and its standard library are used;
- operations may be reordered relative to how they were written, both by the compiler under the as-if rule and by the hardware during execution, if it appears that this makes no difference, again unless the specific multi-threading facilities of the language and its standard library are used; moreover, hardware may implement a variety of approaches to restricting reordering, including requiring explicit corresponding instructions in the code.
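The elimination problem is easiest to see with a stop flag. In the sketch below (the function name `run_until_stopped` is hypothetical), a worker loops until another thread sets a flag. If the flag were a plain `bool`, a compiler reasoning as if the code were single-threaded could read it once before the loop and never again. Declaring it `std::atomic<bool>` obliges the implementation to perform each load and to respect the stated ordering.

```cpp
#include <atomic>
#include <thread>

// Hypothetical example: runs a worker until it is asked to stop and
// returns how many loop iterations the worker completed.
unsigned long run_until_stopped() {
    std::atomic<bool> stop{false};
    std::atomic<unsigned long> iterations{0};

    std::thread worker([&] {
        // Each load is an atomic access, so it cannot be hoisted out of
        // the loop the way a read of a plain bool could be.
        while (!stop.load(std::memory_order_acquire)) {
            iterations.fetch_add(1, std::memory_order_relaxed);
        }
    });

    // Let the worker make some progress before asking it to stop.
    while (iterations.load(std::memory_order_relaxed) == 0) {
        std::this_thread::yield();
    }
    stop.store(true, std::memory_order_release);

    worker.join();
    return iterations.load();
}
```

The function always terminates and reports at least one iteration; the exact count depends on scheduling.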
The question states that the following point does not apply, but to complete the set of possible problems, the following is theoretically possible on some hardware:
- some people wrongly assume that a multi-core coherence mechanism always makes data completely coherent, that is, when one core updates an object, other cores get the updated value when they read it. However, it is possible that a multi-core coherence mechanism performs some or even all of its coherence work not by itself but only when triggered by corresponding instructions in the code, so that without those instructions the value to be written to an object gets stuck in the core's cache and reaches other cores either never or later than appropriate.
Please note that appropriate use of a reasonably implemented (see the note marked with ** below for details) volatile qualifier on variables, where using volatile with the type is possible, solves the problems of elimination and reordering by the compiler, but not those of reordering by hardware or of values “getting stuck” in the cache.
[**] Regrettably, the language standard actually says “The semantics of an access through a volatile glvalue are implementation-defined” (https://timsong-cpp.github.io/cppwp/n4868/dcl.type.cv#5).
The standard does note that “volatile is a hint to the implementation to avoid aggressive optimization involving the object because the value of the object might be changed by means undetectable by an implementation.” (https://timsong-cpp.github.io/cppwp/n4868/dcl.type.cv#note-5). This would help to avoid elimination and reordering by the compiler if volatile is implemented in accordance with its intended purpose, that is, correctly for values potentially accessed by the environment of the code (for instance, hardware, the operating system, or other applications). Formally, however, compilers are not obliged to implement volatile in accordance with its intended purpose.
At the same time, modern versions of the standard note that “Furthermore, for some implementations, volatile might indicate that special hardware instructions are required to access the object.” (https://timsong-cpp.github.io/cppwp/n4868/dcl.type.cv#note-5), which means that some implementations might also prevent reordering by hardware and prevent values “getting stuck” in the cache, though that is not what volatile was intended for.
All three problems, as well as the data race issue, can be solved in a guaranteed way (as far as the implementation conforms to the standard) only by using the specific multi-threading facilities, including the multi-threading part of the C++ standard library available since C++11.
So a portable, standard-conforming C++ program must protect its execution from any data races.
If a compiler compiles the code as if it were single-threaded (i.e. ignores the data race), a reasonably implemented (as described in the note marked with ** above) volatile qualifier is used appropriately, and there are no caching or reordering issues caused by the hardware, one will get thread-safe machine code without using data race protection (from C++ code that is environment-dependent and, starting from C++11, does not conform to the standard).
As an example of the potential safety, in a specific environment, of using a non-atomic bool flag from multiple threads: at https://en.cppreference.com/w/cpp/language/storage_duration#Static_local_variables you can read that implementations of the initialization of static local variables (since C++11) usually use variants of the double-checked locking pattern, which reduces the run-time overhead for already-initialized local statics to a single non-atomic boolean comparison.
But note that these solutions are environment-dependent, and since they are part of the compiler implementations themselves rather than of a program using the compilers, conformance to the standard is not a concern there.
To make your program conform to the language standard and be protected (as far as the compiler conforms to the standard) against the liberties of a compiler's implementation details, you must protect the flag of a double-checked lock from data races, and the most reasonable way to do that is to use std::atomic or std::atomic_bool.
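A minimal sketch of a double-checked lock whose flag is protected from data races might look like this. The class `LazyValue`, its members, and the helper `count_correct_reads` are hypothetical names used only for illustration, and the memory orderings shown are one reasonable choice rather than the only correct one.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical lazily initialised value guarded by a double-checked lock.
class LazyValue {
public:
    int get() {
        // First check, without the lock. The acquire load pairs with the
        // release store below, so a thread that sees `true` also sees value_.
        if (!initialized_.load(std::memory_order_acquire)) {
            std::lock_guard<std::mutex> lock(mutex_);
            // Second check, under the lock: another thread may have
            // finished the initialization while this one was waiting.
            if (!initialized_.load(std::memory_order_relaxed)) {
                value_ = 42;   // stands in for an expensive computation
                ++init_count_; // only ever modified under the mutex
                initialized_.store(true, std::memory_order_release);
            }
        }
        return value_;
    }

    // Intended to be called only after all threads using get() have joined.
    int init_count() const { return init_count_; }

private:
    std::atomic<bool> initialized_{false};
    std::mutex mutex_;
    int value_ = 0;
    int init_count_ = 0;
};

// Hypothetical driver: calls get() from `threads` threads and returns how
// many of them observed the initialized value.
int count_correct_reads(LazyValue& lazy, int threads) {
    std::atomic<int> ok{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&lazy, &ok] {
            if (lazy.get() == 42) {
                ok.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return ok.load();
}
```

Every thread observes the initialized value and the initialization runs exactly once. With a plain `bool` flag, the unlocked first check would be a data race.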
For details on implementing the double-checked locking pattern in C++ (including using a non-atomic flag with a data race), see my answer https://stackoverflow.com/a/68974430/1790694 to the question “Is there any potential problem with double-check lock for C++?” (keep in mind that the code there performs multi-threading operations in the threads, which affect all access operations in the thread, triggering memory coherence and preventing reordering, so the code as a whole is a priori not compiled as if it were single-threaded).