Should I expect that a C++ compiler would compile multi-threaded code with a data race "as coded", or may it compile it into something else?

Question

Let’s say I have hardware on which all memory accesses for a value less than or equal to the size of bool are thread-safe, and consistency issues with regard to caching are avoided because of the hardware or the code.

Should I expect that non-atomic accesses from multiple threads to the same objects will be compiled just “as coded”, so that I get a thread-safe program for the platform?

No, you will never get thread-safety out of the box. Access to data will not be atomic. That is why there are things like std::atomic, std::mutex, std::condition_variable etc. to manage multithreading. (Cache integrity is handled by the CPU hardware) — Pepijn Kramer, Sep 05 '21 at 09:24
@PKramer, as for "cache integrity is handled by the CPU hardware", it is possible that a multi-core coherence mechanism does not perform some of the coherence by itself but is triggered to do it only by corresponding instructions in the code. — Arthur P. Golubev, Sep 05 '21 at 09:35
If behaviour is undefined, then (by definition in the standard) the standard does not describe any constraints on what happens. Practically, that means the compiler is permitted to do anything it likes, and there is no such thing as "as coded". While the standard *permits* an implementation to produce some specific behaviour that is documented for that implementation, it *requires* no implementation to do so. As soon as you start arguing "but my hardware does X" the counter-argument is that "the standard permits an implementation to emit code that is unaffected by X". — Peter, Sep 05 '21 at 10:04
@Peter, you are totally right, but it might not be obvious to someone what data races have to do with a compiler's compile-time decisions. — Arthur P. Golubev, Sep 05 '21 at 10:11
@ArthurP.Golubev If behaviour is undefined, compile-time decisions are irrelevant. — Peter, Sep 05 '21 at 10:22
One thing to consider is that without any memory fence (std::atomic, std::mutex etc.), the compiler is allowed to assume that it can optimize the program without concern for multi-threading issues. So if a thread is looping `while( bKeepThreadRunning )` and no code in the loop changes the loop test variable, the compiler can optimize out the read completely. So, race conditions aside, the optimizer can break your program because data flow analysis says it can optimize more aggressively than you would like it to, i.e. `while( bKeepThreadRunning )` becomes `while( true )`. — Richard Critten, Sep 05 '21 at 10:28
@Peter, You wrote "to emit code that is unaffected by X". That is an example of compile-time decisions by a compiler. — Arthur P. Golubev, Sep 05 '21 at 10:31
@ArthurP.Golubev No it's not. If behaviour is undefined, there is nothing stopping a compiler from accidentally or unintentionally emitting code that is unaffected by X. No compile time decision (or even a decision made in design of the compiler) required. — Peter, Sep 05 '21 at 10:35
@Peter, there seems to be just a misunderstanding about the meaning of the words. “Compile-time decision” is not a term. A compiler is a program which implements logic as any other program does. Even though the compiler is allowed to do nothing in particular in the case of undefined behavior, implementing something such as omitting only specific code requires both a corresponding design and corresponding input. I used the word “decision” just for a particular part of the compiler's code being started in reaction to corresponding input, no more. — Arthur P. Golubev, Sep 05 '21 at 13:02
@ArthurP.Golubev That's the thing. You're treating undefined behaviour as if it translates to some requirement for a decision, action, or outcome. By definition (i.e. the standard specifically defines the meaning of "undefined" as it is used in the standard) no such requirements exist. — Peter, Sep 06 '21 at 00:29
@Peter, *“You're treating undefined behaviour as if it translates to some requirement for a decision, action, or outcome”*. **No, I am not** at all. And I didn’t write anything that would make anyone think so. It was **your** argument that “the standard permits an implementation to emit code that is unaffected by X”. My point was that one may wonder **if there is anything in data races that prevents the compiler from compiling the code as described in the post**; [See the next comment for the continuation - 1...] — Arthur P. Golubev, Sep 06 '21 at 07:54
[1 - ...continuation] and as for undefined behaviour in the case of data races, someone who knows how compilers actually work has reasons to think that a regular compiler would need a special decision for this case of undefined behaviour to have any effect; without such a decision the compiler just compiles “as coded”, and the effect of data races depends on the environment. The standard cannot accept such environment-dependent behaviour, so it states that in this case the behaviour is undefined, but as far as compilers actually work, there is nothing troubling compilation in this particular case of undefined behaviour. — Arthur P. Golubev, Sep 06 '21 at 07:54
You should only expect that a C++ compiler compile code so that it behaves as the standard requires or as the particular compiler vendor's documentation specifies. Other than that a compiler can behave quite unpredictably. Take for example the situation outlined in [this old stackoverflow question](https://stackoverflow.com/a/4577565/12711) - simply changing the _name_ of a variable resulted in different behavior because the code relied on undefined behavior. — Michael Burr, Sep 06 '21 at 20:38
@MichaelBurr: Neither the C Standard, nor the parts of the C++ Standard that are derived therefrom, make any distinction between actions which most implementations should be expected to process in at least somewhat predictable fashion, and which non-portable code for such implementations should be expected to exploit, and actions about which programmers should have no expectations whatsoever. — supercat, Oct 07 '21 at 19:29
Due to the stochastic nature of race conditions, there's a bit of a "if a tree falls in a forest and nobody hears it, does it make a sound?" flavor to the question.... something like "if a compiler creates a buggy program, but the resulting faults are too subtle for anyone to notice, is the program still buggy?" -- the answer to that question is *yes*, because even if you run the buggy program for 10 months straight and it runs perfectly, the race condition can still eventually bite you in month 11, or year 5, or etc. Thread safety is all about what's *explicitly guaranteed*. — Jeremy Friesner, Oct 08 '21 at 14:59

Arthur P. Golubev · Accepted Answer · 2021-10-08T19:49:05.637

Before C++11, the language standard did not address multi-threading at all, and it was not possible to create portable (conforming to the language standard) multi-threaded C++ programs. One had to use third-party libraries, and thread-safety at the code level could be provided only by the internals of these libraries, which in turn used the corresponding platform features, while compilers compiled the code as if it were single-threaded.

Since C++11, according to the standard:

- two expression evaluations conflict if one of them modifies a memory location and the other one reads or modifies the same memory location;
- two actions are potentially concurrent if
  - they are performed by different threads, or
  - they are unsequenced, at least one is performed by a signal handler, and they are not both performed by the same signal handler invocation;
- the execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described in the standard (paragraph 22 of [intro.races] in C++20: https://timsong-cpp.github.io/cppwp/n4868/intro.races#22);
- any such data race results in undefined behavior.
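As an illustration of these rules, here is a minimal sketch (assuming C++11 or later; `count_from_two_threads` and `Counts` are hypothetical names chosen for this example) of two threads incrementing shared counters without a data race: one counter is a `std::atomic<int>`, the other is a plain `int` touched only while holding a `std::mutex`. Replacing either with an unsynchronized plain `int` would produce exactly the kind of data race defined above.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Result of the experiment: both totals are expected to be equal.
struct Counts {
    int atomic_total;
    int locked_total;
};

// Two threads increment two shared counters. Neither counter is involved in
// a data race: one is std::atomic, the other is only accessed under a mutex.
Counts count_from_two_threads(int iterations_per_thread) {
    std::atomic<int> atomic_counter{0};
    int locked_counter = 0;  // guarded by `m`
    std::mutex m;

    auto work = [&] {
        for (int i = 0; i < iterations_per_thread; ++i) {
            // Atomic read-modify-write: no increment can be lost.
            atomic_counter.fetch_add(1, std::memory_order_relaxed);
            // Unlocking the mutex synchronizes with the next lock, so these
            // non-atomic increments are ordered by happens-before.
            std::lock_guard<std::mutex> lock(m);
            ++locked_counter;
        }
    };

    std::thread t1(work);
    std::thread t2(work);
    t1.join();  // join() makes each thread's writes visible to this thread
    t2.join();
    return {atomic_counter.load(), locked_counter};
}
```

For example, `count_from_two_threads(100000)` is expected to report 200000 for both totals.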

An atomic operation is indivisible with regard to any other atomic operation that involves the same object. Informally, one operation happening before another means that the memory writes of the first are visible to the reads of the second.
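To make the informal description of happens-before concrete, here is a sketch of the classic message-passing pattern (assuming C++11 or later; `pass_message` and the variable names are hypothetical). The payload is a plain, non-atomic `int`, yet there is no data race: the release store to the flag synchronizes with the acquire load that reads `true`, so the payload write happens before the consumer reads it.

```cpp
#include <atomic>
#include <thread>

// Hands `value` from a producer thread to a consumer thread through a
// non-atomic variable, published by a release/acquire pair on a flag.
int pass_message(int value) {
    int payload = 0;                 // plain, non-atomic shared data
    std::atomic<bool> ready{false};  // publication flag
    int received = -1;               // written by consumer, read after join()

    std::thread producer([&] {
        payload = value;                               // (1) plain write
        ready.store(true, std::memory_order_release);  // (2) publish
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();  // spin until (2) is observed
        }
        received = payload;  // (3) guaranteed to see the write from (1)
    });

    producer.join();
    consumer.join();
    return received;
}
```

With `memory_order_relaxed` on both flag operations instead, (1) would no longer be guaranteed to happen before (3), and the read of `payload` would be a data race.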

According to the language standard, undefined behaviour is simply behaviour for which the standard imposes no requirements.

Some people wrongly consider undefined behaviour to be only something that occurs at run time and has nothing to do with compilation, but the standard uses undefined behaviour to regulate compilation as well: in cases of undefined behaviour, nothing is specified to be expected for either compilation or, accordingly, execution.

The language standard does not forbid compilers from diagnosing undefined behaviour.

The standard explicitly notes that permissible undefined behaviour ranges from ignoring the situation completely with unpredictable results, to behaving during translation or execution in a documented manner characteristic of the environment (including the compiler's documentation), which literally allows anything possible as long as it is documented, to terminating translation or execution (https://timsong-cpp.github.io/cppwp/n4868/intro.defs#defns.undefined).

So a compiler is even permitted to generate meaningless code in cases of undefined behaviour.

A data race is not the situation in which conflicting accesses to an object actually occur at the same instant; it is the situation in which code containing even potentially conflicting accesses to an object (whether they actually conflict depends on the environment) is executed without a happens-before ordering between those accesses. Defining it otherwise at the language level would be impossible, because a write to memory caused by an operation may be delayed by the hardware for an unspecified time within the bounds of the concurrent code (and, in addition, operations may be moved, within certain restrictions, across the concurrent code by both the compiler and the hardware).

As for code which causes undefined behaviour only for some inputs (so that undefined behaviour may or may not occur in a given execution):

- on the one hand, the as-if rule (https://en.cppreference.com/w/cpp/language/as_if) permits compilers to generate code that works correctly only for the inputs which do not cause undefined behaviour (for instance, code that issues a diagnostic message when an input causing undefined behaviour occurs; issuing diagnostic messages is explicitly noted in the standard as a form of permissible undefined behaviour);
- on the other hand, in practice compilers often generate code as if such input never happens; see examples of such behaviour at https://en.cppreference.com/w/cpp/language/ub
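A small single-threaded sketch of "code generated as if such input never happens" (the function names are hypothetical, chosen for this example): a check written as `x + 1 < x` can only be true if signed overflow occurs, which is undefined behaviour, so a compiler may assume it never happens and fold the check to `false`. The version below avoids relying on undefined behaviour by comparing against `INT_MAX` before doing any arithmetic.

```cpp
#include <climits>

// Would `x + 1` overflow? Written without performing the overflowing
// addition, so the answer is well defined for every int input.
// (The tempting `x + 1 < x` may be folded to `false` by an optimizer,
// because it can only be true after undefined signed overflow.)
bool increment_would_overflow(int x) {
    return x == INT_MAX;
}

// Returns x + 1, saturating at INT_MAX instead of overflowing.
int saturating_increment(int x) {
    return increment_would_overflow(x) ? x : x + 1;
}
```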

Note that, in contrast to potential data races (I use the word "potential" here because of the note marked with [*] below), the cases in the examples from that link are quite easy to detect at compile time.

If a compiler could easily detect a data race, a reasonable compiler would just terminate compilation rather than compile anything, but:

On the one hand, [*] it is practically impossible to conclude that a data race will definitely happen at run time, simply because at run time all the concurrent code instances but one may fail to start for environmental reasons, which makes any multi-threaded code a priori potentially single-threaded and thus potentially free of data races altogether (though in many cases this would break the semantics of the program, that is not a concern of compilers).

On the other hand, a compiler is permitted to inject code so that a data race is handled at run time (note: not only in a sensible way, such as issuing a diagnostic message, but in any documented manner, even a harmful one). However, besides the fact that such injections would be a debatable overhead (even when used for something reasonable):

- some potential data races may be impossible to detect at all because of separate compilation of translation units;
- some potential data races may or may not occur in a specific execution depending on run-time input data, which would make correct injections monstrous;
- detecting data races, even when possible, may be complex and too expensive because of complex code constructs and program logic.

So, at present, it is normal for compilers to not even try to detect data races.

Besides data races themselves, code in which data races are possible and which is compiled as if it were single-threaded has the following problems:

- under the as-if rule (https://en.cppreference.com/w/cpp/language/as_if), a variable may be eliminated if the compiler sees no difference, and compilers do not take multi-threading into account unless the specific multi-threading facilities of the language and its standard library are used;
- operations may be reordered relative to the order in which they were coded, both by the compiler under the as-if rule and by the hardware during execution, whenever the reordering appears to make no difference, unless the specific multi-threading facilities of the language and its standard library are used; hardware may implement various approaches to restricting reordering, including requiring explicit corresponding instructions in the code.
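A minimal sketch of the elimination problem and its standard fix, using the stop-flag scenario from Richard Critten's comment on the question (assuming C++11 or later; `run_until_stopped` is a hypothetical name). With a plain `bool` flag, the compiler may read it once and turn the loop into `while (true)`, and the unsynchronized access is a data race anyway; a `std::atomic<bool>` flag forces a real load on every iteration and orders the accesses.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Runs a busy worker thread for roughly `run_for`, then asks it to stop.
// Returns how many loop iterations the worker completed.
long run_until_stopped(std::chrono::milliseconds run_for) {
    std::atomic<bool> keep_running{true};  // a plain bool here would be a data race
    long iterations = 0;  // written only by the worker, read after join()

    std::thread worker([&] {
        // Each iteration performs a genuine atomic load; the compiler may
        // not hoist it out of the loop.
        while (keep_running.load(std::memory_order_acquire)) {
            ++iterations;
        }
    });

    std::this_thread::sleep_for(run_for);
    keep_running.store(false, std::memory_order_release);  // request stop
    worker.join();  // returns because the worker eventually observes `false`
    return iterations;
}
```

The main point of the example is that `run_until_stopped` is guaranteed to return; with a plain `bool` flag it might loop forever.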

The question states that the following point does not apply, but for completeness, the following is theoretically possible on some hardware:

Some people wrongly believe that a multi-core coherence mechanism always keeps data completely coherent, i.e. that when one core updates an object, other cores reading it get the updated value. However, it is possible for a multi-core coherence mechanism to perform some or even all of the coherence work only when triggered by corresponding instructions in the code, so that without those instructions the value written to an object gets stuck in the writing core's cache and reaches other cores either never or later than appropriate.

Please note that appropriate use of a reasonably implemented (see the note marked with [**] below for details) volatile qualifier on variables, where volatile can be applied to the type, solves the problems of elimination and reordering by the compiler, but not the problems of reordering by the hardware or of values “getting stuck” in the cache.

[**] Unfortunately, the language standard actually says “The semantics of an access through a volatile glvalue are implementation-defined” (https://timsong-cpp.github.io/cppwp/n4868/dcl.type.cv#5). The standard does note that “volatile is a hint to the implementation to avoid aggressive optimization involving the object because the value of the object might be changed by means undetectable by an implementation” (https://timsong-cpp.github.io/cppwp/n4868/dcl.type.cv#note-5), which would help to avoid elimination and reordering by the compiler if volatile is implemented according to its intended purpose, that is, correctly for values that may be accessed by the code's environment (for instance, hardware, the operating system, or other applications); formally, however, compilers are not obligated to implement volatile according to that purpose. At the same time, modern versions of the standard also note that “Furthermore, for some implementations, volatile might indicate that special hardware instructions are required to access the object” (https://timsong-cpp.github.io/cppwp/n4868/dcl.type.cv#note-5), which means that some implementations might also prevent reordering by the hardware and values “getting stuck” in the cache, though that is not what volatile was intended for.

Only the specific multi-threading facilities, including the multi-threading part of the C++ standard library since C++11, are guaranteed (as far as the implementation conforms to the standard) to solve all three problems as well as the data race issue.

So a portable C++ program that conforms to the language standard must protect its execution from any data races.

If a compiler compiles the code as if it were single-threaded (i.e. it ignores data races), a reasonably implemented (as described in the note marked with [**] above) volatile qualifier is used appropriately, and there are no caching or hardware reordering issues, one will get thread-safe machine code without data race protection (from environment-dependent C++ code that, starting from C++11, does not conform to the standard).

As an example of a non-atomic bool flag that can potentially be used safely from multiple threads in a specific environment, https://en.cppreference.com/w/cpp/language/storage_duration#Static_local_variables notes that implementations of the initialization of static local variables (since C++11) usually use variants of the double-checked locking pattern, which reduces the runtime overhead for already-initialized local statics to a single non-atomic boolean comparison.
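From the program's side, none of that internal flag is visible: since C++11, a function-local static is initialized exactly once even if several threads reach its declaration concurrently. A minimal sketch (the `Config` type, the `instance` accessor, and the `g_init_calls` counter are hypothetical, added only to observe the behaviour):

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical counter used only to observe how many times the constructor runs.
std::atomic<int> g_init_calls{0};

struct Config {
    int value;
    Config() : value(42) { g_init_calls.fetch_add(1); }
};

Config& instance() {
    // Thread-safe initialization of a function-local static (C++11 and later):
    // one thread constructs `config`, any concurrent callers wait for it.
    static Config config;
    return config;
}

// Starts `thread_count` threads that all call instance() concurrently and
// returns how many times Config's constructor ran.
int init_calls_after_racing(int thread_count) {
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i) {
        threads.emplace_back([] { (void)instance(); });
    }
    for (auto& t : threads) t.join();
    return g_init_calls.load();
}
```

Whatever flag the implementation checks internally is its own business; the program above contains no data race under the standard.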

But note that these solutions are environment-dependent, and since they are part of the compilers' own implementations rather than of programs compiled with them, conformance to the standard is not a concern there.

To make your program conform to the language standard and be protected (as far as the compiler conforms to the standard) against the liberties of compiler implementation details, you must protect the flag of a double-checked lock from data races, and the most reasonable way to do that is to use std::atomic<bool> or std::atomic_bool.

See details regarding the implementation of the double-checked locking pattern in C++ (including the use of a non-atomic flag with a data race) in my answer https://stackoverflow.com/a/68974430/1790694 to the question about implementing double-checked locking in C++, “Is there any potential problem with double-check lock for C++?” (keep in mind that the code there contains multi-threading operations in the threads, which affect all access operations in the thread, triggering memory coherence and preventing reordering, so the code as a whole is a priori not compiled as if it were single-threaded).

*"for the draft document №4861 ... it is in 6.9.2.1 22"* You can use the symbolic names such as [`[intro.multithread.general]`](http://eel.is/c++draft/intro.multithread.general), which are a lot more stable between standard revisions. — HolyBlackCat, Sep 05 '21 at 11:48
@HolyBlackCat, thank you for the tip. Unfortunately, for some reason I am currently unable to add the reference name to the link in the answer, but I have taken the idea into account. — Arthur P. Golubev, Sep 05 '21 at 13:45
The C++ Standard explicitly refrains from defining any standard of conformance for programs, and explicitly states that even when it would appear to impose constraints upon programs, that is simply because doing so is easier than trying to describe all the circumstances in which an implementation would or would not be required to work precisely as otherwise described. — supercat, Oct 07 '21 at 19:26

score 0 · Answer 2 · answered Sep 05 '21 at 16:15

If you have such hardware, then the answer is "yes". The question is, what is that hardware?

Suppose you had a single-core CPU, say, an 80486. Where, in such an architecture, might the value be? The answer is a register, cache or RAM, depending on whether or not the value is about to be operated on.

The problem is, if you have a preemptive multi-threading operating system, you can't guarantee that, when a context switch happens, the value has been flushed from registers to memory (cache / RAM). The value might be in a register as a result of an operation that has just produced it, and the preemption can happen before the next op code that would move it from the op's "result" register to memory. The preemptive switch to another thread would result in the new thread reading the value from memory, where it is stale.

So, that hardware is not any hardware that's been made in the past 40 years.

Conceivably it would be possible to have a CPU that has no registers, i.e. it's using RAM as its register set. However, no one has made one of those, because it would be very slow.

So in practice, there is no such hardware, so the answer is "no" it won't be thread safe.

You'd have to have something like a cooperative multitasking OS that ensured that the results of operations in registers got MOVed back to RAM before running a new thread.

A multi-core coherence mechanism may perform coherence at the hardware level by itself (propagating all writes to other cores that read the value after the write). — Arthur P. Golubev, Sep 05 '21 at 16:40
@ArthurP.Golubev, Sure, but that's memory coherence. The latest value of a variable may not yet be back in memory because a write has not yet happened following an operation. Using locks is a way of explicitly advertising that the value is changing, and don't access it until the lock is released. — bazza, Sep 05 '21 at 17:23

score 0 · Answer 3 · answered Oct 07 '21 at 20:12

It has for decades been common, and not astonishing, for compilers, even those intended to be suitable for multi-threaded or interrupt-based programming, to consolidate non-volatile-qualified accesses to objects when there are no intervening volatile-qualified accesses. The C Standard recognizes the possibility of an implementation treating all accesses as though volatile-qualified, but doesn't particularly recommend such treatment. Whether volatile should be sufficient seems to be controversial.

Even before the publication of the first C++ Standard, the C Standard specified that the semantics of volatile are implementation-defined, thus allowing implementations designed to be suitable for multi-tasking or interrupt-based systems to provide semantics appropriate to that purpose without requiring special syntax, while allowing those that weren't intended to support such tasks to generate code that would be slightly more efficient when weaker semantics would suffice, but behave in broken fashion when stronger semantics were required.

While some people claim it was impossible to write portable multi-threaded code prior to the addition of atomics to the language standard, that ignores the fact that many people could and did write multi-threaded code which would be portable among all implementations for the intended target platform, whose designers made the semantics of volatile strong enough to support such code without requiring special syntax. The Standard didn't specify what implementations would need to do in order to be suitable for that purpose, because (1) it didn't require implementations to be suitable for such purpose, and (2) compiler writers were expected to know their customers' needs better than the Committee ever could.

Unfortunately, some compiler writers who were sheltered from normal market pressures have interpreted the Standard's failure to require that all implementations process volatile in a manner suitable for multi-threaded or interrupt-based programs without requiring special syntax as a judgment that no implementations should be expected to do so. Thus, there is a lot of code which would be reliable if processed by commercial implementations, but would not be processed reliably by compilers like clang or gcc which are designed to require special syntax when performing such tasks.
