May accesses to volatiles be reordered?

Question

Consider the following sequence of writes to volatile memory, which I've taken from David Chisnall's article at InformIT, "Understanding C11 and C++11 Atomics":

volatile int a = 1;
volatile int b = 2;
             a = 3;

My understanding from C++98 was that these operations could not be reordered, per C++98 1.9:

conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below ... The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and calls to library I/O functions

Chisnall says that the constraint on order preservation applies only to individual variables, writing that a conforming implementation could generate code that does this:

a = 1;
a = 3;
b = 2;

Or this:

b = 2;
a = 1;
a = 3;

C++11 repeats the C++98 wording that

conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.

but says this about volatiles (1.9/8):

Access to volatile objects are evaluated strictly according to the rules of the abstract machine.

1.9/12 says that accessing a volatile glvalue (which includes the variables a and b above) is a side effect, and 1.9/14 says that the side effects in one full expression (e.g., a statement) must precede the side effects of a later full expression in the same thread. This leads me to conclude that the two reorderings Chisnall shows are invalid, because they do not correspond to the ordering dictated by the abstract machine.

Am I overlooking something, or is Chisnall mistaken?

(Note that this is not a threading question. The question is whether a compiler is permitted to reorder accesses to different volatile variables in a single thread.)

possible duplicate of ["volatile" qualifier and compiler reorderings](http://stackoverflow.com/questions/2535148/volatile-qualifier-and-compiler-reorderings) — Adrian McCarthy, Feb 23 '13 at 13:45
An https version of the informit URL now exists, if anyone has the rep for a 1-character edit? — AJM, Feb 15 '23 at 13:47

JoergB · Accepted Answer · 2013-02-09T12:05:41.860

IMO Chisnall's interpretation (as presented by you) is clearly wrong. The simpler case is C++98: the sequence of reads and writes to volatile data must be preserved, and that applies to the ordered sequence of reads and writes of any volatile data, not to a single variable.

This becomes obvious if you consider the original motivation for volatile: memory-mapped I/O. In MMIO you typically have several related registers at different memory locations, and the protocol of an I/O device requires a specific sequence of reads and writes to its set of registers; the order between registers matters.
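
A minimal sketch of the MMIO pattern described above, assuming a hypothetical device whose control register must be written before its data register. `DeviceRegs`, `kCmdStart`, and `send_word` are illustrative names, not a real device API; on real hardware the struct would sit at a platform-specific address, whereas here it lives in ordinary memory so the code can run anywhere.

```cpp
#include <cstdint>

// Hypothetical register block for an imaginary device. On real hardware it
// would be placed at a platform-specific address; here it is ordinary memory.
struct DeviceRegs {
    volatile std::uint32_t control;  // must be written first (hypothetical protocol)
    volatile std::uint32_t data;     // written only after control is set
};

// Hypothetical command value telling the device that a data word follows.
constexpr std::uint32_t kCmdStart = 0x1;

// Two separate full-expressions: the volatile store to `control` is sequenced
// before the volatile store to `data`. On the reading of the standard given in
// this answer, the compiler must not swap them; what the hardware does with
// the two stores afterwards is platform-specific.
void send_word(DeviceRegs& regs, std::uint32_t word) {
    regs.control = kCmdStart;
    regs.data = word;
}
```

If the compiler were free to reorder accesses to distinct volatiles, as Chisnall suggests, nothing in the language would stop `data` from being written first, and the device protocol would break.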

The C++11 wording avoids talking about an absolute sequence of reads and writes, because in multi-threaded environments there is no single well-defined sequence of such events across threads, and that is not a problem if these accesses go to independent memory locations. But I believe the intent is that, for any sequence of volatile data accesses with a well-defined order, the rules remain the same as for C++98: the order must be preserved, no matter how many different locations are accessed in that sequence.

It is an entirely separate issue what that entails for an implementation. How (and even whether) a volatile data access is observable from outside the program, and how the access order of the program maps to externally observable events, is unspecified. An implementation should probably give you a reasonable interpretation and reasonable guarantees, but what is reasonable depends on the context.

The C++11 standard leaves room for data races between unsynchronized volatile accesses, so there is nothing that requires surrounding these by full memory fences or similar constructs. If there are parts of memory that are truly used as external interface - for memory-mapped I/O or DMA - then it may be reasonable for the implementation to give you guarantees for how volatile accesses to these parts are exposed to consuming devices.

One guarantee can probably be inferred from the standard (see [intro.execution]): objects of type `volatile std::sig_atomic_t` must have values compatible with the order of writes to them, even in a signal handler, at least in a single-threaded program.
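
A minimal sketch of that guarantee, assuming a single-threaded hosted program. `g_signal_seen`, `on_signal`, and `raise_and_check` are hypothetical names, and SIGINT is an arbitrary choice of signal. `std::raise` does not return until the handler has run, so the handler's write to the `volatile std::sig_atomic_t` flag is visible when the caller reads it afterwards.

```cpp
#include <csignal>

// A flag the signal handler may portably write: assigning to an object of
// type volatile std::sig_atomic_t is one of the few things a handler may do.
volatile std::sig_atomic_t g_signal_seen = 0;

extern "C" void on_signal(int /*signum*/) {
    g_signal_seen = 1;
}

// Installs the handler, raises the signal synchronously in this thread, and
// reports whether the handler's write is visible afterwards.
bool raise_and_check() {
    g_signal_seen = 0;
    if (std::signal(SIGINT, on_signal) == SIG_ERR) {
        return false;  // could not install the handler
    }
    std::raise(SIGINT);  // the handler runs before std::raise returns
    return g_signal_seen == 1;
}
```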

I think your answer is self-refuting. Your fourth paragraph refutes your first. Chisnall is saying that you can't portably rely on the C++ standard's ordering "guarantee" and this is what your fourth paragraph says. But your first paragraph says he's wrong. There is simply no *portable* concept of a "sequence of reads and writes to volatile data", so there is no universal semantic meaning to the standard's guarantee. — David Schwartz, Feb 10 '13 at 13:10
@David Schwartz: I think you missed my point. Chisnall says that stores to different volatiles can be reordered (even if observable), but that coalescing writes to a single volatile is not allowed. I am saying that neither is allowed, if it can be observed (in an implementation-defined way). I am also saying that there is no guarantee that all uses of `volatile`-qualified variables can be observed or have an observable store order. In summary: if the stores or their order can't be observed, Chisnall's "allowed reordering" is meaningless; if they can be observed, it is wrong. — JoergB, Feb 10 '13 at 15:56
If there's no portable notion of observing reads and writes to volatiles, then the standard making them part of the observable behavior of the program doesn't mean anything. If it means whatever makes sense for that platform, then portable code can't rely on it to do anything. In particular, if you try to apply this logic to whether, for example, memory fences are needed around volatiles, you get back to the question of whether CPU reordering is observable, and there's no clear answer to that. So even if we agree with you, it wouldn't help with any of the questions we're trying to answer. — David Schwartz, Oct 13 '14 at 04:58

Jonathan Wakely · Answer 2 · score 5 · 2013-02-09T15:10:53.547

You're right, he's wrong. Accesses to distinct volatile variables cannot be reordered by the compiler as long as they occur in separate full-expressions, i.e., are separated by what C++98 called a sequence point, or, in C++11 terms, one access is sequenced before the other.
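
The distinction between sequenced and unsequenced volatile accesses can be sketched as follows. `reg_a`, `reg_b`, and both functions are hypothetical names. Both functions perform the same two volatile reads; only in the first are the reads sequenced, because they sit in separate full-expressions.

```cpp
#include <utility>

// Hypothetical volatile objects standing in for two device registers.
volatile int reg_a = 10;
volatile int reg_b = 20;

// Separate full-expressions: the read of reg_a is sequenced before the read
// of reg_b, so the abstract machine performs them in this order.
std::pair<int, int> read_in_order() {
    int first = reg_a;   // full-expression 1
    int second = reg_b;  // full-expression 2
    return {first, second};
}

// One full-expression: the operands of + are unsequenced relative to each
// other, so the compiler may read reg_b before reg_a. The sum is the same
// here, but a device that reacts to reads could tell the difference.
int read_unordered() {
    return reg_a + reg_b;
}
```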

Chisnall seems to be trying to explain why volatile is useless for writing thread-safe code, by showing a simple mutex implementation relying on volatile that would be broken by compiler reorderings. He's right that volatile is useless for thread-safety, but not for the reasons he gives. It's not because the compiler might reorder accesses to volatile objects, but because the CPU might reorder them. Atomic operations and memory barriers prevent the compiler and the CPU from reordering things across the barrier, as needed for thread-safety.
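
For contrast, here is a minimal sketch of a lock that relies on atomic operations instead of volatile, assuming a hosted C++11 implementation with `std::thread`. `SpinLock` and `count_with_two_threads` are hypothetical names, and a production lock would back off instead of spinning. The acquire exchange and release store keep both the compiler and the CPU from moving the protected increment out of the critical section; declaring the flag volatile would give no such guarantee.

```cpp
#include <atomic>
#include <thread>

// Minimal spinlock: exchange with acquire ordering takes the lock,
// store with release ordering gives it back.
class SpinLock {
public:
    void lock() {
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // busy-wait until the holder calls unlock()
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Two threads each increment a plain (non-atomic, non-volatile) counter
// `iterations` times under the lock; the lock makes the total exact.
long count_with_two_threads(int iterations) {
    SpinLock lock;
    long counter = 0;
    auto work = [&] {
        for (int i = 0; i < iterations; ++i) {
            lock.lock();
            ++counter;
            lock.unlock();
        }
    };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    return counter;
}
```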

See the bottom-right cell of Table 1 in Herb Sutter's informative article "volatile vs. volatile".

edited Feb 09 '13 at 15:10 · answered Feb 09 '13 at 15:03

This is nonsense. The C++ standard doesn't put restrictions on what the compiler can do other than to put restrictions on what the code it generates does. And if you think the C++ standard says a compiler can't generate code that re-orders writes to volatile variables, then every x86 compiler violates the C++ standard, since they do not bypass the CPU's write posting buffers on writes to volatile variables and these buffers can re-order writes. – David Schwartz Feb 09 '13 at 21:04
You're putting words in my mouth then saying it's nonsense. There's a big difference between the compiler reordering accesses and the hardware doing it. Did you miss the part of my answer where I said the CPU might reorder accesses to volatiles? (It would have been more accurate to say hardware not CPU, but my point is the hardware might reorder things beyond the compiler's control.) – Jonathan Wakely Feb 10 '13 at 18:30
There's no difference, from the point of view of the C++ standard, between the compiler doing something and the compiler emitting instructions that cause the hardware to do something. If the CPU may reorder accesses to volatiles, then the standard doesn't require that their order be preserved. It can't be both ways. – David Schwartz Feb 10 '13 at 20:43
The CPU might reorder writes to main memory because it knows the difference can't be observed, but won't do it to memory mapped hardware if doing so would have a different observable effect. The CPU is in a position to know what can and can't be reordered. The compiler may not be in such a position. – Jonathan Wakely Feb 10 '13 at 20:51
The C++ standard doesn't make any distinction about what does the reordering. And you can't argue that the CPU can reorder them with no observable effect so that's okay -- the C++ standard *defines* their order as observable. A compiler is compliant with the C++ standard on a platform if it generates code that makes the platform do what the standard requires. If the standard requires accesses to volatiles not be reordered, then a platform that reorders them isn't compliant. There's no as-if rule here -- the order is defined as observable. – David Schwartz Feb 10 '13 at 21:08
So what's the point you're trying to make: that the compiler can reorder accesses to distinct volatiles? – Jonathan Wakely Feb 10 '13 at 21:17
My point is that if the C++ standard prohibits the compiler from reordering accesses to distinct volatiles, on the theory that the order of such accesses is part of the program's observable behavior, then it also requires the compiler to emit code that prohibits the CPU from doing so. The standard does not differentiate between what the compiler does and what the compiler's generated code makes the CPU do. – David Schwartz Feb 10 '13 at 21:57
@David The thing you're overlooking is that the C++ standard specifies the behavior of several threads interacting only in specific situations, and everything else results in undefined behavior. A race condition involving at least one write is undefined if you don't use atomic variables, so the compiler is perfectly within its rights to forgo any CPU synchronization, since you'll only notice the difference in a program that exhibits undefined behavior. – Voo Oct 10 '14 at 22:32
@Voo I 100% agree. What makes you think I'm overlooking that or don't agree? Do you think that somehow invalidates this statement, "*If the C++ standard prohibits the compiler from reordering accesses to distinct volatiles, on the theory that the order of such accesses is part of the program's observable behavior, then it also requires the compiler to emit code that prohibits the CPU from doing so.*" – David Schwartz Oct 13 '14 at 00:31

Jerry Coffin · Answer 3 · 2013-02-09T07:45:09.207

For the moment, I'm going to assume your a=3s are just a mistake in copying and pasting, and you really meant them to be c=3.

The real question here is one of the difference between evaluation, and how things become visible to another processor. The standards describe order of evaluation. From that viewpoint, you're entirely correct -- given assignments to a, b and c in that order, the assignments must be evaluated in that order.

That may not correspond to the order in which those values become visible to other processors though. On a typical (current) CPU, that evaluation will only write values out to the cache. The hardware can reorder things from there though, so (for example) writes out to main memory happen in an entirely different order. Likewise, if another processor attempts to use the values, it may see them as changing in a different order.

Yes, this is entirely allowable -- the CPU is still evaluating the assignments in exactly the order prescribed by the standard, so the requirements are met. The standard simply doesn't place any requirements on what happens after evaluation, which is what happens here.

I should add: on some hardware it is sufficient though. For example, the x86 uses cache snooping, so if another processor tries to read a value that's been updated by one processor (but is still only in the cache) the processor that has the current value will put a hold on the read by the other processor until the current value can be written out so the other processor will see the current value.

That's not the case with all hardware though. While maintaining that strict model keeps things simple, it's also fairly expensive both in terms of extra hardware to ensure consistency and in simple speed when/if you have a lot of processors.

Edit: if we ignore threading for a moment, the question gets a little simpler -- but not much. According to C++11, §1.9/12:

When a call to a library I/O function returns or an access to a volatile object is evaluated the side effect is considered complete, even though some external actions implied by the call (such as the I/O itself) or by the volatile access may not have completed yet.

As such, the accesses to volatile objects must be initiated in order, but not necessarily completed in order. Unfortunately, it's often the completion that's externally visible. As such, we pretty much come back to the usual as-if rule: the compiler can rearrange things as much as it wants, as long as it produces no externally visible change.

I updated the post to eliminate a/c ambiguity, which is not present in Chisnall's article. I've also updated it to indicate that this is not a threading or memory-visibility-across-threads question. — KnowItAllWannabe, Feb 09 '13 at 07:31

score 0 · Answer 4 · answered Feb 09 '13 at 06:39

Looks like it can happen.

There is a discussion on this page:

http://gcc.gnu.org/ml/gcc/2003-11/msg01419.html

Forhad Ahmed

That's a discussion of what to do in the generated code to prevent such reordering in the hardware. My question is about what the C++11 standard dictates. It's then up to compiler vendors to make sure that the code they generate will prevent the hardware from performing impermissible reorderings. – KnowItAllWannabe Feb 09 '13 at 06:41

score 0 · Answer 5 · answered Feb 09 '13 at 06:56

It depends on your compiler. For example, MSVC++ as of Visual Studio 2005 guarantees* volatiles will not be reordered (actually, what Microsoft did is give up and assume programmers will forever abuse volatile - MSVC++ now adds a memory barrier around certain usages of volatile). Other versions and other compilers may not have such guarantees.

Long story short: don't bet on it. Design your code properly, and don't misuse volatile. Use memory barriers or full-blown mutexes instead, as necessary. C++11's atomic types will help.
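
A minimal sketch of the memory-barrier approach mentioned above, assuming a hosted C++11 implementation with `std::thread`; `consume_published_value` is a hypothetical name. A release fence before a relaxed store, paired with an acquire fence after a relaxed load that reads that store, makes the plain write to `payload` visible to the reader. This is the one-shot publication that volatile flags are often misused for.

```cpp
#include <atomic>
#include <thread>

// One-shot message passing with explicit fences: the producer writes plain
// data, issues a release fence, then sets the flag; the consumer waits for
// the flag, issues an acquire fence, then reads the data.
int consume_published_value(int value) {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // publication flag

    std::thread producer([&] {
        payload = value;                                     // 1: write the data
        std::atomic_thread_fence(std::memory_order_release); // 2: barrier
        ready.store(true, std::memory_order_relaxed);        // 3: publish
    });

    while (!ready.load(std::memory_order_relaxed)) {
        // spin until the producer has published
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // pairs with step 2
    int seen = payload;  // guaranteed to observe the producer's write
    producer.join();
    return seen;
}
```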

Mahmoud Al-Qudsi

My question is about what the C++11 standard specifies, not what any particular compiler does. – KnowItAllWannabe Feb 09 '13 at 07:08
I believe the correct way is to use, for example, `volatile std::atomic` to make sure the compiler will not optimize away redundant writes (that's why volatile) and to avoid reorderings (that's why std::atomic). For anyone interested, Item 40 in Effective Modern C++ is related. – marcinj Apr 27 '17 at 16:03
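
A minimal sketch of the combination suggested in that comment. `g_status` and `write_twice_then_read` are hypothetical names. `std::atomic` provides volatile-qualified overloads of `load` and `store`, so both qualifiers can be applied to one object: the atomic part gives ordering, and, per the comment, the volatile part is meant to keep the seemingly redundant first store from being optimized away. (C++20 deprecates the volatile overloads only for types that are not always lock-free, which does not apply to `int` on mainstream platforms.)

```cpp
#include <atomic>

// Both volatile and atomic: each store is a volatile access as well as an
// atomic operation.
volatile std::atomic<int> g_status{0};

int write_twice_then_read() {
    g_status.store(1, std::memory_order_relaxed);  // volatile: not to be elided
    g_status.store(2, std::memory_order_release);  // atomic: ordered publication
    return g_status.load(std::memory_order_acquire);
}
```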

score -2 · Answer 6 · edited May 23 '17 at 12:02

C++98 doesn't say the instructions cannot be re-ordered.

The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and calls to library I/O functions

This says it's the actual sequence of the reads and writes themselves, not the instructions that generate them. Any argument that says that the instructions must reflect the reads and writes in program order could equally argue that the reads and writes to the RAM itself must occur in program order, and clearly that's an absurd interpretation of the requirement.

Simply put, this doesn't mean anything. There is no "one right place" to observe the orders of reads and writes (The RAM bus? The CPU bus? Between the L1 and L2 caches? From another thread? From another core?), so this requirement is essentially meaningless.

Versions of C++ prior to any references to threads clearly don't specify the behavior of volatile variables as seen from another thread. And C++11 (wisely, IMO) didn't change this but instead introduced sensible atomic operations with well-defined inter-thread semantics.

As for memory-mapped hardware, that's always going to be platform-specific. The C++ standard doesn't even pretend to address how that might be done properly. For example, the platform might be such that only a subset of memory operations are legal in that context, say ones that bypass a write posting buffer that can reorder, and the C++ standard certainly doesn't compel the compiler to emit the right instructions for that particular hardware device -- how could it?

Update: I see some downvotes because people don't like this truth. Unfortunately, it is true.

If the C++ standard prohibits the compiler from reordering accesses to distinct volatiles, on the theory that the order of such accesses is part of the program's observable behavior, then it also requires the compiler to emit code that prohibits the CPU from doing so. The standard does not differentiate between what the compiler does and what the compiler's generated code makes the CPU do.

Since nobody believes the standard requires the compiler to emit instructions to keep the CPU from reordering accesses to volatile variables, and modern compilers don't do this, nobody should believe the C++ standard prohibits the compiler from reordering accesses to distinct volatiles.

answered Feb 09 '13 at 06:52

David Schwartz

My question does not involve multiple threads, which is why I think the comparison of behavior specified by C++98 and C++11 is legitimate. – KnowItAllWannabe Feb 09 '13 at 07:08
@KnowItAllWannabe Then your question is ambiguous. If you don't mean "seen in a different order from a different thread", then what do you mean by "reordered"? – David Schwartz Feb 09 '13 at 07:09
Suppose a and b are memory-mapped, with a writing to the control part of an MMIO device, and b writing to the data part. Further suppose that the hardware won't work correctly unless something is written to the control part before any data gets written. In that case, it's important that a be written before b at runtime if the store to a precedes the store to b in the source code. – KnowItAllWannabe Feb 09 '13 at 07:11
@KnowItAllWannabe: Certainly the C++ standard can't say how you can correctly talk to some specific hardware device on some specific CPU. As seen by the MMIO device is one way the software's behavior can be "observed", but it's not the only way. And the C++ standard is not clear about where you would observe. (Paragraph added to answer.) – David Schwartz Feb 09 '13 at 07:16
No, but the standard can constrain the code that compilers generate. In the particular case of volatile, one of its primary use cases is to make it possible to write code respecting the communication requirements for memory-mapped devices. For example, seemingly unnecessary reads and writes to volatile variables may not be optimized away. Compilers must generate code for such reads and writes. My understanding is that temporal relationships between reads and writes of volatile variables (in a single thread) must also be preserved, hence my question. – KnowItAllWannabe Feb 09 '13 at 07:22
"*For example, seemingly unnecessary reads and writes to volatile variables may not be optimized away.*" That's not true. If that were true, C++ would be unimplementable on a platform that had a write-merging CPU cache even if all hardware devices were in a separate I/O space which had no such merging. And it's meaningless anyway. May not be optimized away from what? From the instruction stream? That's not what the standard says. From the memory bus? That's not what the standard says. (And that's not how C++ is implemented on typical x86 platforms. Write merging is not disabled by volatile.) – David Schwartz Feb 09 '13 at 07:23
Regarding "where you would observe," the standard identifies only two places where behavior is observed: volatile memory and files. Things like RAM busses, CPU busses, caches, etc., play no role in observable behavior. And for the single-threaded question I'm posing, inter-thread visibility is irrelevant, too. – KnowItAllWannabe Feb 09 '13 at 07:26
@KnowItAllWannabe: Where would you observe accesses to volatile memory though? There is no "one right place" and the standard doesn't say. So just saying that accesses to volatile memory are "observable behavior" is basically meaningless. The "sequence of reads and writes" doesn't exist in any one place on modern hardware, so it can't just be made observable by the stroke of a pen. – David Schwartz Feb 09 '13 at 07:27
[as-if](https://en.cppreference.com/w/cpp/language/as_if). `"Accesses (reads and writes) to volatile objects occur strictly according to the semantics of the expressions in which they occur. In particular, they are not reordered with respect to other volatile accesses on the same thread."` Doesn't it mean the order of evaluation is maintained for `volatile` variables? – TruthSeeker May 10 '22 at 08:19
@TruthSeeker Nobody knows, because it's not clear what it means to observe accesses to volatile memory. Are not reordered *where*? Between the CPU and RAM? In the instruction stream? It's all completely platform specific. You will often hear it claimed that it means they must appear in the instruction stream in program order, but that claim is complete nonsense. The standard doesn't say what the compiler must do but what must happen when the code executes. – David Schwartz May 11 '22 at 01:38
