34

Suppose I have a struct like this:

volatile struct { int foo; int bar; } data;
data.foo = 1;
data.bar = 2;
data.foo = 3;
data.bar = 4;

Are the assignments all guaranteed not to be reordered?

For example, without volatile the compiler would clearly be allowed to drop the dead stores and reorder what remains, reducing it to two assignments like this:

data.bar = 4;
data.foo = 3;

But with volatile, is the compiler required not to do something like this?

data.foo = 1;
data.foo = 3;
data.bar = 2;
data.bar = 4;

(That is, treating the members as separate, unrelated volatile entities, and doing a reordering that I can imagine it might try in order to improve locality of reference, in case foo and bar sit on either side of a page boundary, for example.)

Also, is the answer consistent for current versions of both C and C++ standards?

Ted Shaneyfelt

  • I don't know, but I sure hope so, else the queue structs I use for interrupt comms may be in trouble :) – Martin James Dec 14 '20 at 20:26
  • Not reordered; full quote here for C++ (C may be different): https://en.cppreference.com/w/cpp/language/cv _"an object whose type is volatile-qualified, or a subobject of a volatile object"_ ... _"Every access (read or write operation, member function call, etc.) made through a glvalue expression of volatile-qualified type is treated as a visible side-effect for the purposes of optimization"_ – Richard Critten Dec 14 '20 at 20:28
  • If this is about C++ and *"concurrency"* per se (as the tag says), check out `std::atomic`. It has similar non-reordering guarantees. – bloody Dec 14 '20 at 20:40
  • @bloody: Unfortunately `volatile std::atomic` types have some counterintuitive behavior, at least on current compilers. For instance [here](https://godbolt.org/z/WhMh6E) a load from a `volatile std::atomic` is optimized out because its value is unused, even though it wouldn't be for a regular `volatile int`. – Nate Eldredge Dec 14 '20 at 21:17
  • @NateEldredge I never thought about combining `std::atomic` **with** `volatile`. If the OP exposes that structure for I/O interaction then utilizing `volatile` is unquestionable. However, the OP's tag suggests it's about concurrency (a multithreaded program), in which case `std::atomic` is the right tool to use and not `volatile`. Perhaps this is just a loose style of tag naming. – bloody Dec 14 '20 at 21:43
  • @bloody Primarily I'm looking at C, but since there are often subtle differences between the languages (C++ seems to have long departed from the goal of being a superset), I'm curious about volatile in particular as it applies to the portability of C code to C++. Yes, C++ indeed has much better libraries for dealing with this sort of thing. – Ted Shaneyfelt Dec 15 '20 at 00:09
  • @NateEldredge That is required behaviour; it has to do with discarded-value expressions and what constitutes a read. On the other hand, you shouldn't use `volatile std::atomic` in the first place anyway. – Passer By Dec 15 '20 at 09:21
  • The compiler is not obliged to do anything; what constitutes a volatile access is implementation-defined. The standard just defines a certain ordering relation on accesses in terms of observable behaviour and the abstract machine, for implementation documentation to refer to. Code generation is not addressed by the standard. – philipxy Dec 16 '20 at 06:39

2 Answers

30

They will not be reordered.

C17 6.5.2.3(3) says:

A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member, 97) and is an lvalue if the first expression is an lvalue. If the first expression has qualified type, the result has the so-qualified version of the type of the designated member.

Since data has volatile-qualified type, so do data.bar and data.foo. Thus you are performing two assignments to volatile int objects. And by 6.7.3 footnote 136,

Actions on objects so declared [as volatile] shall not be “optimized out” by an implementation or reordered except as permitted by the rules for evaluating expressions.

A more subtle question is whether the compiler could assign them both with a single instruction, e.g., if they are contiguous 32-bit values, could it use a 64-bit store to set both? I would think not, and at least GCC and Clang don't attempt to.

Nate Eldredge
  • Thanks for quoting the standard (as I happen to not have a copy); that seems to answer the question. But your text "you are assigning two volatile int objects" is misleading, in that if they were not considered the same object the answer would be different, or there would need to be an additional restriction on the compiler to preserve the order of accesses that are volatile even if they are in unrelated objects. Maybe best to keep the quote and refine the answer text... – Ted Shaneyfelt Dec 14 '20 at 20:56
  • I think changing operations to be simultaneous (using one instruction for two assignments) ought to count as a reordering. If not by strict interpretation of the standard, then certainly by the spirit of the standard, the reason for such a restriction (which has a performance penalty) applies regardless of whether you get tricksy with the wording. – Ben Dec 14 '20 at 21:04
  • @TedShaneyfelt: Rephrased to "two assignments to `volatile int` objects". – Nate Eldredge Dec 14 '20 at 21:07
  • You mean two assignments to the same volatile int object? That would be satisfactory. – Ted Shaneyfelt Dec 14 '20 at 21:08
  • Note that they are different parts of the same object, not separate volatile objects, but the same object, as the first quote of the specification that you gave pointed out... – Ted Shaneyfelt Dec 14 '20 at 21:10
  • @Ben has to be correct about simultaneous being reordering. Changing operations to be simultaneous would affect hardware; for example, setting data bits and then toggling a strobe bit on memory-mapped I/O is clearly something that would be optimized out if that were allowed. – Ted Shaneyfelt Dec 14 '20 at 21:14
  • It is implementation-defined what constitutes access to a volatile-qualified object. If the C implementation targets hardware on which the effects of a 64-bit write could be the same as two 32-bit writes (e.g., two 32-bit writes might be seen separately by other components sharing memory, but they could be seen as indistinguishable, so a 64-bit write that is necessarily simultaneous is indistinguishable from two 32-bit writes that happen to be effectively simultaneous), then it could be reasonable for the implementation to define "access" so that a 64-bit write can be used. – Eric Postpischil Dec 14 '20 at 21:14
  • @Eric Postpischil for that to be the case, they wouldn't truly be written simultaneously, even though they are optimized to a single instruction. Then it seems OK. But if they are distinguishable, which they would be if the strobe went active as the data were being written instead of afterward, then it would be incorrectly reordered to be simultaneous. The compiler would need to take into consideration whether or not the alignment is such that it could get away with a single write instruction being split into two data write accesses. – Ted Shaneyfelt Dec 14 '20 at 21:23
  • @TedShaneyfelt: The members of a structure type are themselves objects. 6.2.5 (20): "A structure type describes a sequentially allocated nonempty set of member **objects**". So we are indeed performing two accesses to volatile objects, and they happen to be different objects, albeit they are also both part of the object `data`. I changed the wording to make it clear that reordering would still be forbidden even for two accesses to the same object (which is not the case at hand). – Nate Eldredge Dec 14 '20 at 21:23
  • 1. Yes, of course individual members of an object are themselves objects. 2. Yes, footnote 136 of the standard clearly does prohibit optimizing out accesses, as `data.bar=4; data.foo=3;` would do. 3. The footnote could be interpreted as "Actions on [any of the] objects so declared [as volatile] shall not be "optimized out" by an implementation or reordered except as permitted by the rules for evaluating expressions [but its relation to other such objects is not taken into account here]", so the fact that they are part of the same object seems to be relevant. – Ted Shaneyfelt Dec 14 '20 at 21:33
  • Regarding preserving order of operations... http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2016.html – Ted Shaneyfelt Dec 14 '20 at 21:40
17

If you want to use this in multiple threads, there is one significant gotcha.

While the compiler will not reorder the writes to volatile variables (as described in the answer by Nate Eldredge), there is one more point where write reordering can occur, and that is the CPU itself. This depends on the CPU architecture, and a few examples follow:

Intel 64

See Intel® 64 Architecture Memory Ordering White Paper.

While the store instructions themselves are not reordered (2.2):

  1. Stores are not reordered with other stores.

They may be visible to different CPUs in a different order (2.4):

Intel 64 memory ordering allows stores by two processors to be seen in different orders by those two processors

AMD 64

AMD 64 (which is the common x64) has similar behaviour in the specification:

Generally, out-of-order writes are not allowed. Write instructions executed out of order cannot commit (write) their result to memory until all previous instructions have completed in program order. The processor can, however, hold the result of an out-of-order write instruction in a private buffer (not visible to software) until that result can be committed to memory.

PowerPC

I remember having to be careful about this on Xbox 360 which used a PowerPC CPU:

While the Xbox 360 CPU does not reorder instructions, it does rearrange write operations, which complete after the instructions themselves. This rearranging of writes is specifically allowed by the PowerPC memory model

To avoid CPU reordering in a portable way you need to use memory fences like C++11 std::atomic_thread_fence or C11 atomic_thread_fence. Without them, the order of writes as seen from another thread may be different.

See also "C++11 introduced a standardized memory model. What does it mean? And how is it going to affect C++ programming?"

This is also noted in the Wikipedia Memory barrier article:

Moreover, it is not guaranteed that volatile reads and writes will be seen in the same order by other processors or cores due to caching, cache coherence protocol and relaxed memory ordering, meaning volatile variables alone may not even work as inter-thread flags or mutexes.

Suma
  • "This raises the issue of whether volatile should be given a real meaning that provides both atomicity and inter-thread visibility, roughly along the lines of Java volatiles. Although we believe that abstractly this provides a substantial improvement by giving semantics to something that currently has almost no portable semantics, there seem to be a number of practical obstacles driven by backward compatibility issues that lead us to at least hesitate." - Hans Boehm & Nick Maclaren http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2016.html ... – Ted Shaneyfelt Dec 15 '20 at 19:04
  • Boehm & Maclaren's concern could perhaps have been addressed by the standards committee adding a syntactic construct within which volatiles would be forced to behave more along the spirit of volatility that they hesitate to require for reasons of backward compatibility. E.g. a new syntax, volatile { block }, would be a sufficient addition to the language to preserve backwards compatibility, but also to allow a more intuitive, meaningful and useful behavior of volatile objects within that block. Like namespace, it might be best to allow it to span multiple function definitions. As it is, it's goofy. – Ted Shaneyfelt Dec 15 '20 at 19:17
  • If you're using this from multiple threads, you have a data race and all bets are off. Unlike `atomic` types, `volatile` objects are not thread safe and do not avoid data races. About the only viable use for `volatile` these days is to access memory-mapped hardware devices, and in that case you will normally have the memory marked as "uncached" in some machine-specific fashion, which is supposed to inhibit CPU reordering and ensure that the device sees loads and stores in (assembly-level) program order. – Nate Eldredge Mar 15 '22 at 22:25