Is it legal to optimize away stores/construction of volatile stack variables?

Question

I noticed that clang and gcc optimize away the construction of or assignment to a volatile struct declared on the stack, in some scenarios. For example, the following code:

#include <cstdint>  // provides uint32_t

struct nonvol2 {
    uint32_t a, b;
};

void volatile_struct2()
{
    volatile nonvol2 temp = {1, 2};
}

Compiles on clang to:

volatile_struct2(): # @volatile_struct2()
  ret

On the other hand, gcc does not remove the stores, although it does optimize the two implied stores into a single one:

volatile_struct2():
        movabs  rax, 8589934593
        mov     QWORD PTR [rsp-8], rax
        ret

Oddly, clang won't optimize away a volatile store to a single int variable:

void volatile_int() {
    volatile int x = 42;
}

Compiles to:

volatile_int(): # @volatile_int()
  mov dword ptr [rsp - 4], 42
  ret

Furthermore, the store to a struct with one member rather than two is not optimized away.

Although gcc doesn't remove the construction in this particular case, it does perhaps even more aggressive optimizations in the case that the struct members themselves are declared volatile, rather than the struct itself at the point of construction:

typedef struct {
    volatile uint32_t a, b;
} vol2;

void volatile_def2()
{
    vol2 temp = {1, 2};
    vol2 temp2 = {1, 2};
    temp.a = temp2.a;
    temp.a = temp2.a;
}

This compiles down to a single ret.

While it seems entirely "reasonable" to remove these stores, which are pretty much impossible to observe by any reasonable process, my impression was that in the standard, volatile loads and stores are assumed to be part of the observable behavior of the program (in addition to calls to IO functions), full stop. The implication is that they are not subject to removal under the "as if" rule, since removing them would by definition change the observable behavior of the program.

Am I wrong about that, or is clang breaking the rules here? Perhaps construction is excluded from the cases where volatile must be assumed to have side effects?

This reproduces with Apple LLVM version 8.1.0 (clang-802.0.42) targeting x86_64 on macOS 10.12.6, compiling with “-O3”. I had to insert `#include ` into the source code. With the two-element struct, the write is optimized away. With a one-element struct, it is not. — Eric Postpischil, Oct 29 '17 at 00:27
@EricPostpischil - I included the [godbolt link](https://godbolt.org/g/GE15JX) in the question, although perhaps it wasn't very obvious there. You can get the info you need there by changing the arguments to use `--version`. I got: `clang version 5.0.0 (tags/RELEASE_500/final 312636), Target: x86_64-unknown-linux-gnu, Thread model: posix`. The behavior has been the same as far back as I could check (back to clang 3.0). — BeeOnRope, Oct 29 '17 at 00:32
It reproduces in C instead of C++ too. So it is not a C++ constructor issue. I think it may be a genuine compiler bug. (And, even if we are wrong about volatile accesses being required [so the compiler is right to optimize away the two-element struct access], the compiler would be wrong not to optimize away the one-element struct access. [Wrong because it misses an optimization, not because it violates the language specification.]) — Eric Postpischil, Oct 29 '17 at 00:39
@EricPostpischil - to be fair I wrote about clang above, but `gcc` actually optimizes some other cases even more aggressively - where the `struct` members themselves are declared `volatile`. See [here](https://godbolt.org/g/5WGsqV). — BeeOnRope, Oct 29 '17 at 00:55
It's the 1-member struct that loads `1` from memory. The `volatile int x = 42` does store `42`, not `1`. — Peter Cordes, Oct 29 '17 at 02:55
semi-related: `struct`-assignment in C11 is not by-member, so it disregards `_Atomic` qualifiers on members (and presumably also `volatile`). (In C++11, `atomic` has no copy-constructor, and it does go by member, so it doesn't compile.) https://godbolt.org/g/heUd1R. I was thinking about asking a question about this little-documented fact, but I think I answered my own C11 question by looking at the standard, and gcc should probably warn, but it's pretty clear it's the correct behaviour to just memcpy or `rep stosq` (to copy from a zeroed struct). — Peter Cordes, Oct 29 '17 at 02:59
struct-assignment or initialization in C or C++ doesn't explain why gcc optimizes away to nothing with volatile members which you access by member. (Your last example. Added to godbolt for gcc and clang https://godbolt.org/g/1oGFq4 (C11 and C++11, although it turns out that porting to C11 doesn't change either compiler's asm output).) — Peter Cordes, Oct 29 '17 at 05:15
Peter, I'm not clear who you are replying to and about what in [this comment](https://stackoverflow.com/q/46994763/149138#comment80939346_46994763). Was it to a deleted comment? — BeeOnRope, Oct 29 '17 at 05:51
Your question shows C source for `void volatile_int()`, but then asm for `volatile_struct1`. The surrounding text looks like you meant to be showing source + asm for the same function. — Peter Cordes, Oct 30 '17 at 05:43
@PeterCordes - thanks, fixed. I meant to show the `volatile_int` function in both places. — BeeOnRope, Oct 30 '17 at 06:58
volatile really means that you can set a breakpoint and change volatile variables in the debugger, restart, and execution will proceed as if the code contained an actual assignment. This is not guaranteed for non-volatile variables, as the compiler can assume that no parasitic change occurs. — curiousguy, Nov 01 '17 at 21:39

Nicol Bolas · Answer 1 · 2017-10-29T06:59:10.847

Let us investigate what the standard directly says. The behavior of volatile is defined by a pair of statements. [intro.execution]/7:

The least requirements on a conforming implementation are:

Accesses through volatile glvalues are evaluated strictly according to the rules of the abstract machine.

...

And [intro.execution]/14:

Reading an object designated by a volatile glvalue (6.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.

Well, [intro.execution]/14 does not apply because nothing in the above code constitutes "reading an object". You initialize it and destroy it; it is never read.

So that leaves [intro.execution]/7. The phrase of importance here is "accesses through volatile glvalues". While temp certainly is a volatile value, and it certainly is a glvalue... you never actually access through it. Oh yes, you initialize the object, but that doesn't actually access "through" temp as a glvalue.

That is, temp as an expression is a glvalue, per the definition of glvalue: "an expression whose evaluation determines the identity of an object, bit-field, or function." The statement creating and initializing temp results in a glvalue, but the initialization of temp isn't accessing through a glvalue.

Think of volatile like const. The rules about const objects don't apply until after it is initialized. Similarly, the rules about volatile objects don't apply until after it is initialized.

So there's a difference between volatile nonvol2 temp = {1, 2}; and volatile nonvol2 temp; temp.a = 1; temp.b = 2;. And Clang certainly does the right thing in that case.
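To make the two forms concrete, here is a minimal sketch. The function names init_form, assign_form and read_back are made up for illustration and do not appear in the question. On this answer's reading, only the second and third functions perform accesses through a volatile glvalue; what a given compiler actually emits for each is worth checking on your own toolchain.

```cpp
#include <cstdint>

struct nonvol2 {
    std::uint32_t a, b;
};

// Form 1: initialization only. On the reading argued above, this is not
// an "access through a volatile glvalue".
void init_form() {
    volatile nonvol2 temp = {1, 2};
}

// Form 2: default-initialize, then assign member by member.
// Each assignment goes through a volatile glvalue (temp.a, temp.b).
void assign_form() {
    volatile nonvol2 temp;
    temp.a = 1;
    temp.b = 2;
}

// Reading the members back: each read is also an access through a
// volatile glvalue, and a side effect per [intro.execution]/14.
std::uint32_t read_back() {
    volatile nonvol2 temp = {1, 2};
    std::uint32_t a = temp.a;  // volatile read of temp.a
    std::uint32_t b = temp.b;  // volatile read of temp.b
    return a + b;
}
```

Comparing the generated assembly of init_form and assign_form at -O2 or -O3 is the quickest way to see whether your compiler treats the two forms differently.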

That being said, the inconsistency of Clang with regard to this behavior (optimizing it out only when using a struct, and only when using a struct that contains more than one member) suggests that this is probably not a formal optimization by the writers of Clang. That is, they're not taking advantage of the wording so much as this just being an odd quirk of some accidental code coming together.

Although gcc doesn't remove the construction in this particular case, it does perhaps even more aggressive optimizations in the case that the struct members themselves are declared volatile, rather than the struct itself at the point of construction:

GCC's behavior here is:

Not in accord with the standard, as it is in violation of [intro.execution]/7, but
There's absolutely no way to prove that it isn't compliant with the standard.

Given the code you wrote, there is simply no way for a user to detect whether or not those reads and writes are actually happening. And I rather suspect that the moment you do anything to allow the outside world to see it, those changes will suddenly appear in the compiled code. However much the standard wishes to call it "observable behavior", the fact is that by C++'s own memory model, nobody can see it.

GCC gets away with the crime due to lack of witnesses. Or at least credible witnesses (anyone who could see it would be guilty of invoking UB).
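One way to probe that suspicion is to let the addresses of the objects escape. The sketch below is an assumption-laden experiment, not a guarantee about any compiler: escape and g_escaped are hypothetical helpers invented here, and in a real test escape would live in a separate translation unit so the optimizer cannot see through it.

```cpp
#include <cstdint>

struct vol2 {
    volatile std::uint32_t a, b;
};

// Hypothetical sink: publishing the address means the object is no longer
// provably private to the function that owns it.
const volatile void* volatile g_escaped = nullptr;

void escape(const volatile void* p) {
    g_escaped = p;  // the pointer is only stored, never dereferenced
}

std::uint32_t volatile_def2_escaped() {
    vol2 temp = {1, 2};
    vol2 temp2 = {1, 2};
    escape(&temp);     // "witnesses" now exist for both objects
    escape(&temp2);
    temp.a = temp2.a;  // volatile read of temp2.a, volatile write of temp.a
    temp.a = temp2.a;  // repeated on purpose, as in the question
    return temp.a;     // volatile read of temp.a
}
```

If the answer's guess is right, the loads and stores that vanished in volatile_def2 should show up in the assembly for this variant, but that has to be checked against the compiler and flags you actually use.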

So you should not treat volatile like some optimization off-switch.

Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackoverflow.com/rooms/158717/discussion-on-answer-by-nicol-bolas-is-it-legal-to-optimize-away-stores-construc). — Andy, Nov 10 '17 at 16:44

score 4 · Accepted Answer · answered Oct 31 '17 at 19:22

From the point of view of the Standard, there is no requirement that implementations document anything about how any objects are physically stored in memory. Even if an implementation documents the behavior of using pointers of type unsigned char* to access objects of a certain type, an implementation would be allowed to physically store data some other way and then have the code for character-based reads and writes adjust behaviors suitably.

If an execution platform specifies a relationship between abstract-machine objects and storage seen by the CPU, and defines ways by which accesses to certain CPU addresses might trigger side effects the compiler doesn't know about, a quality compiler suitable for low-level programming on that platform should generate code where the behavior of volatile-qualified objects is consistent with that specification. The Standard makes no attempt to mandate that all implementations be suitable for low-level programming (or any other particular purpose, for that matter).

If the address of an automatic variable is never exposed to outside code, a volatile qualifier need have only two effects:

If setjmp is called within a function, a compiler must do whatever is necessary to ensure that longjmp will not disrupt the values of any volatile-qualified objects, even if they were written between the setjmp and longjmp. Absent the qualifier, the value of objects written between setjmp and longjmp would become indeterminate when a longjmp is executed.
Rules which would allow a compiler to presume that any loops which don't have side effects will run to completion do not apply in cases where a volatile object is accessed within the loop, whether or not an implementation would define any means by which such access would be observable.

Except in those cases, the as-if rule would allow a compiler to implement the volatile qualifier in the abstract machine in a way that has no relation to the physical machine.
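The two effects listed above can be sketched as follows. The names g_env, value_after_longjmp and spin are hypothetical and chosen only for this illustration.

```cpp
#include <csetjmp>

static std::jmp_buf g_env;

// Effect 1: a volatile automatic object keeps its value across longjmp,
// even though it was modified after the setjmp call. Without volatile,
// the value of counter would be indeterminate after the jump.
int value_after_longjmp() {
    volatile int counter = 0;
    if (setjmp(g_env) == 0) {
        counter = 1;              // written between setjmp and longjmp
        std::longjmp(g_env, 1);   // control resumes at the setjmp above
    }
    return counter;               // second return from setjmp lands here
}

// Effect 2: every iteration performs volatile accesses, so the loop does
// not fall under the rule that lets a compiler presume a side-effect-free
// loop runs to completion.
int spin(int n) {
    volatile int i = 0;
    while (i < n) {
        i = i + 1;                // volatile read followed by volatile write
    }
    return i;
}
```

Here value_after_longjmp() returns 1 because counter is volatile, and spin(5) returns 5. Neither function exposes the address of its local, so these are the only guarantees the qualifier needs to provide in this setting.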
