28

I was looking up the keyword volatile and what it's for, and the answer I got was pretty much:

It's used to prevent the compiler from optimizing away code.

There were some examples, such as when polling memory-mapped hardware: without volatile, the polling loop would be removed, as the compiler might recognize that the condition value is never changed. But since there was only one example, or maybe two, it got me thinking: are there other situations where we need to use volatile to avoid unwanted optimization? Are condition variables the only place where volatile is needed?
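
For concreteness, the example I read looked roughly like this (a sketch; the register address and the ready bit are made up):

// Hypothetical memory-mapped status register at a made-up address.
volatile unsigned int *status_reg =
    reinterpret_cast<volatile unsigned int *>(0x40000000);

void wait_for_ready()
{
    // Without volatile, the compiler could read *status_reg once, see that
    // the program never writes to it, and turn this into an infinite
    // (or empty) loop.
    while ((*status_reg & 0x1) == 0)
    {
        // spin until the hardware sets the ready bit
    }
}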

I imagine that optimization is compiler-specific and therefore is not specified in the C++ specification. Does that mean we have to go by gut feeling, saying "Hm, I suspect my compiler will do away with this if I don't declare that variable as volatile", or are there any clear rules to go by?

gablin
  • 4,678
  • 6
  • 33
  • 47
  • 10
    Is "don't use `volatile`" an acceptable rule? Because its a pretty good one. Not that `volatile` is never useful. It is, its just that, in general, if you aren't sure if you need it, you probably don't. – Dennis Zickefoose Aug 30 '10 at 22:12
  • @Dennis: +1, and commented as such in my answer. – Billy ONeal Aug 30 '10 at 22:16
  • @DennisZickefoose "_Is "don't use `volatile`" an acceptable rule?_" I would say: do not use `volatile` unless some official standard, reference text or documentation tells you to do so. – curiousguy Oct 03 '11 at 19:10
  • `volatile` is not needed for condition variables. Condition variables have library support (Win32, pthread, Boost, std...) and use full locking with a mutex. – v.oddou Sep 02 '14 at 01:43
  • Related: *[Why is volatile needed in C?](https://stackoverflow.com/questions/246127/why-is-volatile-needed-in-c/)* – Peter Mortensen Jun 19 '23 at 15:01

8 Answers

27

Basically, volatile announces that a value might change behind your program's back. That prevents compilers from caching the value (in a CPU register) and from optimizing away accesses to that value when they seem unnecessary from the POV of your program.

What should trigger the use of volatile is a value that changes even though your program hasn't written to it, and for which no other memory barriers (like the mutexes used in multi-threaded programs) are in place.
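
A minimal sketch of one such situation, a flag set by an asynchronous signal handler (the names here are illustrative):

#include <csignal>

// Set "behind the program's back" by the signal handler. Without volatile,
// the loop below could cache the flag in a register and never see the change.
volatile std::sig_atomic_t stop_requested = 0;

void on_sigint(int) { stop_requested = 1; }

int main()
{
    std::signal(SIGINT, on_sigint);
    while (!stop_requested)
    {
        // do work
    }
}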

sbi
  • 219,715
  • 46
  • 258
  • 445
  • While I think almost all answers to this question are more or less useful (and I would like to have bundled them all up in one to accept, hehe), I would say that this one sums it up. – gablin Sep 01 '10 at 12:10
  • So under this model, two consecutive writes to a `volatile` can be collapsed into one, right? Because your description only covers optimizations that affect reads. – BeeOnRope Oct 28 '17 at 23:02
23

The observable behavior of a C++ program is determined by reads and writes to volatile variables, and any calls to input/output functions.

What this entails is that all reads and writes to volatile variables must happen in the order they appear in code, and they must happen. (If a compiler broke one of those rules, it would be breaking the as-if rule.)

That's all. It's used when you need to indicate that reading or writing a variable is to be seen as an observable effect. (Note, the "C++ and the Perils of Double-Checked Locking" article touches on this quite a bit.)


So to answer the title question: it prevents any optimization that would remove reads or writes of volatile variables, or re-order them relative to other volatile accesses.

That means a compiler that changes:

int x = 2;
volatile int y = 5;
x = 5;
y = 7;

To

int x = 5;
volatile int y = 5;
y = 7;

Is fine, since the value of x is not part of the observable behavior (it's not volatile). What wouldn't be fine is collapsing the two writes to y into a single assignment of 7, because the write of 5 is an observable effect.

GManNickG
  • 494,350
  • 52
  • 494
  • 543
10

Condition variables are not where volatile is needed; strictly it is only needed in device drivers.

volatile guarantees that reads and writes to the object are not optimized away, or reordered with respect to another volatile. If you are busy-looping on a variable modified by another thread, it should be declared volatile. However, you shouldn't busy-loop. Because the language wasn't really designed for multithreading, this isn't very well supported. For example, the compiler may move a write to a non-volatile variable from after to before the loop, violating the lock. (For indefinite spinloops, this might only happen under C++0x.)
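
A sketch of the busy-loop pattern described above (illustrative names; as the answer says, prefer a real threading library's primitives):

// Flag written by one thread and polled by another.
volatile bool data_ready = false;
int payload = 0;             // not protected by the volatile flag

void producer()
{
    payload = 42;            // may be reordered relative to the flag write
    data_ready = true;       // volatile write: not optimized away
}

void consumer()
{
    while (!data_ready)      // volatile read: re-checked every iteration
        ;                    // busy-wait (discouraged)
    // payload is not guaranteed to be 42 here: volatile provides no
    // inter-thread ordering or atomicity, which is why a mutex (or, in
    // C++0x, std::atomic) is the right tool.
}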

When you call a thread-library function, it acts as a memory fence, and the compiler will assume that any and all values have changed — essentially everything is volatile. This is either specified or tacitly implemented by any threading library to keep the wheels turning smoothly.

C++0x might not have this shortcoming, as it introduces formal multithreading semantics. I'm not really familiar with the changes, but for the sake of backward compatibility, it doesn't require you to declare anything volatile that wasn't volatile before.

Potatoswatter
  • 134,909
  • 25
  • 265
  • 421
  • 2
    "Local variables are not allowed to be volatile at all, although some compilers may support it and you can choose to access a local exclusively by volatile *" -> I have seen local volatile variables, in wait loops for embedded systems and soon. Can you please provide a Standard's reference for that? Also be aware that if you have a `int` object, but merely access it by an `int volatile&` (or by dereferencing a `int volatile*`), this is not regarded as a volatile read/write, so the compiler may optimize it away. – Johannes Schaub - litb Aug 30 '10 at 22:46
  • 1
    Note that the famous [C++ and the Perils of Double-Checked Locking](http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf) uses casts to `int volatile&` all over the place, and uses one particular quote of the C++03 Standard that reads "The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and calls to library I/O functions". – Johannes Schaub - litb Aug 30 '10 at 22:57
  • 1
    It interprets that such that actually the *access path* volatileness is enough to make an access observable, but C++03 also said "The least requirements on a conforming implementation are: [..] At sequence points, volatile objects are stable in the sense that previous evaluations are complete and subsequent evaluations have not yet occurred.". Note that this text is clear that only access to *volatile objects* and not the *access path* alone determine whether the access is observable behavior or not. – Johannes Schaub - litb Aug 30 '10 at 22:59
  • 1
    Finally, C++0x is the most clear and just says "Access to volatile objects are evaluated strictly according to the rules of the abstract machine.". It does not try to define the observable behavior multiple times anymore, thanks to [DR #612](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#612). So trying to add observable behavior by using hideous casts actually doesn't work. – Johannes Schaub - litb Aug 30 '10 at 23:02
  • @Johannes: I can't find it in the standard or by Googling SO with my own name. Removed from the answer. I swear I saw a bulletproof argument… However 12.8/15 seems to imply the existence of such. I'm not sure it's necessary to allow local variables to independently constitute observable behavior. And that article seems to be using "everything" being volatile as a reductio ad absurdum. – Potatoswatter Aug 30 '10 at 23:06
  • Fair nuff about the casting. What I thought I saw was a rule against *declaring* a volatile variable with automatic storage. – Potatoswatter Aug 30 '10 at 23:08
  • @Johannes: Ah, I remembered part of it: §1.9/10 requires that local `volatile` objects not be modified except locally. So it's OK to *send* data through a local volatile, but *receiving* any is forbidden. (Indeed, passing a non-const pointer to a local to *any* function is just asking to violate that paragraph!) I guess two threads could attempt to double-lock so long as each lock was split between two locals on either side. – Potatoswatter Sep 06 '10 at 18:54
  • @Potatoswatter i'm confused. 1.9/10 reads "An instance of each object with automatic storage duration (3.7.2) is associated with each entry into its block. Such an object exists and retains its last-stored value during the execution of the block and while the block is suspended (by a call of a function or receipt of a signal). ". From what do you conclude that such things violate it? Do you say that `void f(int volatile *a) { *a = 0; } int main() { volatile int a = 0; f(&a); }` is invalid? I'm not following. – Johannes Schaub - litb Sep 06 '10 at 20:15
  • 1
    @Johannes: It depends how you define "last-stored value." By definition, the value something contains is the last stored into it… and vice versa. So, if called functions are allowed to store values into the caller's frame (presumably so), your example is valid (with or without `volatile`). However, it's a bit more of a stretch that other threads, the OS, or external devices should be allowed to modify an object — such would render the paragraph meaningless. (Maybe, with its circular logic, it is.) The point, anyway, is that it applies to all locals regardless of `volatile` qualification. – Potatoswatter Sep 06 '10 at 21:49
  • @Potatoswatter oh i see now. I'm not sure how this is resolved, but I heard someone say that for volatiles, each read can potentially be a store (which is why just reading a volatile is listed as a side-effect). I think the word "store" is not defined by C++ (mybe by some of the technical Standards referenced?). C99 has a footnote for that statement which says "In the case of a volatile object, the last store need not be explicit in the program.". – Johannes Schaub - litb Sep 07 '10 at 05:20
  • @JohannesSchaub-litb "_Note that this text is clear that only access to volatile objects and not the access path alone determine whether the access is observable behavior or not._" I disagree. What is an object? – curiousguy Oct 03 '11 at 18:54
  • "_strictly it is only needed in device drivers._" `volatile` is needed for communication with async signal handlers. (Avoid signals for inter-process communication if you can, and do not write async signal handlers unless you really have to.) – curiousguy Oct 03 '11 at 19:05
4

Remember that the "as-if" rule means that the compiler can, and should, do whatever it wants, as long as the behaviour of the program as a whole, as seen from outside, is the same. In particular, while a variable conceptually names an area in memory, there is no reason why it actually should be in memory.

It could be held in a register, or its value could be calculated away entirely. For example, the following:

int x = 2;
int y = x + 7;
return y + 1;

Need not have an x or a y at all, but could just be replaced with:

return 10;

Another example is that any code that doesn't affect state visible from outside could be removed entirely. E.g. if you zeroise sensitive data, the compiler can see this as a wasted exercise ("why write to something that will never be read?") and remove it. volatile can be used to stop that happening.
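
For instance, a scrubbing routine along these lines (a sketch, not a hardened implementation, under the assumption that the compiler honours stores made through a pointer-to-volatile, which most do in practice) relies on volatile to keep the writes from being elided:

#include <cstddef>

void scrub(char *buf, std::size_t len)
{
    // A plain zeroing loop (or memset) is a dead store here: buf is never
    // read again, so the compiler may remove it. Writing through a
    // pointer-to-volatile asks the compiler to treat each store as
    // observable and keep it.
    volatile char *p = buf;
    for (std::size_t i = 0; i < len; ++i)
        p[i] = 0;
}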

volatile can be thought of as meaning "the state of this variable must be considered part of the outwardly visible state, and not messed with". Optimisations that would treat it in any way other than literally following the source code are not allowed.

(A note on C#: a lot of what I've seen of late on volatile suggests that people are reading about C++'s volatile and applying it to C#, and reading about C#'s and applying it to C++. Really, though, volatile behaves so differently between the two that it is not useful to consider them related.)

Jon Hanna
  • 110,372
  • 10
  • 146
  • 251
  • +1, it is important to think on the memory model and the visible state of the program. The compiler could even discard the `volatile` qualifier if it can determine that the variable will not affect the visible behavior. Consider an auto variable declared volatile in a function that does not call any other function. The compiler can determine that the variable cannot be polled outside of the thread and may decide to apply any optimization it wishes. – David Rodríguez - dribeas Aug 30 '10 at 22:35
  • 2
    @David : Reads from and writes to volatiles are part of the visible behavior of a C++ program, _by definition_. The optimizer works under the "as-if" rule which allows transformations if they leave the visible behavior unchanged. Therefore, optimizations may not remove reads and writes of volatile objects. – MSalters Aug 31 '10 at 07:36
  • 1
    @MSalters, David is right, if an automatic variable is volatile but doesn't have its address passed to a non-automatic volatile pointer or potentially become involved in a long-jump or otherwise accessed outside of "normal" functional access, then there is no way for it to be observed from the outside, as there is no way for anything outside to know what to observe. In this case it could be decided that it isn't really volatile, and its volatility ignored. – Jon Hanna Aug 31 '10 at 09:04
  • 1
  • @MSalters: I have failed to locate the quote, but I am quite sure that it was a remark from Herb Sutter in the last few months, mentioning that a volatile variable that can be proved not to be visible from another context --i.e. its address is on the stack and provably not passed to other functions, so it cannot be queried from other contexts-- can be proved not to be part of the visible behavior of the program, and a conforming compiler could discard the `volatile` qualifier. – David Rodríguez - dribeas Aug 31 '10 at 09:19
  • @MSalters:... but without a proper quote, take this with a pinch of salt, as I might have misinterpreted the comment in the beginning. – David Rodríguez - dribeas Aug 31 '10 at 09:20
  • @JonHanna "_if an automatic variable is volatile but doesn't have its address passed_" You could also use debug info to locate the variable. Does this count as having "its address passed"? – curiousguy Oct 02 '11 at 17:40
  • @curiousguy It would arguably count, but then debug builds change the playing-field as far as optimisation goes anyway. Often a lot of the optimistations that volatile prohibits are applied to all variables in debug builds, so that there's a closer correspondence between source and machine code. In debugging a release build you may find that the variable isn't as volatile as you thought, just like you can see lots of other changes between source and product in release builds. – Jon Hanna Oct 20 '11 at 10:28
  • @JonHanna "_debug builds change the playing-field as far as optimisation goes anyway_" For some compilers, "debug" and "optimise" are independent. – curiousguy Oct 20 '11 at 13:55
  • @curiousguy : you could also argue that one could resolve the address of the variable from outside by manually applying offsets from a previously known pointer up the stack. The same way one can access private members of a class from a public member's address. However, this is breaking the barrier of the abstract machine. Therefore it's somehow "illegal". It can work on a specific compiler though, but you have no guarantee because this supposes a specific implementation of the abstract machine. I don't even think the stack is supposed to be a stack by standard. – v.oddou Sep 02 '14 at 02:01
3

Volatile prevents the compiler from keeping the data in a CPU register (hundreds of times faster than memory); the value has to be read from memory every time it is used.
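
To illustrate the difference (the names here are made up): with a plain int, the compiler may read the value once into a register for the whole loop; declared volatile, it must be reloaded from memory on every iteration.

volatile int sensor_value;   // hypothetical value updated outside the program

int sample_sum()
{
    int total = 0;
    for (int i = 0; i < 100; ++i)
        total += sensor_value;   // 100 separate loads; a non-volatile int
                                 // could legally be read once and reused
    return total;
}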

Byron Whitlock
  • 52,691
  • 28
  • 123
  • 168
  • `volatile` doesn't exclude the value from L1 cache, which costs only a few cycles to access. It is associated with other mechanisms that do, though. Device registers will always be volatile, and will often be even slower than DRAM. – Potatoswatter Aug 30 '10 at 22:23
  • @Potatoswatter Isn't the L1 cache controlled by hardware? I wasn't aware that software could affect anything in there. – Byron Whitlock Aug 30 '10 at 22:29
  • 1
    @Potatoswatter: While it is true that there is no real need for a volatile variable to make it all the way to real memory (might depend on the architecture), the fact is that it can have an impact much greater than a few cycles. If the variable is in the same cache line as any variable in use by another CPU, each operation on the `volatile` variable will trigger cache synchronization to the other CPUs, and that can be costly both for the `volatile` itself and for the non-volatile vars in the same cache line. – David Rodríguez - dribeas Aug 30 '10 at 22:29
  • @Byron: Hardware configuration settings are set by software. The OS can call up the MMU and turn off caching for given pages. There might even be a user-space facility to let any program do so. – Potatoswatter Aug 30 '10 at 22:41
  • @David: Yes, but that also applies to non-volatile variables that didn't happen to be subject to optimization. – Potatoswatter Aug 30 '10 at 22:42
  • @Potatoswatter: I agree, but the thing is that multiple reads of the same non-volatile variable inside the same function *can* be optimized into a single read into a register. In that scenario, the `volatile` keyword might trigger many synchs that in the non-volatile case would not be performed. That is, it will not just turn a register operation into a L1 cache read, but can cascade and have a much greater impact. – David Rodríguez - dribeas Aug 30 '10 at 23:03
  • Byron, as a rule of thumb you can consider register access 1 cycle, L1 goes through the load/store buffer (about 4 cycles), L2 about 12 cycles, and L3 about 15. Memory is decoupled from CPU cycles, and for about the last 10 years has kept hovering around 120 ns of access time. More info: http://www.sisoftware.net/?d=qa&f=ben_mem_latency – v.oddou Sep 02 '14 at 02:07
1

One way to think about a volatile variable is to imagine that it's a virtual property; writes and even reads may do things the compiler can't know about. The actual generated code for writing/reading a volatile variable is simply a memory write or read(*), but the compiler has to regard the code as opaque; it can't make any assumptions under which the access might be superfluous. The issue isn't merely making sure that the compiled code notices that something has caused a variable to change. On some systems, even memory reads can "do" things.

(*) On some compilers, volatile variables may be added to, subtracted from, incremented, decremented, etc. as distinct operations. It's probably useful for a compiler to compile:

  volatilevar++;

as

  inc [_volatilevar]

since the latter form may be atomic on many microprocessors (though not on modern multi-core PCs). It's important to note, however, that if the statement were:

  volatilevar2 = (volatilevar1++);

the correct code would not be:

  mov ax,[_volatilevar1] ; Reads it once
  inc [_volatilevar1]    ; Reads it again (oops)
  mov [_volatilevar2],ax

nor

  mov ax,[_volatilevar1]
  mov [_volatilevar2],ax ; Writes in wrong sequence
  inc ax
  mov [_volatilevar1],ax

but rather

  mov ax,[_volatilevar1]
  mov bx,ax
  inc ax
  mov [_volatilevar1],ax
  mov [_volatilevar2],bx

Writing the source code differently would allow the generation of more efficient (and possibly safer) code. If 'volatilevar1' didn't mind being read twice and 'volatilevar2' didn't mind being written before volatilevar1, then splitting the statement into

  volatilevar2 = volatilevar1;
  volatilevar1++;

would allow for faster, and possibly safer, code.

supercat
  • 77,689
  • 9
  • 166
  • 211
  • Sorry, but I can't find a justification for your order claim about `volatilevar2 = (volatilevar1++);`. There is only a single sequence point, at the `;`. Therefore the order in which the writes happens is not guaranteed. – MSalters Aug 31 '10 at 07:32
  • @MSalters: You may be right on that point, in which case the third variation would be acceptable. On the other hand, the fastest version, which uses inc [_volatilevar1], is certainly not acceptable despite the fact that there are cases where it would be less trouble-prone than the longer versions. – supercat Aug 31 '10 at 13:41
0

Usually the compiler assumes that a program is single-threaded, and therefore has complete knowledge of what's happening with variable values. A smart compiler can then prove that the program can be transformed into another program with equivalent semantics but better performance. For example:

x = y+y+y+y+y;

can be transformed to

x = y*5;

However, if a variable can be changed outside the thread, the compiler doesn't have complete knowledge of what's going on simply by examining this piece of code. It can no longer make optimizations like the one above. (Edit: it probably can in this case; we need a more sophisticated example.)
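
A sketch of the distinction (illustrative names): if y is volatile, each of the five mentions below is a separate read that must actually be performed, even though the compiler may still combine the values after reading them.

volatile int y = 1;
int x;

void compute()
{
    // Five separate volatile reads must be emitted; the compiler may still
    // add (or multiply) the values it read, but it may not fold this into
    // a single load of y.
    x = y + y + y + y + y;
}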

By default, single-threaded access is assumed for performance optimization. This assumption is usually true, unless the programmer explicitly instructs otherwise with the volatile keyword.

irreputable
  • 44,725
  • 9
  • 65
  • 93
  • Actually, I believe constant folding is still valid on `volatile` items, which is essentially what you've shown here. – Billy ONeal Aug 30 '10 at 22:15
  • 1
    I'm not a C++ expert. In Java, the volatile y must be fetched 5 times. – irreputable Aug 30 '10 at 22:17
  • 2
    @Billy: I agree the answer is a bit unclear, but to clarify: If `y` is volatile, then changing `x = y + y` into `x = 2 * y` is *not* okay. But changing `y = 2 + 2` to `y = 4` is fine. – GManNickG Aug 30 '10 at 22:19
  • 3
    @Billy ONeal: If the code reads a volatile variable five times, the variable must be read five times precisely. A statement: "a=(b && volatilevar);" may not be written as "a = (volatilevar & -!!b);" even if the latter form would otherwise be faster (since there's no branching), since the latter form reads volatilevar even when b is false. – supercat Aug 30 '10 at 22:36
  • 1
    The `volatile` keyword was not added to the language to take multithreading into account. Even if the meaning is somewhat similar, the intention is accessing hardware components through memory addresses. `volatile` means that the value of the variable can change outside of the program --not just the thread, but the whole program-- or even that the read can have side effects outside of what the program does --i.e. imagine a hardware counter that increments on each read. – David Rodríguez - dribeas Aug 30 '10 at 22:52
  • "_it can no longer make optimizations like above_" That's wrong. The optimisation is perfectly valid. – curiousguy Oct 02 '11 at 17:48
  • 1
    To elaborate on @curiousguy's comment, coalescing 5 reads of y into a single read is totally allowed if y is a regular int. It's *still* allowed if the reads are atomic, because 5 consecutive reads with nothing interleaved in between them is a valid execution of a multithreaded program. But it's *not* allowed if the reads are volatile, because volatile reads might have observable side effects that the compiler has to preserve. – Jack O'Connor Oct 20 '22 at 22:09
  • @JackO'Connor Exactly. A nice way to describe volatile access is that each access in a separate line/statement is a possible breakpoint where you can cheat with `ptrace` and using `ptrace` to change volatile objects has the exact same effect as (dynamically) changing source code to change those same objects. And as much as the system supports breakpoints, the compiler must generate code that tolerates such uses of `ptrace`, but other uses of `ptrace` have UB. – curiousguy Nov 09 '22 at 00:46
0

Unless you are on an embedded system, or you are writing hardware drivers where memory mapping is used as the means of communication, you should never, ever, ever be using volatile.

Consider:

#include <cstdio>

int main()
{
    volatile int SomeHardwareMemory; // Stand-in for a platform-specific int location.
    for (int idx = 0; idx < 56; ++idx)
    {
        printf("%d", SomeHardwareMemory);
    }
}

Has to produce code like:

loadIntoRegister3 56
loadIntoRegister2 "%d"
loopTop:
loadIntoRegister1 <<SOMEHARDWAREMEMORY>>
pushRegister2
pushRegister1
call printf
decrementRegister3
ifRegister3NotZero goto loopTop

whereas without volatile it could be:

loadIntoRegister3 56
loadIntoRegister2 "%d"
loadIntoRegister1 <<SOMEHARDWAREMEMORY>>
loopTop:
pushRegister2
pushRegister1
call printf
decrementRegister3
ifRegister3NotZero goto loopTop

The assumption behind volatile is that the memory backing the variable may be changed by something outside your program. You are forcing the compiler to load the actual value from memory each time the variable is used, and you are telling the compiler that reusing that value in a register is not allowed.

Billy ONeal
  • 104,103
  • 58
  • 317
  • 552