Understanding the 'volatile' keyword in C++

Question

I am trying to understand how the volatile keyword works in C++.

I had a look at "What kinds of optimizations does 'volatile' prevent in C++?". Judging by the accepted answer, volatile disables two kinds of optimizations:

Caching the value in a register.
Optimizing away accesses to that value when they seem unnecessary from the point of view of your program.
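
As a minimal sketch of those two points (the helper names `pulse` and `sum_two_reads` are hypothetical and not part of the question), the following shows accesses that a compiler would normally be free to merge or drop for a plain int, but has to emit one by one for a volatile int:

```cpp
// Hypothetical helpers for illustration; the names are not from the question.

// Both stores must be emitted, in this order, even though the first
// value is immediately overwritten.
void pulse(volatile int& reg)
{
    reg = 1;
    reg = 0;
}

// Two separate loads must be emitted; the compiler may not load once
// and reuse the value it kept in a register.
int sum_two_reads(const volatile int& reg)
{
    int first = reg;
    int second = reg;
    return first + second;
}
```

On ordinary memory the results are the same as without volatile; only the number and order of generated loads and stores differ, which is visible in the assembly rather than in the program's output.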

I found similar information at The as-if rule:

Accesses (reads and writes) to volatile objects occur strictly according to the semantics of the expressions in which they occur. In particular, they are not reordered with respect to other volatile accesses on the same thread.

I wrote a simple C++ program that sums all the values in an array to compare the behaviour of plain ints vs. volatile ints. Note that the partial sums are not volatile.

The array consists of unqualified ints.

#include <array>

int foo(const std::array<int, 4>& input)
{
    auto sum = 0xD;
    for (auto element : input)
    {
        sum += element;
    }
    return sum;
}

The array consists of volatile ints:

int bar(const std::array<volatile int, 4>& input)
{
    auto sum = 0xD;
    for (auto element : input)
    {
        sum += element;
    }
    return sum;
}

When I look at the generated assembly code, SSE registers are used only in the case of plain ints. From what little I understand, the code using SSE registers neither optimizes away the reads nor reorders them with respect to each other. The loop is unrolled in both cases, so there aren't any branches either.

The only explanation I can come up with for the different code generation is a question about ordering: can the volatile reads be reordered so that they all happen before the accumulation? Clearly, sum is not volatile. If such reordering is bad, is there a situation or example that illustrates the issue?

Code generated using Clang 9:

foo(std::array<int, 4ul> const&):                # @foo(std::array<int, 4ul> const&)
        movdqu  (%rdi), %xmm0
        pshufd  $78, %xmm0, %xmm1       # xmm1 = xmm0[2,3,0,1]
        paddd   %xmm0, %xmm1
        pshufd  $229, %xmm1, %xmm0      # xmm0 = xmm1[1,1,2,3]
        paddd   %xmm1, %xmm0
        movd    %xmm0, %eax
        addl    $13, %eax
        retq
bar(std::array<int volatile, 4ul> const&):               # @bar(std::array<int volatile, 4ul> const&)
        movl    (%rdi), %eax
        addl    4(%rdi), %eax
        addl    8(%rdi), %eax
        movl    12(%rdi), %ecx
        leal    (%rcx,%rax), %eax
        addl    $13, %eax
        retq

The only use cases I've found for `volatile` are memory-mapped I/O and memory shared between multiple processes. (I'm not claiming my list is exhaustive; these are just the two situations that I've run into.) – Eljay, Oct 05 '19 at 21:01
Have a read of: _"...This makes volatile objects suitable for communication with a signal handler, but not with another thread of execution,..."_ from: https://en.cppreference.com/w/cpp/language/cv — Richard Critten, Oct 05 '19 at 21:20
I came across some code that uses volatile (for shared memory) and I was completely confused. That's the reason I want to understand the keyword better. — Empty Space, Oct 05 '19 at 21:22
I am not thinking about multiple threads for now. So single thread is fine. — Empty Space, Oct 05 '19 at 21:26
`volatile` has been historically misunderstood and misused. In a future version of the standard most uses of `volatile` might get deprecated: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1152r0.html , https://embeddedartistry.com/newsletter-archive/2019/3/4/march-2019-deprecating-volatile — bolov, Oct 05 '19 at 21:37
Memory-mapped I/O is the issue here, a read at a specific address has side-effects to the hardware. So the SIMD optimization is no longer safe, instead of 4 reads at 4 distinct addresses there is now only 1. — Hans Passant, Oct 05 '19 at 21:39
Doesn't the SIMD read get split into 4 different addresses when the read request is sent on the bus or something like that? — Empty Space, Oct 05 '19 at 21:43
@bolov: Who "misunderstood" `volatile`? If implementations that are designed for systems programming treat `volatile` accesses as including whatever memory fences would be needed on the particular platform to implement a reliable mutex, using a "broad-fenced" `volatile` might be less efficient than using more specific memory-barrier intrinsics, but would be portable among all implementations that can be configured to support systems programming with "broad-fenced" volatile semantics. That seems better than requiring that all systems programming use compiler-specific intrinsics. – supercat, Oct 05 '19 at 21:55
As an embedded systems programmer, I have written about my perspective at https://stackoverflow.com/a/49155790/2785528. There I list the 2 most important examples of why we need 'volatile'. — 2785528, Oct 05 '19 at 22:25
@2785528 Thanks for the comment. However, no reads were optimized away or reordered in the illustrated example. — Empty Space, Oct 06 '19 at 04:24
@HansPassant reading https://stackoverflow.com/questions/47512527/simd-intrinsic-and-memory-bus-size-how-cpu-fetches-all-128-256-bits-in-a-singl, the SIMD instruction loads the data from the cache and not directly from the device. — Empty Space, Oct 06 '19 at 05:08
No, addresses mapped to hardware are not cached, for the same reason. — Hans Passant, Oct 06 '19 at 06:05
@TrickorTreat - you're welcome. I agree that your code does not need volatile. And since each array element is read only once, there will be no need for any optimization which volatile could prevent. – 2785528, Oct 06 '19 at 23:34
Related: *[Why is volatile needed in C?](https://stackoverflow.com/questions/246127/why-is-volatile-needed-in-c/)* — Peter Mortensen, Jun 19 '23 at 15:02

supercat · Accepted Answer · 2019-10-05T23:10:34.793

The volatile keyword in C++ was inherited from C, where it was intended as a general catch-all to indicate places where a compiler should allow for the possibility that reading or writing an object might have side-effects it doesn't know about. Because the kinds of side-effects that could be induced vary among platforms, the Standard leaves the question of what allowances to make up to compiler writers' judgment as to how they can best serve their customers.
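
To make "side-effects it doesn't know about" concrete, here is a small simulation. It is only a sketch: `FakeFifo` is a hypothetical stand-in for a memory-mapped device register whose every read pops the next value, and an ordinary class is used (rather than a real volatile pointer to hardware) so that the example runs anywhere.

```cpp
// Hypothetical stand-in for a memory-mapped FIFO register: every read
// has a side effect (it advances to the next value). Real hardware
// would be accessed through a volatile pointer; this class only models
// the side effect.
struct FakeFifo
{
    int next = 0;
    int read() { return next++; }
};

// What the source code asks for: four separate reads.
int sum_four_reads(FakeFifo& fifo)
{
    int sum = 0;
    for (int i = 0; i < 4; ++i)
    {
        sum += fifo.read();
    }
    return sum;
}

// What a "one read is as good as four" transformation would compute.
int sum_one_read_reused(FakeFifo& fifo)
{
    int value = fifo.read();
    return value * 4;
}
```

Starting from a fresh `FakeFifo`, the first function pops four values and the second pops only one, so both the returned sums and the state left behind in the device differ. A compiler looking at plain memory cannot see that difference, which is the kind of possibility volatile asks it to allow for.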

Microsoft's compilers for the 8088/8086 and later x86 have for decades been designed to support the practice of using volatile objects to build a mutex which guards "ordinary" objects. As a simple example: if thread 1 does something like:

ordinaryObject = 23;
volatileFlag = 1;
while(volatileFlag)
  doOtherStuffWhileWaiting();
useValue(ordinaryObject);

and thread 2 periodically does something like:

if (volatileFlag)
{
  ordinaryObject++;
  volatileFlag=0;
}

then the accesses to volatileFlag would serve as a warning to Microsoft's compilers that they should refrain from making assumptions about how any preceding actions on any objects would interact with later actions. This pattern has been followed with the volatile qualifiers in other languages like C#.

Unfortunately, neither clang nor gcc includes any option to treat volatile in such a fashion, opting instead to require that programmers use compiler-specific intrinsics to yield the same semantics that Microsoft could achieve using only the Standard keyword volatile that was intended to be suitable for such purposes [according to the authors of the Standard, "A volatile object is also an appropriate model for a variable shared among multiple processes."--see http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf p. 76 ll. 25-26]
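
For comparison only, and not as the approach described above, the same handoff can be sketched with the C++11 `std::atomic` facilities, which clang, gcc and MSVC all provide on hosted implementations. The names `flag`, `worker` and `handoff` are hypothetical, and the sketch assumes a platform with `std::thread` support (on some toolchains that means linking with a threads library).

```cpp
#include <atomic>
#include <thread>

int ordinaryObject = 0;      // plain object guarded by the flag
std::atomic<int> flag{0};    // plays the role of volatileFlag

// Plays the role of thread 2: wait for the flag, update, hand back.
void worker()
{
    while (flag.load(std::memory_order_acquire) == 0)
    {
        std::this_thread::yield();
    }
    ordinaryObject++;
    flag.store(0, std::memory_order_release);
}

// Plays the role of thread 1: publish a value, wait for it to be
// handed back, then use it.
int handoff()
{
    ordinaryObject = 23;
    std::thread other(worker);
    flag.store(1, std::memory_order_release);
    while (flag.load(std::memory_order_acquire) != 0)
    {
        std::this_thread::yield();
    }
    int result = ordinaryObject;
    other.join();
    return result;
}
```

The release store paired with the acquire load is what tells the compiler and the hardware that earlier writes to `ordinaryObject` must be visible to the thread that observes the new flag value. Whether this is an acceptable replacement on freestanding or OS-agnostic implementations is exactly what the comments below debate.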

answered Oct 05 '19 at 21:45 by supercat · edited Oct 05 '19 at 23:10

`refrain from making assumptions about how any preceding actions on any objects would interact with later actions` Hmm, I thought C++ 11 memory model solved this issue. – Empty Space Oct 05 '19 at 21:58
Volatile is essential for things like reading from the same address mapped to IO multiple times in a row. IO devices may be set up to clock in sequential data chunks on each read and w/o volatile a program can assume that reading from the same address not otherwise modified by the program will not change value. It should not be used for multithreading which has related, but not identical semantics. – doug Oct 05 '19 at 22:19
@doug: Can C++11 atomics be fully supported in conforming fashion by freestanding implementations that don't know about any OS that might be used on the target? – supercat Oct 05 '19 at 22:27
@doug: Are you saying that the authors of the Standard were wrong when they wrote the above quote? – supercat Oct 05 '19 at 22:28
You might want to attribute the source of "[according to the authors of the Standard, "A volatile object is also an appropriate model for a variable shared among multiple processes."]". There is no such statement in any of the C++ standards. Bear in mind that a significant amount of older Microsoft documentation (less so with current documentation) described Microsoft-specific features as being standard C++. – Peter Oct 05 '19 at 23:05
@Peter: Citation added. – supercat Oct 05 '19 at 23:10
@supercat. Atomics with mutexes, and volatile have different semantics. Microsoft's compilers have an option that enables/disables functionality that makes them useful for multithreading or narrows the semantics. Volatile remains useful for I/O and voids the assumption, useful in optimization, that multiple reads from the same address will return the same value for either option in the Microsoft compiler. – doug Oct 05 '19 at 23:11
@doug: Microsoft's compilers *predated* the C and C++ Standards, and nothing in the published Rationale for the Standard indicates any intention that implementations for similar platforms shouldn't be expected to continue behaving as existing compilers for such platforms did in the days before the Standard. – supercat Oct 05 '19 at 23:14
@supercat, Yes, that isn't the C++11 or later standard; it's an early C standard, and one I wrote a lot of code for at the time. Modern C++ had to address issues due to cache layers and multiple CPUs that earlier C and C++ standards, using only volatile, could not handle without impairing efficiency. A good thing IMO. OTOH, I've never had code that had race condition problems using C++11 or higher and multicore hardware. Volatile alone just doesn't cut it in these environments, at least not without impairing performance, which the Microsoft default does. – doug Oct 05 '19 at 23:25
@doug: What is the advantage of weakening the meaning of an existing keyword and breaking existing code, versus adding a syntax that would waive some semantic guarantees? – supercat Oct 05 '19 at 23:35
@doug: Also, would the C++ Standard allow freestanding implementations for all platforms to support atomics in an OS-agnostic fashion? From what I can tell, at least the C Standard would require that OS-agnostic freestanding implementations accept code that attempts atomic operations on all types, even though on many environments it would be impossible for an OS-agnostic implementation to do so usefully. – supercat Oct 05 '19 at 23:39
@supercat Existing code? C++ had no concept of multithreading. It did have a concept of how optimization could remove seemingly redundant reads/writes in a single thread of execution. That's what volatile addressed. Multithreading was done outside the C++ specification until C++11 by compiler extensions and OS specific functions. – doug Oct 05 '19 at 23:40
Accepting for the first part of the answer. I am still not convinced about the last part because of the ongoing discussion and also that C standard was referenced and not C++. Although C++ inherits volatile from C, its meaning may have changed post C++ 11. – Empty Space Oct 06 '19 at 06:36
@doug: People have been writing C code which involved interactions between asynchronous execution contexts since before the first C Standard was published, and even before the `volatile` keyword was invented. The C language does not provide any mechanism by which such contexts may be created, but many execution environments do. Prior to the introduction of `volatile`, implementations intended to be suitable for use with such code couldn't do much optimization, but the `volatile` keyword was added as a simple means of making such code compatible with reasonable optimizations. – supercat Oct 06 '19 at 18:29
@doug: In many cases, compilers didn't need `volatile` as an indication that they should behave cautiously around code that used I/O registers, since such code would often access pointers that were freshly cast from integers. Since programs would seldom form pointers from integers except in situations that involved doing something "weird", a compiler that made no effort to infer anything about the provenance of such pointers would naturally handle I/O registers just fine even without `volatile`. – supercat Oct 06 '19 at 18:33
@TrickorTreat: The authors of the C89 Standard didn't think it necessary to specify that implementations intended for various purposes should provide `volatile` semantics suitable for those purpose because it seemed obvious. Adding more detail to later standards would have required a consensus, which was blocked by compiler writers who failed to accept things the C89 Committee had thought obvious enough to go without saying. – supercat Oct 06 '19 at 18:44
@supercat Right. Good history of the way C/C++ had workarounds based on use cases and not pure language specs. The newer language now includes a set of tools for creating efficient, multi-threaded code, and handling mem mapped IO w/o assumptions about what the compiler will do. More portable. OTOH, I've written a lot of code with pre-standard C/C++. The first step was always to understand the compiler and do any workarounds if needed. Not a big problem but one that is no longer needed in most cases. – doug Oct 06 '19 at 19:20
@doug: What I find annoying (and IMHO indefensible) is the way the maintainers of clang and gcc assume the purpose of the Standard was to deprecate pre-standard ways of accomplishing things for which support wasn't mandated, despite the fact that the authors of the Standard have *expressly stated* that they did not intend to demean programs that were useful but happened not to be portable. A compiler with a simple optimizer that was extremely cautious around places where there was evidence of "strangeness" would be much more useful than one with a fancier optimizer that has to be disabled. – supercat Oct 06 '19 at 21:13
@supercat True. There's a lot of old code that takes advantage of compiler smarts. Even though the language didn't specifically support multithreading, C++ was still heavily used. People learned what their compilers did and found ways to accomplish goals. I guess it's up to the compiler maintainers to evolve things based on their users' needs. Microsoft's user base has a ton of these older code bases. That's probably why their `volatile` has expanded semantics unless one opts for the more limited ones. GCC, Clang, etc. could support this too, and make things easier for porting old code bases. Oh well. – doug Oct 06 '19 at 21:40
@doug: The issue isn't generally "taking advantage of compiler smarts", but rather the opposite: taking advantage of compilers' lack of inappropriate "cleverness". A compiler that behaves as though it knows nothing about whether actions on one lvalue might affect another may not generate fast code, but it will reliably produce correct code. BTW, feel free to join the discussion at https://chat.stackoverflow.com/rooms/200406/discussion-between-germannerd-and-supercat if you like. – supercat Oct 06 '19 at 22:04
