
[edit] For background reading, and to be clear, this is what I am talking about: Introduction to the volatile keyword

When reviewing embedded systems code, one of the most common errors I see is the omission of volatile for data shared between threads or with interrupts. However, my question is whether it is 'safe' not to use volatile when a variable is accessed via an access function or member function?

A simple example; in the following code...

volatile bool flag = false ;
void ThreadA()
{
    ...
    while (!flag)
    {
        // Wait
    }
    ...
}

interrupt void InterruptB()
{
    flag = true ;
} 

... the variable flag must be volatile to ensure that the read in ThreadA is not optimised out. However, if the flag were read via a function, thus...

volatile bool flag = false ;
bool ReadFlag() { return flag ; }
void ThreadA()
{
    ...
    while ( !ReadFlag() )
    {
        // Wait
    }
    ...
}

... does flag still need to be volatile? I realise that there is no harm in it being volatile, but my concern is for when it is omitted and the omission is not spotted; will this be safe?

The above example is trivial; in the real case (and the reason for my asking), I have a class library that wraps an RTOS such that there is an abstract class cTask from which task objects are derived. Such "active" objects typically have member functions that access data that may be modified in the object's task context but accessed from other contexts; is it critical then that such data is declared volatile?

I am really interested in what is guaranteed about such data rather than what a practical compiler might do. I may test a number of compilers and find that they never optimise out a read through an accessor, but then one day find a compiler or a compiler setting that makes this assumption untrue. I could imagine for example that if the function were in-lined, such an optimisation would be trivial for a compiler because it would be no different than a direct read.

Drew Dormann
Clifford
  • I would guess that in the case the function is inline, it will hurt. I'd appreciate a definitive answer though. – Matthieu M. Jun 30 '10 at 11:06

5 Answers


My reading of C99 is that unless you specify volatile, how and when the variable is actually accessed is implementation-defined. If you specify the volatile qualifier, then the code must work according to the rules of the abstract machine.

Relevant parts in the standard are: 6.7.3 Type qualifiers (volatile description) and 5.1.2.3 Program execution (the abstract machine definition).

Many compilers have heuristics to detect when a variable must be re-read and when it is okay to use a cached copy. volatile makes it clear to the compiler that every access to the variable must actually be an access to memory. Without volatile, the compiler is free to never re-read the variable.

And by the way, wrapping the access in a function doesn't change that, since a function may still be inlined by the compiler within the current compilation unit even without the inline keyword.

P.S. For C++ it is probably worth checking C89, on which it is based. I do not have a copy of C89 at hand.

Dummy00001
  • actually the CORRECT way to deal with the synchronization issues is with locks, volatile does NOT make this safe since on systems like some multi-processor arm systems this will STILL break because the caches aren't coherent... – Spudd86 Jun 30 '10 at 14:33
  • Thanks, in the normal case the 'data owner' task and the 'data accessing' task would be in separate compilation units, but I do not intend to rely on that, nor that 'linker optimisation' might not have the same effect. – Clifford Jun 30 '10 at 15:05
  • @spudd86: Synchronisation is a different issue (I probably should not have mentioned non-atomic access in the preamble); I am only concerned with the compiler optimising out a read. Also in my case we can assume that targeting multi-processor devices is not required. Currently the target is a Cortex-M3. – Clifford Jun 30 '10 at 15:12
  • To prevent the function from being inlined, use the 'volatile' keyword. As I understand it, volatile int foo (void) { return 32; } will not get inlined (at least with any compiler I've tried). It can also be applied to statements. :) Fun stuff. – Sparky Jun 30 '10 at 23:25
  • @Sparky "_To prevent the function from being inlined, use the 'volatile' keyword._" Hug? There is no such guaranty. And why would you want do prevent function inlining? – curiousguy Oct 31 '11 at 20:57
  • @Sparky [Putting volatile on a function does nothing](http://stackoverflow.com/a/15283620/2894252). According to David Rodríguez, "the compiler is free to drop the `volatile` qualifier." You could use __attribute__ ((noinline)). However like curiousguy said, why would you want to. I don't know of any guarantee that the compiler can't optimize it even if it isn't inlined. – Andy Stangeland Dec 11 '14 at 03:42

Yes, it is critical.
As you said, volatile prevents code-breaking optimization of shared memory [C++98 7.1.5p8].
Since you never know what kind of optimization a given compiler may perform now or in the future, you should explicitly specify that your variable is volatile.

log0

Of course, in the second example the code that writes/modifies the variable 'flag' is omitted. If it is never written to, there is no need for it to be volatile.

Concerning the main question

The variable still has to be declared volatile even if every thread accesses/modifies it through the same function.

A function can be "active" simultaneously in several threads. Imagine that the function code is just a blueprint that gets taken by a thread and executed. If thread B interrupts the execution of ReadFlag in thread A, it simply executes a different copy of ReadFlag (with a different context, e.g. a different stack, different register contents). And by doing so, it could mess up the execution of ReadFlag in thread A.

ziggystar
  • `flag` is written to by the interrupt; the second code example was intended as a fragment and to only show the changes, the interrupt handler remains the same. Sorry if that was not clear. – Clifford Jun 30 '10 at 13:56

In C, the volatile keyword is not required here (in the general sense).

From the ANSI C spec (C89), section A8.2 "Type Specifiers":

There are no implementation-independent semantics for volatile objects.

Kernighan and Ritchie comment on this section (referring to the const and volatile specifiers):

Except that it should diagnose explicit attempts to change const objects, a compiler may ignore these qualifiers.

Given these details, you can't be guaranteed how a particular compiler interprets the volatile keyword, or if it ignores it altogether. A keyword that is completely implementation dependent shouldn't be considered "required" in any situation.

That being said, K&R also state that:

The purpose of volatile is to force an implementation to suppress optimization that could otherwise occur.

In practice, this is how nearly every compiler I have seen interprets volatile: declare a variable as volatile and the compiler will not attempt to optimize accesses to it in any way.

Most of the time, modern compilers are pretty good at judging whether or not a variable can be safely cached. If you find that your particular compiler is optimizing away something that it shouldn't, then adding the volatile keyword might be appropriate. Be aware, though, that this can limit the amount of optimization that the compiler can do on the rest of the code in the function that uses the volatile variable. Some compilers are better about this than others; one embedded C compiler I used would turn off all optimizations for a function that accesses a volatile, but others like gcc seem to be able to still perform some limited optimizations.

Accessing the variable through an accessor function should prevent the compiler from caching the value. Even if the function is auto-inlined, each call site should still re-fetch a new value. I have never seen a compiler that would auto-inline the accessor function and then optimize away the data re-fetch. I'm not saying it can't happen (since this is implementation-dependent behavior), but I wouldn't write any code that expects it to happen. Your second example is essentially placing a wrapper API around the variable, and libraries do this without using volatile all the time.

All in all, the treatment of volatile objects in C is implementation-dependent. There is nothing "guaranteed" about them according to the ANSI C89 spec.

Your code is sharing the volatile object between a thread and an interrupt routine. No compiler implementation (that I have ever seen) gives volatile enough power to be sufficient for handling parallel access. You should use some sort of locking mechanism to guarantee that the two threads (in your first example) don't step on each other's toes (even though one is an interrupt handler, you can still have parallel access on a multi-CPU or multi-core system).

bta
  • Adding to the last paragraph: Make sure that whatever locking mechanism you use is safe to use inside interrupt context (i.e. accessing it shouldn't be a blocking operation). – bta Jun 30 '10 at 16:19
  • You shouldn't need locking for atomic accesses. In fact, locking is usually implemented using some form of atomic access. Atomics might require assembly code though. – Zan Lynx Oct 24 '11 at 07:32
  • "_Except that it should diagnose explicit attempts to change const objects, a compiler may ignore these qualifiers._" This is just wrong. – curiousguy Oct 31 '11 at 20:58
  • @curiousguy- care to elaborate? That's a direct quote from K&R, so it should be correct (at least as far as ANSI C goes, things may have changed in C99). – bta Nov 01 '11 at 03:41
  • @curiousguy: While a compiler would be allowed to specify `volatile` semantics that would cause weird things to happen if a volatile-qualified pointer were used to access an "ordinary" object, a compiler that refrains from doing anything weird with such accesses and unconditionally outputs at least one diagnostic would be allowed to treat all objects as non-const volatile without regard for the presence or absence of such qualifiers. – supercat Aug 16 '21 at 17:03

Edit: I didn't read the code very closely, and so I thought this was a question about thread synchronization, for which volatile should never be used; however, this usage looks like it might be OK (depending on how else the variable in question is used, and on whether the interrupt is always running such that its view of memory is (cache-)coherent with the one the thread sees). In the case of 'can you remove the volatile qualifier if you wrap it in a function call?' the accepted answer is correct: you cannot. I'm going to leave my original answer, because it's important for people reading this question to know that volatile is almost useless outside of certain very special cases.

More Edit: Your RTOS use case may require additional protection above and beyond volatile; you may need memory barriers in some cases, or to make accesses atomic... I can't really tell you for sure, it's just something you need to be careful of (I'd suggest looking at the Linux kernel documentation link I have below though; Linux doesn't use volatile for that kind of thing, very probably with good reason). Whether you do or do not need volatile depends very strongly on the memory model of the CPU you're running on, and often volatile is not good enough.

volatile is the WRONG way to do this, it does NOT guarantee that this code will work, it wasn't meant for this kind of use.

volatile was intended for reading/writing memory-mapped device registers, and as such it is sufficient for that purpose; however, it DOES NOT help when you're talking about data going between threads. (In particular, the compiler is still allowed to re-order some reads and writes, as is the CPU while it's executing. This one's REALLY important, since volatile doesn't tell the CPU to do anything special; sometimes it means bypassing the cache, but that's compiler/CPU dependent.)

see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2016.html, Intel developer article, CERT, Linux kernel documentation

Short version of those articles: volatile used the way you want is both BAD and WRONG. Bad because it will make your code slower; wrong because it doesn't actually do what you want.

In practice, on x86 your code will function correctly with or without volatile, however it will be non-portable.

EDIT: Note to self actually read the code... this is the sort of thing volatile is meant to do.

Spudd86
  • In my case cache coherency etc. are not an issue. The target is a Cortex-M3. I do not intend to weigh down the code with mechanisms to make the code portable to targets it will never run on. I applaud your letting this stand despite addressing a different issue, because the issue is important. – Clifford Jun 30 '10 at 15:20
  • volatile can actually be exactly what you want when programming for small embedded devices like 8bit MCUs. – ziggystar Jul 02 '10 at 07:56
  • @Clifford: On the Cortex M3, a `volatile` qualifier will usually be sufficient but there are a few situations where it may be necessary to use "barrier" directives defined in CMSIS. Most such situations would be unlikely to arise in C, but some theoretically could. For example, if `foo()` is a function in an area of address space that supports remapping and an instruction which changes the mapping is immediately followed by a call to `foo()`, the processor might manage to pre-fetch the first instruction word at `foo` before it processes the write which changed the mapping. – supercat Feb 26 '17 at 17:25
  • @Clifford: CMSIS defines a directive (I think it's `__isb()`) which could be placed between the store to the mapping-control register and the code fetch which would force the hardware to stop pre-fetching instruction words until all previous operations had completed. The m3 pipeline is short enough that one would generally have to try pretty hard to get into trouble. I don't remember if it's ever actually possible for the m3 to pre-fetch the target of an unconditional B or BL before a preceding store completes, but the barrier directive would ensure that such a thing wouldn't happen. – supercat Feb 26 '17 at 17:32
  • @supercat : Thanks, all good points - but seven years after the question was asked! ;-) – Clifford Feb 27 '17 at 11:45
  • @Clifford: People are still using the Cortex M3. A lot of code (including, no doubt, some of mine) doesn't use such directives in all the places they "should" be used, but the pipeline on the Cortex M3 is short enough it's not usually a big deal. I forgot to mention another important case, though: if one executes a store immediately followed by a WFI instruction, the CPU might go to sleep before the store executes, which could be disastrous if the effect of the store was supposed to set in motion the events necessary to cause the wake-up. – supercat Feb 27 '17 at 15:14
  • @supercat : Yes, me included; I am just not sure it was worth digging up a question this old. Or this particular answer warranted the comment. – Clifford Feb 27 '17 at 21:35
  • @supercat ISB is mostly if you've changed the program, i.e. the instructions, you want at minimum a DMB (data memory barrier) or perhaps a DSB (data synchronisation barrier) for data orientated things. – Flip May 14 '21 at 11:30
  • @Flip: I think ISB may be "stronger" than needed for many purposes, but except in cases where it would impose an excessive performance penalty, I would think it easier to have a "use this if any kind of barrier is required" than to worry about the details of what sort of finer-grained barriers would suffice. – supercat May 14 '21 at 19:05
  • @supercat ISB is for "Instruction Synchronisation Barrier" and only affects the executing processor. Used for discarding prefetched instructions. DSB is the one that will ensure all stores are completed and visible before continuing. DSB will ensure that all cache, TLB and branch prediction is completed and that no further instructions (after the DSB) are executed before completing. DMB is far weaker and merely ensures the ordering of reads and writes. Most common would be DMB ISH. So, the "if in doubt" barrier would be DSB. – Flip Jun 01 '21 at 17:43
  • @Flip: I've not used multi-core ARM processors before, so I'd not considered the question of inter-core synchronization. It does sound like DSB would be more semantically appropriate, but I find it irksome that the Standard doesn't require that all implementations provide a means of blocking compiler reordering, along with a standard means of requesting hardware-level atomic operations on platforms that can have hardware-level support for some but not all operations in the Standard atomic library. If, for example, one were targeting a 16-bit single-core x86 microcontroller... – supercat Jun 01 '21 at 17:53
  • ...a function to directly perform an atomic decrement-and-check-if-became-zero would be more useful than an implementation which supports all atomic operations by using a mutex (which would make it necessary to use the same mutex to guard all atomic operations, including those the hardware could otherwise have supported directly without a mutex). – supercat Jun 01 '21 at 17:56
  • @supercat There are two sources for reordering, one being the compiler and the other being the processor itself. The `DMB`, `DSB` instructions are for the processor only. Though using the appropriate inline assembly (with the `volatile` keyword to prevent compiler reordering), to call the `DMB` or `DSB` instruction, you can have a memory reordering barrier for both. – Flip Jun 28 '21 at 08:13
  • @Flip: Unfortunately, when the Standard characterized the behavior of `volatile` as implementation-behaved, it failed to specify that implementations intended for various purposes should make its semantics strong enough for those purposes without, requiring additional non-standard syntax, but the semantics of gcc and clang are too weak to be usable for single-core interrupt-driven systems. – supercat Jun 28 '21 at 14:48
  • @Flip: I see no reason a compiler claiming to be suitable for low-level programming shouldn't offer options to make the semantics of `volatile` strong enough to implement a hand-off mutex (once the mainline gives control of the mutex to a background I/O ISR or task it can't use it again until the ISR/task hands it back) without implementation-specific constructs, either with compiler barriers only (for applications where both tasks would be on a single core or cache-coherent cores) or configurable hardware barriers as well. – supercat Jun 28 '21 at 15:40
  • @supercat `volatile` adding mutexes!? In interrupts!!? You don't sound like an embedded programmer. `volatile` is purely there for the compiler to not optimise out load and stores of a variable (and reordering). Embedded still have to deal with cache systems never mind the synchronisation instructions which is beyond the scope of the compiler. Never mind that some systems don't support ldrex/strex mutex implementations. (OCM in Zynq 7 000) and require ...other solutions... – Flip Jun 29 '21 at 09:58
  • @Flip: The term "mutex" doesn't, imply something that each side will have to actively wait for. It's also possible to have a mutex where one or both sides will check for immediate availability, and if a mutex isn't available, do something else. Such a model can often be appropriate for things like timer-based bit-bang UARTs on a system where a timer tick interrupt has multiple functions to perform. If an interrupt, when not actively sending a byte, says `if (modemBytesToSend) { nextModemByte = *modemBytePtr++; modemBytesToSend--;}`, that should constitute half of a hand-off mutex... – supercat Jun 29 '21 at 14:48
  • ...guarding `modemBytePtr` and `*modemBytePtr`, allowing main-line code to safely write to those things whenever `modemBytesToSend` is zero. If the main line sees that `modemBytesToSend` is zero, it's entitled to assume it has control of the guarded resources (it "acquires" them) until it stores a non-zero value to `modemBytesToSend` ("releasing" the resources). A full mutex would have additional features and complications, allowing for the possibility that the interrupt could asynchronously take control of the resources, rather than only being able to receive control as a result of... – supercat Jun 29 '21 at 14:53
  • ...being explicitly handed such control by the main-line, but unless you want to describe some other construct more primitive than a mutex I don't know what you'd call the interlocked relationship between the main-line's and interrupt use of `modemBytePtr` and `*modemBytePtr`. Would you prefer a different term? – supercat Jun 29 '21 at 14:55
  • @Flip: To put it another way, I'm not expecting `volatile` to *add* a mutex, but rather to allow semantics sufficiently strong to allow a programmer to *build* a mutex that can guard *unqualified* objects. – supercat Jun 29 '21 at 17:29
  • @supercat you'll be disappointed then, it doesn't allow you to do that. You have to use fences/atomics to build a mutex. – Spudd86 Jul 07 '21 at 15:23
  • @Spudd86: On a machine where all cores are cache-coherent (including all machines which only have a single core), all that is necessary to make Dekker's Algorithm and Peterson's Algorithm work is that compilers refrain from reordering instruction that access guarded resources across operations on the flags used to guard them. Use of hardware-level atomic primitives like test-and-set when support was available could improve efficiency even on cache-coherent systems, but the whole point of Dekker/Peterson algorithms was they could synthesize a mutex even without hardware atomic support. – supercat Jul 07 '21 at 15:41
  • @Spudd86: The authors of the C Standard wanted to avoid forbidding optimizations *that wouldn't interfere with what programmers needed to do*, and saw no need to say that compiler writers should refrain from attempting optimizations that would interfere with what their customers needed to do, since they thought anyone hoping to sell compilers would regard that as obvious. Compiler writers were expected to know more about their customers' needs than the Committee ever could, and were thus granted considerable discretion as to how to best meet those needs. – supercat Jul 07 '21 at 15:48
  • @Spudd86: Besides, on most freestanding implementations all non-trivial programs will need to perform actions characterized by the Standard as UB (such as using pointers to access hardware registers that aren't "objects", which the Standard characterizes as UB even if the pointers' target type is qualified `volatile`). Freestanding C implementations would be completely useless if they couldn't be relied upon to behave meaningfully in more cases than mandated by the Standard. – supercat Jul 07 '21 at 22:49
  • @supercat even on most freestanding implementations volatile doesn't do fencing/atomics, it's an uncached access which is *not* the same thing and provides no sync between threads and is in no way sufficient to build a mutex on a multi-core machine. That is just *not* what volatile means. Dekker's Algorithm *does not work* on modern multi-core machines without fences and/or atomic operations. The memory model allows interleavings that break it. That is *why* C11/C++11 added fences, atomics, and defined a memory model. – Spudd86 Aug 16 '21 at 16:21
  • @Spudd86: I would generally expect a freestanding implementation on a multi-core machine to process `volatile` in a manner which is agnostic to inter-core caching issues, and thus require that the programmer arrange things so that all threads which access objects guarded by volatile-alone are cache coherent (e.g. by ensuring that all such threads are run on a single core, or a single group of cache-coherent cores, or by using volatile accesses to cache-control registers to force things to be flushed when needed). None of those things should require any *compiler*-specific syntax, however. – supercat Aug 16 '21 at 16:38
  • @Spudd86: Further, if a compiler will be used to process code that runs in a privileged context but accesses storage owned by an unprivileged context, it should be configured (and would thus have to be configurable) in such a way that would limit the effects of race conditions--most likely specifying that a read that conflicts with writes may yield any value which the storage in question has held or will hold between applicable sequencing barriers, but have no other side effect. Treating such reads as UB would make it impossible to make such code robust against privilege escallation. – supercat Aug 16 '21 at 16:44