0

I understand what volatile does and what it doesn't do. Taking the example from this question:

void waitForSemaphore()
{
   volatile uint16_t* semPtr = WELL_KNOWN_SEM_ADDR; /* well-known address of my semaphore */
   while ((*semPtr) != IS_OK_FOR_ME_TO_PROCEED);
}

My question is: in the presence of CPU caches, volatile cannot guarantee the above works, because it only forces the CPU to read *semPtr, and that read may be satisfied from one of the caches rather than from RAM. Therefore, if another device changed the content of WELL_KNOWN_SEM_ADDR, waitForSemaphore won't necessarily notice. So there must be something else that makes it work.

I have read this and this, and it seems volatile by itself is not enough to guarantee that such a program works; there must be some platform-dependent magic that either bypasses the L1/L2/L3 caches or forces them to be flushed. Am I right? If so, is such support available on all popular platforms, for example x86?

swang
  • How could it do that? volatile is only an instruction for the compiler not to cache a variable in a CPU register; it doesn't create a memory barrier, so how can it ensure the value read is coherent with RAM? – swang Jun 03 '15 at 13:42
  • Usually memory-mapped hardware is not cacheable. – user3528438 Jun 03 '15 at 13:47
  • @user3528438: `volatile` variables in normal RAM are also used for thread/interrupt synchronization. With C11, `stdatomic.h` might be a better choice. Once multiple CPUs are involved, `volatile` is close to useless for that. – too honest for this site Jun 03 '15 at 14:02
  • @swang you would need to issue a hardware/CPU-specific instruction to do that if the memory model of the hardware doesn't guarantee coherency (e.g. the mfence instruction on x86). But don't do that. Either you're working on a low-level processor where you map certain hardware registers to memory (which isn't cacheable) and use volatile in your C program, or use mutexes/semaphores or the atomic features of newer C/C++ versions, or an existing atomic library for your platform that is well tested. – nos Jun 03 '15 at 14:04
  • Possible duplicate: http://stackoverflow.com/questions/12710336/can-compiler-sometimes-cache-variable-declared-as-volatile – Lundin Jun 03 '15 at 14:13
  • @nos: That's where `_Atomic` and friends come into play. Imo the most important addition with C11, yet optional. – too honest for this site Jun 03 '15 at 14:25
  • @Olaf reordering hadn't occurred to me; original comment deleted and this one added to record that I originally posted a comment containing incorrect information. – Tommy Jun 03 '15 at 15:59
  • @nos: Hmm, on larger irons you would also have hardware registers. That is not solely a feature of "small systems". Whatever "small" means here, even Cortex-M now has more performance and (occasionally) memory than most PCs had in the '90s. – too honest for this site Jun 03 '15 at 16:10

3 Answers

4

volatile forbids the compiler from optimizing out accesses to such variables, or from reordering such accesses with regard to other volatile accesses (only!). The actual semantics might be implementation-defined (but have to be documented by the compiler).

For the hardware: yes, caches and bus buffers (e.g. "write buffers") may still reorder accesses, not to mention the CPU(s). volatile does not imply fences/barriers. So the underlying hardware has to tag such memory areas as "(strongly) ordered" at least. If external hardware is involved, the area must also be tagged "uncached", so every access goes directly to the hardware. The same may apply to other CPUs unless there is some kind of "snooping" hardware in the system (each CPU gets notified of cache changes in another CPU).
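
To make that concrete: when the memory is not strongly ordered, an explicit barrier has to be placed by hand between two volatile device accesses. This is a rough sketch only; the register addresses are invented, and a C11 atomic_thread_fence is used as a portable stand-in for whatever barrier instruction or intrinsic the platform actually requires (e.g. a DMB on ARM):

#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical memory-mapped DMA registers; addresses invented for illustration. */
#define DMA_DESC_REG  ((volatile uint32_t *)0x40001000u)
#define DMA_START_REG ((volatile uint32_t *)0x40001004u)

void start_dma(uint32_t descriptor)
{
    *DMA_DESC_REG = descriptor;                 /* volatile keeps the store, but not its order */
    atomic_thread_fence(memory_order_seq_cst);  /* explicit barrier: descriptor must be visible
                                                   before the start command */
    *DMA_START_REG = 1u;                        /* kick off the device only after the barrier */
}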

C11 provides stdatomic.h (optional). That is better suited for thread/interrupt synchronization. If multiple CPUs are involved, volatile is close to useless for that anyway. However, it still has applications for hardware access (possibly requiring an additional mutex, or implicit exclusive access to a peripheral device).
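
As a minimal sketch of the thread/interrupt case (assuming C11 with the optional stdatomic.h available; the flag and the function names are invented):

#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag set by an interrupt handler and polled by the main loop. */
static atomic_bool data_ready = ATOMIC_VAR_INIT(false);

void my_isr(void)
{
    atomic_store(&data_ready, true);    /* not optimized away, and ordered w.r.t. other atomics */
}

void main_loop(void)
{
    while (!atomic_load(&data_ready))
        ;                               /* spin until the ISR signals */
    /* ... process the data ... */
}

Unlike a plain volatile flag, the atomic operations also carry ordering guarantees.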

too honest for this site
  • If going beyond x86, all kinds of reordering are possible, not just store-store reorderings (which is what I think you meant by "write buffers"); see https://www.kernel.org/doc/Documentation/memory-barriers.txt for details on the possible reorders. – No-Bugs Hare Jun 03 '15 at 13:55
  • @No-BugsHare: As an embedded developer, I _always_ go beyond x86. Where did I restrict the meaning of re-ordering? – too honest for this site Jun 03 '15 at 14:03
  • Not only "write buffers" may reorder, as it may be (mis)understood from your reply; there are tons of different reorders possible. I don't insist you meant it, but it can be easily misunderstood this way. – No-Bugs Hare Jun 03 '15 at 14:08
  • @No-BugsHare: I forgot a closing parenthesis. To clarify even more, I also added now two letters and two dots. Hmm, I might add the CPUs as well; actually I did not exclude them, you wanted to state the non-obvious. But you're right, x86 users might forget the CPU might also reorder. – too honest for this site Jun 03 '15 at 14:14
  • @Jongware: I deleted that sentence and added one at the end of that paragraph. C11 6.7.3#7 leaves some room for ID, that's what I might have confused. Thanks for the hint. – too honest for this site Jun 03 '15 at 14:19
  • @Olaf: Oh, now I see what you've meant from the very beginning ;-); I'm coming at it from quite a different point of view (large multi-socket boxes, which applies to x86 too), where all the reordering is done by CPUs, but with pretty much the same overall effect ;-). – No-Bugs Hare Jun 03 '15 at 14:20
  • @No-BugsHare: Embedded is also not a single 8051 MCU anymore;-). ARM Cortex-M and the smaller Cortex-As (5,7) have changed the field very much. With DMA added, I have to be very aware of concurrent memory accesses and their ordering. – too honest for this site Jun 03 '15 at 14:22
  • On the smaller irons running bare-metal, I have full control after all. No hidden "magic". Except for "features" not specified in the datasheet - an ever-growing problem with the quality of documentation. – too honest for this site Jun 03 '15 at 14:28
1

volatile on its own only tells the compiler "assume that when you store to a volatile object, someone might take notice, so you can't optimise the store away" and "assume that when you read a volatile object, the result might not be the same as what you stored, and also someone might take notice that you read the volatile object, so you can't optimise the read away, and you must use the value read and not what you think should be stored there".

For example, if you write

int x = 0 * (*p);

the compiler can't just say "I'll set x to 0 whatever *p is, so I don't need to bother reading *p" - that reasoning would be wrong if *p is volatile, because the read must still be performed.

In C11 there are new features to help you, or you might want to use proper OS functions.
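
For example, on a POSIX system the busy-wait from the question could become a blocking wait on a real semaphore. A sketch only, with the semaphore assumed to be created and initialised elsewhere:

#include <semaphore.h>

extern sem_t my_sem;        /* hypothetical semaphore, initialised elsewhere with sem_init/sem_open */

void waitForSemaphore(void)
{
    sem_wait(&my_sem);      /* blocks until another thread or process posts the semaphore */
}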

gnasher729
0
  1. No, volatile as such doesn't need hardware support to work. See the answer by gnasher729 for details.

  2. Cache-synchronization issues are related to the so-called "memory model" (https://en.wikipedia.org/wiki/Memory_ordering ); to describe it very briefly: there is no such thing as "another device has changed the variable" (it is not observable from the reading side), there is only "another device changed variable a earlier than variable b" - this IS observable. To deal with it, so-called "memory fences" a.k.a. "memory barriers" are used (https://en.wikipedia.org/wiki/Memory_barrier and https://www.kernel.org/doc/Documentation/memory-barriers.txt ), which, if I'm not mistaken, have special support in C++11 (a rough sketch follows below). On x86, memory fences are implicit in CAS instructions; x86 is widely believed to comply with the so-called TSO (total store ordering) memory model, which is quite strict and one of the easiest to deal with (though, as described in the kernel.org reference above, it is perfectly feasible to achieve correctness under any memory model).
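
A rough sketch of the "a changed earlier than b" point, using C11 atomics as a stand-in for the C++11 facilities mentioned above (all names invented):

#include <stdatomic.h>

int payload;                            /* "variable a": the data being published */
atomic_int flag = ATOMIC_VAR_INIT(0);   /* "variable b": the flag that publishes it */

void writer(void)
{
    payload = 42;
    atomic_store_explicit(&flag, 1, memory_order_release);  /* release: payload visible first */
}

void reader(void)
{
    while (atomic_load_explicit(&flag, memory_order_acquire) == 0)
        ;                               /* acquire: pairs with the release store above */
    /* here payload is guaranteed to read as 42; plain volatile gives no such guarantee */
}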

No-Bugs Hare