
From what I know, the compiler never optimizes a variable that is declared as volatile. However, I have an array declared like this:

volatile long array[8];

And different threads read and write to it. An element of the array is only modified by one of the threads and read by any other thread. However, in certain situations I've noticed that even if I modify an element from one thread, the thread reading it does not notice the change. It keeps on reading the same old value, as if the compiler has cached it somewhere. But the compiler in principle should not cache a volatile variable, right? So how come this is happening?

NOTE: I am not using volatile for thread synchronization, so please stop giving me answers such as "use a lock" or "use an atomic variable". I know the difference between volatile, atomic variables and mutexes. Also note that the architecture is x86, which has proactive cache coherence. Also, I read the variable for long enough after it is supposedly modified by the other thread. Even after a long time, the reading thread can't see the modified value.

curiousguy
pythonic
  • I have a feeling this has something to do with the difference between `volatile long[]` and `long[] volatile` – Earlz Oct 03 '12 at 14:12
  • Most likely because of details that you've left out of your question. Can you include an example program that does not do what you want it to? –  Oct 03 '12 at 14:12
  • 3
    volatile is not for threads. You need to use a mutex. – Vaughn Cato Oct 03 '12 at 14:16
  • My answer at least was addressing - "However, in certain situations I've noticed that even if I modify an element from a thread, the thread reading it does not notice the change. It keeps on reading the same old value, as if compiler has cached it somewhere. But compiler in principle should not cache a volatile variable, right? So how come this is happening.". What's the problem with that? – Luchian Grigore Oct 03 '12 at 14:21
  • 4
    AFAIK in C++ `volatile` only affects the compiler optimizations, not the possible CPU reorders that can still happen. – Tudor Oct 03 '12 at 14:22
  • 1
    Even if you don't care about synchronization, using a mutex also creates a memory barrier, which is necessary for data shared between multiple CPUs. See http://stackoverflow.com/questions/1616093/does-presence-of-mutex-help-getting-rid-of-volatile-key-word?rq=1 – Vaughn Cato Oct 03 '12 at 14:23
  • You are observing that the thread reading a value doesn't see a change made within that same thread? That would be wrongly generated code by the compiler even if there is no mention of `volatile`, and you should file a bug report with the compiler vendor. I suspect you change the value in one thread and read it in another thread: this does not work without some form of synchronization. – Dietmar Kühl Oct 03 '12 at 14:33
  • If you are using visual c++ then it should work as volatile is extended on that platform to also be a memory barrier as a non standard extension. I would look out for some other error in your code. Are you certain the value is being set? And that it's not being changed back before the other thread reads it? – jcoder Oct 03 '12 at 14:36
  • 3
    Your note doesn't affect my answer, btw, and probably not other people's answers. You might not think you're using `volatile` for thread synchronization, but if you expect it to introduce a relationship between a read and a write in different threads, then in point of fact you are, because by definition that's what thread synchronization *is*. – Steve Jessop Oct 03 '12 at 14:40
  • 1
    @SteveJessop: You write “if you expect it to introduce a relationship…”, but the question does not indicate such an expectation. It states one thread writes and others read, but it does not state there is any expectation about when the readers will see updates. In any case, this is irrelevant; what any person expects will happen with regard to thread synchronization is unrelated to the reported observations that the behavior of compiled code is as if the value of a volatile object were not read from memory when it should have been. – Eric Postpischil Oct 03 '12 at 14:47
  • 2
    @Eric: it explicitly states an expectation about when the readers will see updates: "I read the variable for long enough after is is supposedly modified by the other thread". Assuming the test code is doing what it's intended to, this is an argument between user1018562 and his implementation, over how long is "long enough". He says there's a limit, the implementation apparently says there isn't. The standard will not intervene in that argument, it has nothing to say about whether the compiler "in principal should not cache a volatile variable", because the code has a data race. – Steve Jessop Oct 03 '12 at 14:59
  • 2
    I would try a mutex. If that fixes it, then you can expect it is a cache issue. If that doesn't fix it, then you can look for the problem elsewhere. – Vaughn Cato Oct 03 '12 at 15:11
  • @Eric Postpischil: If there is no expectation that the changed value will be seen between threads, how can the user know that the change will become visible at some point in the future and he just didn't wait long enough? – Dietmar Kühl Oct 03 '12 at 15:13
  • 1
    @DietmarKühl: There could be other mechanisms involved. Or the updates could just be opportunistic: If updates are seen, great, some other function triggers and data is processed. If not, it happens later in the writing thread, after the thread is done producing data. Maybe it is known the hardware will propagate updates within several seconds. Maybe the writer issues a fence rarely. Who knows. It does not matter, because none of it is relevant to the question. At most, there is a concern here that maybe the reporting of observations is incorrect, so we ought to look into that. – Eric Postpischil Oct 03 '12 at 15:53
  • 1
    The actual question is this: Does the assembly language generated by the compiler read the volatile object when the abstract machine for this C program reads the object? That might be answered by examining the behavior of the real machine, but another approach is to examine the assembly language. The questioner, user1018562, ought to show code that demonstrates the problem and the corresponding assembly language. Otherwise, there is insufficient data to answer this question. – Eric Postpischil Oct 03 '12 at 16:07
  • @Eric: Yes, that is what I'm going to do: look at the assembly code. – pythonic Oct 03 '12 at 17:20
  • You wrote “different threads read and write to it” and “NOTE: I am not using volatile for thread synchronization”. Sorry, but having different threads read and write to the same memory location is called thread synchronization (the very thing that `volatile` alone usually does not work for). – Pascal Cuoq Oct 03 '12 at 20:49
  • 1
    In a way, yes, it can cache volatiles. See [this question](http://stackoverflow.com/questions/12666916/assignment-expressions-and-volatile). – Alexey Frunze Oct 04 '12 at 00:23
  • Related: [If I don't use fences, how long could it take a core to see another core's writes?](https://stackoverflow.com/q/51292687) (x86 asm, also applies to C++ `volatile` or `atomic` stores). Using a `volatile` written by one thread and read by another is data-race UB, but specific implementations might define `volatile` strongly enough that you can say what will happen. e.g. no tearing for naturally-atomic variables: [Why is integer assignment on a naturally aligned variable atomic on x86?](https://stackoverflow.com/q/36624881). – Peter Cordes Jul 29 '18 at 22:24
  • This is how threading was done before C++11 introduced a memory model. Also related: [MCU programming - C++ O2 optimization breaks while loop](https://electronics.stackexchange.com/q/387181). – Peter Cordes Jul 29 '18 at 22:26
  • Also related: [When to use volatile with multi threading?](//stackoverflow.com/a/58535118) explains that all mainstream C++ implementations run threads across cache-coherent shared address space. But the ISO C++ standard technically doesn't require that. – Peter Cordes Nov 19 '19 at 21:14

10 Answers


But the compiler in principle should not cache a volatile variable, right?

No, the compiler in principle must read/write the address of the variable each time you read/write the variable.

[Edit: At least, it must do so up to the point at which the implementation believes that the value at that address is "observable". As Dietmar points out in his answer, an implementation might declare that normal memory "cannot be observed". This would come as a surprise to people using debuggers, mprotect, or other stuff outside the scope of the standard, but it could conform in principle.]

In C++03, which does not consider threads at all, it is up to the implementation to define what "accessing the address" means when running in a thread. Details like this are called the "memory model". Pthreads, for example, allows per-thread caching of the whole of memory, including volatile variables. IIRC, MSVC provides a guarantee that volatile variables of suitable size are atomic, and it will avoid caching (rather, it will flush as far as a single coherent cache for all cores). The reason it provides that guarantee is because it's reasonably cheap to do so on Intel -- Windows only really cares about Intel-based architectures, whereas Posix concerns itself with more exotic stuff.

C++11 defines a memory model for threading, and it says that this is a data race (i.e. that volatile does not ensure that a read in one thread is sequenced relative to a write in another thread). Two accesses can be sequenced in a particular order, sequenced in unspecified order (the standard might say "indeterminate order", I can't remember), or not sequenced at all. Not sequenced at all is bad -- if either of two unsequenced accesses is a write then behavior is undefined.

The key here is the implied "and then" in "I modify an element from a thread AND THEN the thread reading it does not notice the change". You're assuming that the operations are sequenced, but they're not. As far as the reading thread is concerned, unless you use some kind of synchronization the write in the other thread hasn't necessarily happened yet. And actually it's worse than that -- you might think from what I just wrote that it's only the order of operations that is unspecified, but actually the behavior of a program with a data race is undefined.

Steve Jessop
  • I know what you are saying, but I ran the program long enough to notice that the thread keeps on reading the same value, even long after it has been changed by another thread. The cache coherence might take some time, but it can be no more than a few microseconds, I guess; actually much less than that, I think. – pythonic Oct 03 '12 at 14:25
  • 1
    @user1018562: the point is that the data race is *undefined behavior*. One motivation for it being UB is to do with non-coherent caches, but the behavior once it happens might be anything, because the optimizer might have relied on there being no data races when it transformed your code. The purpose of requiring your code to have no data races, is to allow the compiler to perform transformations that are *incorrect* for code that has them. – Steve Jessop Oct 03 '12 at 14:26
  • 1
    @SteveJessop: People place too much emphasis on “undefined behavior”. Behavior is never specified to be undefined. It is not simply not defined. The difference is that the C standard might not define behavior, but it does not prevent another specification from defining it. You cannot conclude from the fact that C does not define behavior that we cannot expect particular hardware to behave in certain ways. If the hardware does propagate changes after a small time, and a change occurs, and the change is not observed after the time has passed, then there is a bug. – Eric Postpischil Oct 03 '12 at 16:10
  • @Eric: mostly true, but since the questioner doesn't say what compiler or code we're talking about then it's a massive guess to say what the results should be, *regardless* of the hardware. It's no use having hardware that propagates changes if you don't write code (or use a compiler) that actually makes a change. If the questioner had asked a different question then I might have given a different answer with less emphasis on standards and more on the behavior of a particular compiler. You're right, there probably is a bug, almost certainly in the questioner's code. – Steve Jessop Oct 04 '12 at 09:55
  • And you're right to suggest looking at the assembly too, but the point of this answer is that even if the questioner discovers that the compiler has omitted the read or write or memory barrier that he expects, he doesn't get to call it a non-conforming compiler. He asked about "in principle", not about "in my compiler". – Steve Jessop Oct 04 '12 at 10:00
  • 1
    @SteveJessop: No. The question is not about a memory barrier. The question is whether the assembly contains a load instruction where the C abstract machine accesses the object. This is required by the C standard. – Eric Postpischil Oct 04 '12 at 11:37
  • @Eric: citation needed, I think. I don't see where the C or C++ standards talk about load instructions. They do talk about accessing the object, but see Dietmar's answer about the CPU considering "normal memory" not being observable, and note that the standard does not anywhere distinguish between compiler transformations and CPU transformations. If the CPU is permitted to decree an object not observable, then so is the compiler. Why it would want to is another matter, and one that for some reason the questioner has chosen to exclude from discussing by saying "in principle". – Steve Jessop Oct 04 '12 at 15:56
  • Anyway, if the question gets edited to the point where I don't think this answer is relevant any more I'll delete it. If the question is "in principle, is this behavior permitted when I have a data race on a `volatile` object" then the answer is "yes, but your implementation might not do so". If the question is about the effect of a store on Intel, the answer's different. For example if the questioner posts the asm (and it contains the relevant load/store instructions), and changes the question to be about that, the C and C++ standards would no longer be relevant. – Steve Jessop Oct 04 '12 at 16:20
  • @EricPostpischil: Some compiler writers take the view that if a compiler can determine that Undefined Behavior will inevitably occur if a certain piece of code is reached when some variable x is negative, it would be entitled to assume that x cannot be negative there and optimize out any `if (x < 0)` tests which would, if taken, not prevent the Undefined Behavior from occurring. Personally, I think a decent standard should define various optimal promises an implementation could make, along with means by which a program could refuse compilation on implementations where it wouldn't work... – supercat May 23 '15 at 22:28
  • ...but such a philosophy hasn't yet taken hold. There are a lot of platforms which can "free of charge" offer some loose guarantees about unsequenced accesses (e.g. that every read will yield some value that has been written sometime) and there are many algorithms where even a very loose guarantee like that would suffice to ensure correctness, especially if combined with a directive that effectively limited the number of times or amount of time execution could pass it without a cache update. Adding synchronization sufficient to prevent Undefined Behavior could be a lot more expensive... – supercat May 23 '15 at 22:36
  • ...than having code which didn't particularly care about whether it got old or new data, so long as any new data would show up eventually. – supercat May 23 '15 at 22:37

For C:

What volatile does:

  • Guarantees an up-to-date value in the variable, if the variable is modified from an external source (a hardware register, an interrupt, a different thread, a callback function, etc.).
  • Blocks all optimizations of read/write access to the variable.
  • Prevents dangerous optimization bugs that can happen to variables shared between several threads/interrupts/callback functions, when the compiler does not realize that the thread/interrupt/callback is called by the program. (This is particularly common among various questionable embedded system compilers, and when you get this bug it is very hard to track down.)

What volatile does not:

  • It does not guarantee atomic access or any form of thread-safety.
  • It cannot be used instead of a mutex/semaphore/guard/critical section. It cannot be used for thread synchronization.

What volatile may or may not do:

  • It may or may not be implemented by the compiler to provide a memory barrier, to protect against instruction cache/instruction pipe/instruction re-ordering issues in a multi-core environment. You should never assume that volatile does this for you, unless the compiler documentation explicitly states that it does.
Lundin
  • I am not using volatile variables as atomic variables, for memory barrier, or for thread synchronization. – pythonic Oct 03 '12 at 17:18

With volatile you can only impose that a variable is re-read whenever you use its value. It doesn't guarantee that the different values/representations that are present on different levels of your architecture are consistent.

To have such guarantees you'd need the new utilities from C11 and C++11 concerning atomic access and memory barriers. Many compilers already implement these as extensions. E.g., the gcc family (clang, icc, etc.) has builtins starting with the prefix __sync to implement these.

Jens Gustedt
  • I think with __sync you ensure atomic operations but you won't prevent race conditions. – Genís Oct 03 '12 at 14:20
  • sure that atomics avoid race conditions, that is exactly their definition. But they also guarantee coherence of data. – Jens Gustedt Oct 03 '12 at 14:21
  • Sorry, I should have said atomics doesn't ensure correct synchronization among threads. – Genís Oct 03 '12 at 14:26
  • Hardware guarantees cache coherency (other than the store buffer). The asm emitted by the compiler for `atomic` doesn't do anything to make sure other threads can see your stores. That always happens. A seq-cst store will make the current thread *wait* until that happens before doing other loads/stores, though. – Peter Cordes Jul 29 '18 at 22:04

The volatile keyword only guarantees that the compiler will not use a register for this variable. Thus every access to this variable will go out and read the memory location. Now, I assume that you have cache coherence among the multiple processors in your architecture. So if one processor writes and another reads, the change should be visible under normal conditions. However, you should consider the corner cases. Suppose the value is still in the pipeline of one processor core and another processor tries to read it, assuming it has already been written; then there is a problem. So essentially, shared variables should either be guarded by locks or be protected by using memory barriers correctly.

Raj
  • Also, I am wondering with some optimization levels enabled while compiling, if the compiler by any chance removed this statement? This is just a thought. One of the methods to see what is going on, is by dumping the assembly code using some utility. – Raj Oct 04 '12 at 10:52

The semantics of volatile are implementation-defined. If a compiler knew that interrupts would be disabled while a certain piece of code was executed, and knew that on the target platform there would be no means other than interrupt handlers via which operations on certain storage would be observable, it could register-cache volatile-qualified variables within such storage just the same as it could cache ordinary variables, provided it documented such behavior.

Note that what aspects of behavior are counted as "observable" may be defined in some measure by the implementation. If an implementation documents that it is not intended for use on hardware which uses main RAM accesses to trigger required externally-visible actions, then accesses to main RAM would not be "observable" on that implementation. The implementation would be compatible with hardware which was capable of physically observing such accesses, if nothing cared whether any such accesses were actually seen. If such accesses were required, however, as they would be if the accesses were regarded as "observable", the compiler would not be claiming compatibility and would thus make no promise about anything.

supercat
  • This is true, but all mainstream compilers choose to make `volatile` mean what you'd expect, and really load or store for each `volatile` access. So everything is correct even if you're observing by single-stepping with a debugger, or in a simulator. This means that `volatile` does happen to work as a roll-your-own `atomic` with `memory_order_relaxed`, for types narrow enough to be naturally atomic. (It's UB in general of course, but you could argue that an implementation defines `volatile` strongly enough.) – Peter Cordes Jul 29 '18 at 22:10
  • Related: [MCU programming - C++ O2 optimization breaks while loop](https://electronics.stackexchange.com/q/387181)) goes into the details of `volatile` vs. `atomic` for an interrupt handler. – Peter Cordes Jul 29 '18 at 22:11
  • @PeterCordes: They don't work well enough to synthesize mutexes, even on single-core platforms unless everything that's supposed to be guarded by the mutex is qualified `volatile`--a requirement which is essentially unique to "optimized" C and which negates the benefits of optimization. – supercat Jul 30 '18 at 02:50
  • That's why I said they work as *relaxed* atomics, because they can't provide acquire or release semantics wrt. non-volatile objects. (Of course with implementation-specific stuff like `asm("" ::: "memory")` compiler barriers and/or hardware barriers, you can roll your own memory ordering, too, the way the Linux kernel does.) But my main point was just that although the standard allows that, I don't think any implementations try to do any of the things you suggest. – Peter Cordes Jul 30 '18 at 03:10
  • @PeterCordes: From what I can tell, `icc` will regard a combination of a `volatile read` and `volatile write` as a release and acquire, although neither will work in isolation. The ability to do something that will serve as an acquire/release barrier *at least with regard to compiler reordering* has always been essential in systems programming, but the Standard doesn't require that *because it makes no attempt to mandate that all implementations be suitable for systems programming*, leaving that as a Quality of Implementation issue. A substantial corpus of existing code requires that... – supercat Jul 30 '18 at 04:29
  • ...optimizations be disabled. Some compiler writers seem to take great pride in regarding such code as "defective", but I think it is much more reasonable to say that the compiler writers have not made a bona fide effort to produce quality compilers suitable for systems programming. The authors of the Standard noted that C had been serving a useful role as a form of "portable assembly language", and on many platforms it's possible to write an entire freestanding application that runs on bare metal without using anything outside standard C syntax beyond a means of telling a linker... – supercat Jul 30 '18 at 04:36
  • ...where to put things. One would need an implementation which is intended for such use, of course, but existing C syntax would be adequate to handle acquire/release barriers, even with optimization enabled, if compiler writers didn't decide to require use of compiler-specific syntax instead. – supercat Jul 30 '18 at 04:38
  • When you say "existing", you're excluding C11 I think? We have ISO standard acq/rel semantics in C11's (optionally-supported) `stdatomic` library, so no need to complain any longer about lack of language support. I think with relaxed `_Atomic` plus `atomic_signal_fence`, we can portably get as much compile-time ordering as we want / need, with no run-time cost of actual barrier instructions that you'd get from using atomic load/store with ordering stronger than relaxed on some platforms. (Of course you need to avoid `foo++` if you don't want a slow atomic RMW.) – Peter Cordes Jul 30 '18 at 05:11
  • @PeterCordes: What fraction of C code is written to require a C11 compiler? Does the addition of new features to a language instantly render existing code that doesn't use such features defective? – supercat Jul 30 '18 at 05:18
  • IDK what your point is. Do you wish C had standard memory barriers earlier than C11? If you want well-defined portable inter-thread communication, especially with memory-ordering, `stdatomic.h` is probably the best way to get it these days. So I'd expect that more and more code will be written that assumes C11 features. But no, C11 doesn't make existing implementation-specific code wrong; the Linux kernel's hand-rolled atomics using inline asm is still supported by gcc, for example. It was never portable, and still has well-defined behaviour *in GNU C*, but not in ISO C of course. – Peter Cordes Jul 30 '18 at 05:47
  • @PeterCordes: Every dialect suitable for system programming has always provided the semantics necessary to support a basic mutex. Many of them did so without requiring any special syntax. While C11 may be the first version of the Standard to "officially" define some of the semantics needed for systems programming, compilers that were designed for systems programming have supported such semantics for more than four decades. – supercat Jul 30 '18 at 06:07

For C++:

From what I know, the compiler never optimizes a variable that is declared as volatile.

Your premise is wrong. volatile is a hint to the compiler and doesn't actually guarantee anything. Compilers can choose to prevent some optimizations on volatile variables, but that's it.

volatile isn't a lock, don't try to use it as such.

7.1.5.1

7) [ Note: volatile is a hint to the implementation to avoid aggressive optimization involving the object because the value of the object might be changed by means undetectable by an implementation. See 1.9 for detailed semantics. In general, the semantics of volatile are intended to be the same in C++ as they are in C. —end note]

Luchian Grigore
  • 7
    Actually, the requirements on `volatile` are fairly strong from a compiler point-of-view. It's not like `register` or `inline`, which compilers are free to ignore. Precise access to volatile objects is one of the minimal requirements of a conforming implementation: if the compiler treats it as merely a hint, the implementation is nonconforming. (See 5.1.2.3 of the C standard, I believe it's similar for C++.) Your conclusion is correct, but not for the reason you give. –  Oct 03 '12 at 14:16
  • @hvd looking for a quote right now, but I believe you're wrong. – Luchian Grigore Oct 03 '12 at 14:17
  • volatile doesn't imply that writes are atomic though. That only works for volatile bool - so a write from another thread may not have completed before your read – Martin Beckett Oct 03 '12 at 14:23
  • @MartinBeckett I never said that. – Luchian Grigore Oct 03 '12 at 14:23
  • @LuchianGrigore - sorry, was meant as a comment on the question, I was in the wrong comment box – Martin Beckett Oct 03 '12 at 14:24
  • 2
    I don't know about C++, but in C the compiler is not allowed to optimize volatile. C11 5.1.2.3/2 `"Accessing a volatile object, ... are all side effects"`. 5.1.2.3/4 `"An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object)."` 5.1.2.3/6 `"The least requirements on a conforming implementation are: — Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine."` – Lundin Oct 03 '12 at 14:27
  • Regarding atomic access, this is introduced in C11 as `volatile sig_atomic_t` which is guaranteed not to be optimized and guaranteed to be atomic. Older C standards make no guarantees about atomic access. – Lundin Oct 03 '12 at 14:29
  • 2
    This answer is basically correct, `volatile` doesn't mean "don't optimize". In fact, I don't think "don't optimize this variable" even has a well-defined meaning. But the semantics of `volatile` aren't *solely* a hint to avoid optimization, so `volatile` doesn't guarantee *nothing*. Access to `volatile` objects are observable behavior. Since hvd mentioned it to compare, C++ compilers aren't free to ignore `inline` or `register` either: both have defined meanings in addition to their secondary roles as optimization hints. C compilers can ignore `register`, and also `restrict`. – Steve Jessop Oct 03 '12 at 14:31
  • @LuchianGrigore What I meant by that your conclusion is still correct, BTW, is that while the *compiler* must absolutely not cache the memory location in a register, the CPU may and does still do that behind the compiler's back, which breaks when another CPU accesses the same memory location. `volatile` doesn't address that. `volatile` is useful for for example signal handlers, which run on the same CPU as the code it interrupts. –  Oct 03 '12 at 14:35
  • @SteveJessop Indeed, my comment was not entirely accurate. In the same spirit as the other effects you point out, C compilers cannot ignore `register` entirely either: they must still remember that the keyword was used in order to diagnose invalid attempts to take a `register` variable's address with the `&` operator. –  Oct 03 '12 at 14:37
  • @LuchianGrigore But addressing your edit, that refers to 1.9 of the C++ standard, which contains substantially the same text as 5.1.2.3 of the C standard that I referred to, see 1.9p8. –  Oct 03 '12 at 14:42
  • @hvd: oh drat, I got C and C++ the wrong way round, didn't I? C++ can ignore `register` and C can't. It's the ban on taking the address that I meant as the "primary role" of `register`, just like the primary role of `inline` is to do with the ODR. Btw I don't think it's correct that the compiler absolutely must not cache `volatile`s. For sanity's sake they generally don't, but the standard doesn't distinguish between the compiler, the CPU, or the memory cache as distinct parts of the implementation. Anything a CPU is permitted to do, a compiler is permitted to do. – Steve Jessop Oct 03 '12 at 14:47
  • @hvd I'd argue about whether the CPU is allowed to "do things behind the compiler's back" or not. The compiler is written for a specific CPU. If it happens to be a multi-core one with instruction cache, then I would say that the compiler needs to take this in account or the implementation is incorrect. C and C++ code compiles into binary op codes, so it doesn't make sense to write a compiler for "an unknown CPU type". – Lundin Oct 03 '12 at 14:52
  • ... for example (and this is deliberately a bit weird but it conforms), you could write a compiler that emulates the behavior of architecture A, on architecture B. Including introducing surprising caching behavior. Basically, the actual meaning of a volatile write being "observable behavior" is a bit open to interpretation because (due to the deliberate flexibility of the C and C++ memory model) the meaning of "a region of memory" in the definition of "an object" is a bit open to interpretation. The standard doesn't require that it mean in particular a region of RAM, or cache, or whatnot. – Steve Jessop Oct 03 '12 at 14:52
  • @SteveJessop In C, up to C99, there was no standard threading, at all. All correct portable C99 programs are singlethreaded, any multithreaded programs rely on implementation extensions, so so long as `volatile` works as required for singlethreaded programs, the implementation can claim conformance to C99. To be honest, for C11, I'm not sure how threading is specified, I'd need to read the standard in more detail; I do know this memory caching per CPU / per core has been a known issue for a long time, and I'm fairly confident that it won't have been resolved by disallowing memory caching. –  Oct 03 '12 at 16:11
  • @SteveJessop Agreed that the standard doesn't actually make the distinction between compiler, CPU, memory, etc. They're all part of "the implementation". Concretely, an implementation with known faulty memory, for example, cannot claim conformance to any of the C (and probably C++) standards, even if its installed compiler and standard library can. –  Oct 03 '12 at 16:14
1

The volatile keyword has nothing to do with concurrency in C++ at all! It is used to prevent the compiler from reusing a previously read value, i.e., the compiler will generate code that accesses a volatile value every time it is accessed in the code. The main purpose is things like memory-mapped I/O. However, use of volatile has no effect on what the CPU does when reading normal memory: if the CPU has no reason to believe that the value changed in memory, e.g., because there is no synchronization directive, it can just use the value from its cache. To communicate between threads you need some synchronization, e.g., an std::atomic<T>, locking a std::mutex, etc.

Component 10
  • 10,247
  • 7
  • 47
  • 64
Dietmar Kühl
  • 150,225
  • 13
  • 225
  • 380
  • It can also prevent the compiler from optimizing away a variable that you "aren't using". – Derek Oct 03 '12 at 14:27
  • Yes, volatile is irrelevant to threads. But what constitutes a volatile "access" is implementation-defined. It's not 1:1 with code. – philipxy Feb 20 '17 at 08:24
  • Cache is coherent, so (modulo the store buffer) all cores share the same view of memory. A thread can't keep re-reading the same stale value from a `volatile` indefinitely. C++ could maybe be implemented on a system where explicit flushing was required for data to become globally visible, but it would be expensive. This question is tagged x86, but this also applies to weakly-ordered ISAs like PowerPC and ARM. – Peter Cordes Jul 29 '18 at 21:59
  • You only need memory barriers if you need to make a thread wait until a store/load has become globally visible before it does something else; asm stores already become globally visible as fast as possible, committing from the store buffer to L1d. And `volatile` means the compile emits asm to actually store when you expect. – Peter Cordes Jul 29 '18 at 22:00
1

Volatile only affects the variable it is in front of; here in your example, a pointer. In your code, volatile long array[8], the pointer to the first element of the array is volatile, not its contents. (The same holds for objects of any kind.)

You could adapt it as in How do I declare an array created using malloc to be volatile in c++

Community
  • 1
  • 1
user3387542
  • 611
  • 1
  • 8
  • 28
0

C++ accesses by volatile lvalues and C accesses to volatile objects are "abstractly" "observable" (although in practice C behaviour is per the C++ standard, not the C standard). Informally, the volatile declaration tells every thread the value might change somehow, regardless of the text in any thread. Under the Standards with threads, there is no notion of a write by one thread causing a change in an object, volatile or not, shared or not, except for writes to a shared variable at the synchronizing function call that starts a synchronized critical region. volatile is irrelevant to thread-shared objects.

If your code doesn't properly synchronize the thread you are talking about, then one thread reading what another thread wrote has undefined behaviour, so the compiler can generate any code it wants. If your code is properly synchronized, then writes by other threads only become visible at thread synchronization calls; you don't need volatile for that.

PS

The standards say "What constitutes an access to an object that has volatile-qualified type is implementation-defined." So you can't just assume that there is a read access for every dereference of a volatile lvalue or a write access for every assignment through one.

Moreover, how the ("abstract") "observable" volatile accesses are "actually" manifested is implementation-defined. So a compiler might not generate code for hardware accesses corresponding to the defined abstract accesses. E.g., perhaps only objects with static storage duration and external linkage, compiled with a certain flag for linking into special hardware locations, can ever be changed from outside the program text, so that volatile on other objects is ignored.

Community
  • 1
  • 1
philipxy
  • 14,867
  • 6
  • 39
  • 83
-1

However, in certain situations I've noticed that even if I modify an element from a thread, the thread reading it does not notice the change. It keeps on reading the same old value, as if compiler has cached it somewhere.

This is not because the compiler cached it somewhere, but because the reading thread reads from its own CPU core's cache, which might differ from the writing thread's. To ensure value changes propagate across CPU cores, you need to use proper memory fences, and you neither can nor need to use volatile for that in C++.

usta
  • 6,699
  • 3
  • 22
  • 39
  • But in a processor with proactive cache coherence like x86, the cache of a core should be updated in this case, that is, whenever a core A writes to memory X, if core B tries to read from X, its cache corresponding to X will be updated. – pythonic Oct 03 '12 at 17:16
  • @pythonic: yup, x86, like all normal CPUs (ARM / PowerPC / MIPS / SPARC / ...), has coherent caches. Other cores will fairly promptly (within microseconds: [If I don't use fences, how long could it take a core to see another core's writes?](https://stackoverflow.com/q/51292687)) notice a store from another core if you use `volatile`. You only need barriers to order loads/stores in the reader, or to make the writer wait for stores to become globally visible before doing something else. – Peter Cordes Jul 29 '18 at 22:16
  • @pythonic: if the situation you describe persists for a long time, then you're doing something wrong or describing it wrong. – Peter Cordes Jul 29 '18 at 22:18