Using C/Pthreads: do shared variables need to be volatile?

Question

When using the C programming language with Pthreads as the threading library, do variables/structures that are shared between threads need to be declared volatile? Assume that they may or may not be protected by a lock (or perhaps by barriers).

Does the POSIX threads standard have anything to say about this, is it compiler-dependent, or neither?

Edit to add: Thanks for the great answers. But what if you're not using locks; what if you're using barriers for example? Or code that uses primitives such as compare-and-swap to directly and atomically modify a shared variable...

score 27 · Answer 1 · edited May 01 '19 at 20:35


As long as you are using locks to control access to the variable, you do not need volatile on it. In fact, if you're putting volatile on any variable you're probably already wrong.

https://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming/

edited May 01 '19 at 20:35

Soner from The Ottoman Empire


answered Sep 16 '08 at 23:08

Don Neufeld


Thanks for the answer; but what about scenarios where you're not using locks (refer to the edited question for an example). – fuad Sep 17 '08 at 03:41
1

I think this is actually wrong, see my reply below. The problem is that the compiler can do anything it likes to keep a value in local registers in a thread unless it is marked volatile. So volatile is needed to make sure data is written back to memory. – jakobengblom2 Oct 03 '08 at 13:13
4

If you are not using locks, you almost certainly need to use explicit memory barriers. Note that volatile is NOT a memory barrier as it does not affect any other loads and stores other than those to the volatile variable itself. It is also often a pessimization. – bdonlan May 18 '09 at 00:21
2

This answer is wrong. The article is wrong. The comments at the bottom of the article explain why. The author of the article misunderstands the purpose of volatile completely. Read other answers to learn what volatile is _actually_ for. – Michael Dorst Oct 15 '19 at 20:19
@jakobengblom2 If the compiler does that, then the platform doesn't comply with the PTHREADS standard. – David Schwartz Nov 22 '19 at 21:05

score 12 · Answer 2 · edited Jan 25 '18 at 15:02

The answer is absolutely, unequivocally, NO. You do not need to use 'volatile' in addition to proper synchronization primitives. Everything that needs to be done is done by these primitives.

The use of 'volatile' is neither necessary nor sufficient. It's not necessary because the proper synchronization primitives are sufficient. It's not sufficient because it only disables some optimizations, not all of the ones that might bite you. For example, it does not guarantee either atomicity or visibility on another CPU.

"But unless you use volatile, the compiler is free to cache the shared data in a register for any length of time... if you want your data to be predictably written to actual memory and not just cached in a register by the compiler at its discretion, you will need to mark it as volatile. Alternatively, if you only access the shared data after you have left a function modifying it, you might be fine. But I would suggest not relying on blind luck to make sure that values are written back from registers to memory."

Right, but even if you do use volatile, the CPU is free to cache the shared data in a write posting buffer for any length of time. The set of optimizations that can bite you is not precisely the same as the set of optimizations that 'volatile' disables. So if you use 'volatile', you are relying on blind luck.

On the other hand, if you use synchronization primitives with defined multi-threaded semantics, you are guaranteed that things will work. As a plus, you don't take the huge performance hit of 'volatile'. So why not do things that way?

jakobengblom2 · Accepted Answer · 2009-01-05T19:43:02.393

7

I think one very important property of volatile is that it makes the variable be written to memory when modified, and reread from memory each time it is accessed. The other answers here mix volatile and synchronization, and it is clear from some of those answers that volatile is NOT a sync primitive (credit where credit is due).

But unless you use volatile, the compiler is free to cache the shared data in a register for any length of time... if you want your data to be predictably written to actual memory and not just cached in a register by the compiler at its discretion, you will need to mark it as volatile. Alternatively, if you only access the shared data after you have left a function modifying it, you might be fine. But I would suggest not relying on blind luck to make sure that values are written back from registers to memory.

Especially on register-rich machines (i.e., not x86), variables can live for quite long periods in registers, and a good compiler can cache even parts of structures or entire structures in registers. So you should use volatile, but for performance, also copy values to local variables for computation and then do an explicit write-back. Essentially, using volatile efficiently means doing a bit of load-store thinking in your C code.

In any case, you positively have to use some kind of OS-level provided sync mechanism to create a correct program.

For an example of the weakness of volatile, see my Dekker's algorithm example at http://jakob.engbloms.se/archives/65, which proves pretty well that volatile does not work to synchronize.

edited Jan 05 '09 at 19:43

answered Oct 03 '08 at 13:12

jakobengblom2


3

Saving variables in registers for a long time is exactly the point of the compiler's optimizer. Using volatile completely negates that. Note that in GCC (and probably most compilers) function calls clobber memory, meaning that if you write to a non-local variable and then do a function call, the compiler is not allowed to push the write after the function call, which is seemingly what you intend to use volatile for anyway. That isn't what volatile is for... – Greg Rogers Apr 24 '09 at 17:11
2

Volatile is for marking a variable that could change spontaneously (embedded systems mapping hardware entities to memory locations is one example of this). If you are using exclusive locking an ordinary variable can't change spontaneously. Herb Sutter's article on this is pretty good: http://www.ddj.com/hpc-high-performance-computing/212701484 – Greg Rogers Apr 24 '09 at 17:14
When using GCC, their interpretation of the standard is that `volatile` is reserved for memory that can change due to hardware, not memory changes that can occur due to software. For GCC, you are supposed to use a memory barrier in software. Here's the code: `asm volatile("": : :"memory")`. Microsoft's interpretation is memory that can change due to both hardware and software. – jww Dec 08 '14 at 06:45
5

Almost nothing about this answer is correct. In particular, `volatile` does not force data to be predictably written to actual memory and nothing in the standards requires it to. Again, it does *not* do that on most modern machines and the standard does not require it to. – David Schwartz Aug 18 '16 at 16:35
Very confusing, no good, answer. Does not even explain what is meant by "in memory" which could be diff things according to the problem at hand (memory for the purpose of communicating w/ an external card would be main memory, aka RAM; memory could also mean shared L3 cache; in other context, L1 cache, etc.) – curiousguy Nov 05 '19 at 22:32

score 4 · Answer 4 · answered Nov 14 '11 at 10:21

There is a widespread notion that the keyword volatile is good for multi-threaded programming.

Hans Boehm points out that there are only three portable uses for volatile:

volatile may be used to mark local variables in the same scope as a setjmp whose value should be preserved across a longjmp. It is unclear what fraction of such uses would be slowed down, since the atomicity and ordering constraints have no effect if there is no way to share the local variable in question. (It is even unclear what fraction of such uses would be slowed down by requiring all variables to be preserved across a longjmp, but that is a separate matter and is not considered here.)
volatile may be used when variables may be "externally modified", but the modification in fact is triggered synchronously by the thread itself, e.g. because the underlying memory is mapped at multiple locations.
A volatile sig_atomic_t may be used to communicate with a signal handler in the same thread, in a restricted manner. One could consider weakening the requirements for the sig_atomic_t case, but that seems rather counterintuitive.
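The third use in the list above can be illustrated with a small sketch. It assumes a POSIX system; the names `got_signal`, `on_signal` and `signal_flag_demo` are made up for this example. The handler does nothing except store to a `volatile sig_atomic_t`, and the flag is only read by the same thread that was interrupted.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Flag shared between a signal handler and the code it interrupts,
 * all within one thread. This is not a cross-thread mechanism. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;   /* a plain store to a volatile sig_atomic_t */
}

/* Hypothetical helper: installs the handler, raises SIGUSR1 in the
 * calling thread, and reports whether the handler's store was seen.
 * Returns 1 if seen, 0 if not, -1 if the handler could not be installed. */
int signal_flag_demo(void)
{
    struct sigaction sa;
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    got_signal = 0;
    raise(SIGUSR1);   /* the handler has run by the time raise() returns */
    return got_signal == 1;
}
```

Calling `signal_flag_demo()` from `main` is enough to try it out. Nothing here makes the flag safe to share with a different thread.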

If you are multi-threading for the sake of speed, slowing down code is definitely not what you want. For multi-threaded programming, there are two key issues that volatile is often mistakenly thought to address:

(1) atomicity
(2) memory consistency, i.e. the order of a thread's operations as seen by another thread.

Let's deal with (1) first. Volatile does not guarantee atomic reads or writes. For example, a volatile read or write of a 129-bit structure is not going to be atomic on most modern hardware. A volatile read or write of a 32-bit int is atomic on most modern hardware, but volatile has nothing to do with it. It would likely be atomic without the volatile. The atomicity is at the whim of the compiler. There's nothing in the C or C++ standards that says it has to be atomic.
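As a hedged illustration of the atomicity point, here is a minimal sketch that gets an indivisible increment from C11 `<stdatomic.h>` instead of from `volatile`. C11 atomics are not mentioned in the article, so treat their use here as an assumption (the compiler must support C11); the names `counter`, `worker` and `atomic_counter_demo` are invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define N_THREADS    4
#define N_INCREMENTS 100000

static atomic_long counter;   /* C11 atomic object, not volatile */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < N_INCREMENTS; i++)
        atomic_fetch_add(&counter, 1);   /* indivisible read-modify-write */
    return NULL;
}

/* Hypothetical driver: runs the workers and returns the final count. */
long atomic_counter_demo(void)
{
    pthread_t tid[N_THREADS];

    atomic_store(&counter, 0);
    for (int i = 0; i < N_THREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(tid[i], NULL);

    return atomic_load(&counter);
}
```

With a `volatile long` and `counter++` in place of the atomic, increments from different threads could be lost, because the load, add and store are separate steps.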

Now consider issue (2). Sometimes programmers think of volatile as turning off optimization of volatile accesses. That's largely true in practice. But that's only the volatile accesses, not the non-volatile ones. Consider this fragment:

    volatile int Ready;
    int Message[100];

    void foo( int i ) {
        Message[i/10] = 42;
        Ready = 1;
    }

It's trying to do something very reasonable in multi-threaded programming: write a message and then send it to another thread. The other thread will wait until Ready becomes non-zero and then read Message. Try compiling this with "gcc -O2 -S" using gcc 4.0, or icc. Both will do the store to Ready first, so it can be overlapped with the computation of i/10. The reordering is not a compiler bug. It's an aggressive optimizer doing its job.

You might think the solution is to mark all your memory references volatile. That's just plain silly. As the earlier quotes say, it will just slow down your code. Worse yet, it might not fix the problem. Even if the compiler does not reorder the references, the hardware might. In this example, x86 hardware will not reorder it. Neither will an Itanium(TM) processor, because Itanium compilers insert memory fences for volatile stores. That's a clever Itanium extension. But chips like Power(TM) will reorder. What you really need for ordering are memory fences, also called memory barriers. A memory fence prevents reordering of memory operations across the fence, or in some cases, prevents reordering in one direction. Volatile has nothing to do with memory fences.
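One possible way to express the Ready/Message handoff with the ordering it needs is sketched below. It uses C11 release/acquire atomics, which postdate the article, so this is an assumption about tooling rather than the article's own fix. The names `producer` and `fence_demo` are hypothetical, and `i` is assumed to lie between 0 and 999.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

static int Message[100];   /* plain data, published via Ready */
static atomic_int Ready;   /* atomic flag instead of volatile int */

static void *producer(void *arg)
{
    int i = *(int *)arg;
    Message[i / 10] = 42;
    /* Release store: the write to Message cannot be moved after it. */
    atomic_store_explicit(&Ready, 1, memory_order_release);
    return NULL;
}

/* Hypothetical driver: starts the producer, waits for Ready, and
 * returns the message value the consumer observed. */
int fence_demo(int i)
{
    pthread_t tid;

    Message[i / 10] = 0;
    atomic_store(&Ready, 0);
    int rc = pthread_create(&tid, NULL, producer, &i);
    assert(rc == 0);
    (void)rc;

    /* Acquire load: once it reads 1, the Message write is visible. */
    while (atomic_load_explicit(&Ready, memory_order_acquire) == 0)
        sched_yield();

    int seen = Message[i / 10];
    pthread_join(tid, NULL);
    return seen;
}
```

The release store and acquire load constrain both the compiler and the hardware, which is what `volatile` alone does not do.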

So what's the solution for multi-threaded programming? Use a library or language extension that implements the atomic and fence semantics. When used as intended, the operations in the library will insert the right fences. Some examples:

POSIX threads
Windows(TM) threads
OpenMP
TBB

Based on article by Arch Robison (Intel)

cmcginty · Answer 5 · 2016-07-06T10:17:12.083

2

NO.

Volatile is only required when reading a memory location that can change independently of the CPU read/write commands. In the situation of threading, the CPU is in full control of read/writes to memory for each thread, therefore the compiler can assume the memory is coherent and optimizes the CPU instructions to reduce unnecessary memory access.

The primary usage for volatile is for accessing memory-mapped I/O. In this case, the underlying device can change the value of a memory location independently from CPU. If you do not use volatile under this condition, the CPU may use a previously cached memory value, instead of reading the newly updated value.
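A rough sketch of the memory-mapped I/O case follows. There is no real device here: `fake_status_reg` is an ordinary variable standing in for a hardware register, and `STATUS_READY`, `wait_until_ready` and `mmio_demo` are names invented for illustration. On real hardware the pointer would come from the platform's documentation.

```c
#include <stdint.h>

#define STATUS_READY 0x1u

/* Stand-in for a device status register (assumption: no real device). */
static uint32_t fake_status_reg;

/* Polls the register up to max_polls times. Because the pointee is
 * volatile, the compiler must perform a real load on every iteration
 * instead of reusing an earlier value.
 * Returns the number of polls that came back not-ready, or -1 on timeout. */
int wait_until_ready(volatile uint32_t *status, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (*status & STATUS_READY)
            return i;
    }
    return -1;
}

/* Hypothetical usage: pretend the device is already ready. */
int mmio_demo(void)
{
    fake_status_reg = STATUS_READY;
    return wait_until_ready(&fake_status_reg, 10);
}
```

Here `volatile` only restrains the compiler, which is the distinction the comment below draws.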

edited Jul 06 '16 at 10:17

answered Feb 17 '09 at 23:06

cmcginty


1

Your first paragraph is incorrect. The CPU doesn't know about the volatile keyword; only the C compiler does, and it only constrains how the C compiler uses that memory location. Different mechanisms are used to manage _each_ CPU's view of memory (as each core/PU has a different perspective, they see each other as foreign but cache coherent). For reference these other mechanisms are atomic operations and memory barriers, but it's best just to stick to locking primitives provided by a library. – Paul Bone Aug 26 '12 at 11:23

score 2 · Answer 6 · answered Sep 16 '08 at 23:03

In my experience, no; you just have to properly mutex yourself when you write to those values, or structure your program such that the threads will stop before they need to access data that depends on another thread's actions. My project, x264, uses this method; threads share an enormous amount of data but the vast majority of it doesn't need mutexes because it's either read-only or a thread will wait for the data to become available and finalized before it needs to access it.

Now, if you have many threads that are all heavily interleaved in their operations (they depend on each others' output on a very fine-grained level), this may be a lot harder--in fact, in such a case I'd consider revisiting the threading model to see if it can possibly be done more cleanly with more separation between threads.
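The "wait until the data is finalized" structure can be sketched with a Pthreads barrier. This is not x264's code, only a generic illustration with invented names (`partial`, `totals`, `phase_done`, `barrier_demo`), and it assumes the platform provides `pthread_barrier_t`, an optional POSIX feature that some systems lack.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define N_WORKERS 4

static int partial[N_WORKERS];   /* plain ints, one writer each */
static int totals[N_WORKERS];
static pthread_barrier_t phase_done;

static void *worker(void *arg)
{
    int id = *(int *)arg;

    partial[id] = (id + 1) * 10;          /* phase 1: write own slot */
    pthread_barrier_wait(&phase_done);    /* all phase 1 writes done */

    int sum = 0;                          /* phase 2: read every slot */
    for (int i = 0; i < N_WORKERS; i++)
        sum += partial[i];
    totals[id] = sum;
    return NULL;
}

/* Hypothetical driver: returns the sum every worker computed,
 * or -1 if the workers disagree. */
int barrier_demo(void)
{
    pthread_t tid[N_WORKERS];
    int ids[N_WORKERS];

    int rc = pthread_barrier_init(&phase_done, NULL, N_WORKERS);
    assert(rc == 0);
    for (int i = 0; i < N_WORKERS; i++) {
        ids[i] = i;
        rc = pthread_create(&tid[i], NULL, worker, &ids[i]);
        assert(rc == 0);
    }
    (void)rc;
    for (int i = 0; i < N_WORKERS; i++)
        pthread_join(tid[i], NULL);
    pthread_barrier_destroy(&phase_done);

    for (int i = 1; i < N_WORKERS; i++)
        if (totals[i] != totals[0])
            return -1;
    return totals[0];
}
```

No slot is written after the barrier, so the phase 2 reads need neither a mutex nor `volatile`.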

score 1 · Answer 7 · answered Nov 19 '19 at 13:54

POSIX 7 guarantees that functions such as pthread_mutex_lock also synchronize memory

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11 "4.12 Memory Synchronization" says:

The following functions synchronize memory with respect to other threads:

pthread_barrier_wait()
pthread_cond_broadcast()
pthread_cond_signal()
pthread_cond_timedwait()
pthread_cond_wait()
pthread_create()
pthread_join()
pthread_mutex_lock()
pthread_mutex_timedlock()
pthread_mutex_trylock()
pthread_mutex_unlock()
pthread_spin_lock()
pthread_spin_trylock()
pthread_spin_unlock()
pthread_rwlock_rdlock()
pthread_rwlock_timedrdlock()
pthread_rwlock_timedwrlock()
pthread_rwlock_tryrdlock()
pthread_rwlock_trywrlock()
pthread_rwlock_unlock()
pthread_rwlock_wrlock()
sem_post()
sem_timedwait()
sem_trywait()
sem_wait()
semctl()
semop()
wait()
waitpid()

Therefore if your variable is guarded between pthread_mutex_lock and pthread_mutex_unlock then it does not need further synchronization as you might attempt to provide with volatile.
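A minimal sketch of that pattern, with made-up names (`struct account`, `acct`, `depositor`, `mutex_demo`): the shared structure has ordinary, non-volatile fields, and every access happens between `pthread_mutex_lock` and `pthread_mutex_unlock`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define N_THREADS  4
#define N_DEPOSITS 50000

struct account {
    long balance;      /* plain fields, no volatile */
    long n_deposits;
};

static struct account acct;
static pthread_mutex_t acct_lock = PTHREAD_MUTEX_INITIALIZER;

static void *depositor(void *arg)
{
    (void)arg;
    for (int i = 0; i < N_DEPOSITS; i++) {
        pthread_mutex_lock(&acct_lock);
        acct.balance += 2;       /* both fields change together */
        acct.n_deposits += 1;
        pthread_mutex_unlock(&acct_lock);
    }
    return NULL;
}

/* Hypothetical driver: returns the final balance. */
long mutex_demo(void)
{
    pthread_t tid[N_THREADS];

    acct.balance = 0;
    acct.n_deposits = 0;
    for (int i = 0; i < N_THREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, depositor, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(tid[i], NULL);

    pthread_mutex_lock(&acct_lock);
    long result = acct.balance;
    pthread_mutex_unlock(&acct_lock);
    return result;
}
```

The lock also keeps the two fields consistent with each other, which no per-variable qualifier could do.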


Patrick Pan · Answer 8 · 2017-10-28T09:00:29.147

The underlying reason is that the C language semantics are based upon a single-threaded abstract machine, and the compiler is within its rights to transform the program as long as the program's 'observable behaviors' on the abstract machine stay unchanged. It can merge adjacent or overlapping memory accesses, redo a memory access multiple times (upon register spilling, for example), or simply discard a memory access, if it thinks the program's behaviors, when executed in a single thread, do not change. Therefore, as you may suspect, the behaviors can change if the program is actually supposed to execute in a multi-threaded way.

As Paul McKenney pointed out in a famous Linux kernel document:

It _must_not_ be assumed that the compiler will do what you want with memory references that are not protected by READ_ONCE() and WRITE_ONCE(). Without them, the compiler is within its rights to do all sorts of "creative" transformations, which are covered in the COMPILER BARRIER section.

READ_ONCE() and WRITE_ONCE() are defined as volatile casts on referenced variables. Thus:

int y;
int x = READ_ONCE(y);

is equivalent to:

int y;
int x = *(volatile int *)&y;

So, unless you make a 'volatile' access, you are not assured that the access happens exactly once, no matter what synchronization mechanism you are using. Calling an external function (pthread_mutex_lock, for example) may force the compiler to do memory accesses to global variables. But this happens only when the compiler fails to figure out whether the external function changes these global variables or not. Modern compilers employing sophisticated interprocedural analysis and link-time optimization make this trick simply useless.
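For readers who want to experiment with the volatile-cast idea outside the kernel, here is a simplified sketch. These macros are not the Linux kernel's actual definitions, which are more elaborate. They rely on `__typeof__`, a GCC/Clang extension, and `shared_flag` and `read_once_demo` are invented names.

```c
/* Simplified stand-ins for the kernel macros (assumption: GCC or Clang,
 * because __typeof__ is a compiler extension). Each macro forces one
 * real load or store of x through a volatile-qualified lvalue. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

static int shared_flag;

/* Hypothetical single-threaded usage, just to show the macros in use. */
int read_once_demo(void)
{
    WRITE_ONCE(shared_flag, 7);
    return READ_ONCE(shared_flag);
}
```

A volatile access like this constrains only what the compiler does with that one access. It does not order other memory operations and does not make the access atomic.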

In summary, you should mark variables shared by multiple threads volatile or access them using volatile casts.

As Paul McKenney has also pointed out:

I have seen the glint in their eyes when they discuss optimization techniques that you would not want your children to know about!

But see what happens to C11/C++11.

Tom Leys · Answer 9 · 2008-09-16T23:17:17.137

Volatile means that we have to go to memory to get or set this value. If you don't set volatile, the compiled code might store the data in a register for a long time.

What this means is that you should mark variables that you share between threads as volatile so that you don't have situations where one thread starts modifying the value but doesn't write its result before a second thread comes along and tries to read the value.

Volatile is a compiler hint that disables certain optimizations. The output assembly of the compiler might have been safe without it but you should always use it for shared values.

This is especially important if you are NOT using the expensive thread sync objects provided by your system - you might for example have a data structure where you can keep it valid with a series of atomic changes. Many stacks that do not allocate memory are examples of such data structures, because you can add a value to the stack then move the end pointer or remove a value from the stack after moving the end pointer. When implementing such a structure, volatile becomes crucial to ensure that your atomic instructions are actually atomic.

`volatile` doesn't guarantee atomicity, though. It's for indicating something outside the program is modifying the contents of the variable. — Allen, Sep 16 '08 at 23:17
Even with volatile, something as simple as "a = a + 1;" is not atomic. It just means that the compiler will re-load 'a' for this operation, and store it back immediately. There is still a window in which another thread can race for it. — bdonlan, May 18 '09 at 00:22
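To make the point in these comments concrete, here is a sketch of a lock-free stack push whose atomicity comes from a compare-and-swap rather than from `volatile`. It uses C11 `<stdatomic.h>`, which the answer does not mention, and all names (`struct node`, `head`, `pool`, `push`, `cas_stack_demo`) are invented. Only push is shown; popping safely takes more care.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define N_THREADS    4
#define N_PER_THREAD 1000

struct node {
    struct node *next;
    int value;
};

static _Atomic(struct node *) head;                  /* top of stack */
static struct node pool[N_THREADS][N_PER_THREAD];    /* no malloc needed */

static void push(struct node *n)
{
    struct node *old = atomic_load(&head);
    do {
        n->next = old;   /* a failed CAS refreshes old with the current head */
    } while (!atomic_compare_exchange_weak(&head, &old, n));
}

static void *pusher(void *arg)
{
    struct node *mine = arg;
    for (int i = 0; i < N_PER_THREAD; i++) {
        mine[i].value = i;
        push(&mine[i]);
    }
    return NULL;
}

/* Hypothetical driver: returns how many nodes ended up on the stack. */
int cas_stack_demo(void)
{
    pthread_t tid[N_THREADS];

    atomic_store(&head, NULL);
    for (int t = 0; t < N_THREADS; t++) {
        int rc = pthread_create(&tid[t], NULL, pusher, pool[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < N_THREADS; t++)
        pthread_join(tid[t], NULL);

    int count = 0;
    for (struct node *n = atomic_load(&head); n != NULL; n = n->next)
        count++;
    return count;
}
```

If two threads race, one compare-and-swap fails and retries, so no push is lost.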

score 0 · Answer 10 · answered Sep 16 '08 at 23:10

Volatile would only be useful if you need absolutely no delay between when one thread writes something and another thread reads it. Without some sort of lock, though, you have no idea of when the other thread wrote the data, only that it's the most recent possible value.

For simple values (int and float in their various sizes) a mutex might be overkill if you don't need an explicit synch point. If you don't use a mutex or lock of some sort, you should declare the variable volatile. If you use a mutex you're all set.

For complicated types, you must use a mutex. Operations on them are non-atomic, so you could read a half-changed version without a mutex.

score -1 · Answer 11 · answered Apr 13 '16 at 19:52

No.

First, volatile is not necessary. There are numerous other operations that provide guaranteed multithreaded semantics that don't use volatile. These include atomic operations, mutexes, and so on.

Second, volatile is not sufficient. The C standard does not provide any guarantees about multithreaded behavior for variables declared volatile.

So being neither necessary nor sufficient, there's not much point in using it.

One exception would be particular platforms (such as Visual Studio) where it does have documented multithreaded semantics.

score -1 · Answer 12 · answered Sep 22 '10 at 15:35


Some people obviously are assuming that the compiler treats the synchronization calls as memory barriers. "Casey" is assuming there is exactly one CPU.

If the sync primitives are external functions and the symbols in question are visible outside the compilation unit (global names, exported pointer, exported function that may modify them) then the compiler will treat them -- or any other external function call -- as a memory fence with respect to all externally visible objects.

Otherwise, you are on your own. And volatile may be the best tool available for making the compiler produce correct, fast code. It generally won't be portable, though: when you need volatile and what it actually does for you depend a lot on the system and compiler.


Stephen Nuchia


1

There are several interacting factors, none of which are addressed by the standard. 1) You need to influence the scheduler and/or delay the progress of other threads explicitly. System mutex primitives, spin waits, and bus-locking instructions all do this at different levels. – Stephen Nuchia Sep 22 '10 at 15:38
2) You need to influence the hardware's memory hierarchy, from registers to RAM and maybe all the way to backing store, to ensure the coherence properties you need are met. 3) You need to ensure atomicity is present when you are depending on it; this is closely related to 2 but also encompasses alignment and potentially aliasing issues. – Stephen Nuchia Sep 22 '10 at 15:45
4) You need to influence the generation of load/store code, including implicit load/stores on CISC machines, so that the compiler does not defeat the intent of your textually-correct concurrency-aware code. A proper mutex library, properly integrated into the compiler and properly used, will do all of these except atomicity, and if you use it in a way that does not depend on atomicity you're golden. If you are using any other combination of tools and techniques you are on your own. – Stephen Nuchia Sep 22 '10 at 15:46
to jjj below: whether the compiler is oblivious to the memory barrier depends on the barrier you use and the compiler. I see a lot of old C/C++ code with inline asm cpuid instructions used as ad hoc memory barriers. It works because the Microsoft compiler is conservative about inline assembly. – Stephen Nuchia Sep 22 '10 at 16:16

score -2 · Answer 13 · answered Nov 02 '10 at 03:35


Variables that are shared among threads should be declared 'volatile'. This tells the compiler that when one thread writes to such variables, the write should be to memory (as opposed to a register).


Adam Soffer

