2

I am curious about whether volatile is necessary for the resources used in a critical section. Consider two threads executing on two CPUs that are competing for a shared resource. I know I need a locking mechanism to make sure only one thread is performing operations on that shared resource at a time. Below is the pseudocode that will be executed on both threads.

take_lock();

// Read shared resource.
read_shared_resource();

// Write something to shared resource.
write_shared_resource();

release_lock();

I am wondering if I need to make that shared resource volatile, to make sure that when one thread reads the shared resource it won't just get a value cached in a register but will actually read it from memory. Or should I instead use accessor functions that access the shared resource in a volatile way, with some memory barrier operations, rather than making the shared resource itself volatile?

Marco Bonelli
  • 63,369
  • 21
  • 118
  • 128
  • Please edit the question to describe the lock routines and the resource in more detail. Are the locking routines the standard C library lock routines or something else, like operating system routines? – Eric Postpischil Jul 18 '22 at 10:04
  • Related: [Why is volatile not considered useful in multithreaded C or C++ programming?](https://stackoverflow.com/q/2484980/2402272) – John Bollinger Jul 20 '22 at 15:15

3 Answers

6

I am curious about whether volatile is necessary for the resources used in a critical section. Consider two threads executing on two CPUs that are competing for a shared resource. I know I need a locking mechanism to make sure only one thread is performing operations on that shared resource at a time.

Making sure that only one thread accesses a shared resource at a time is only part of what a locking mechanism adequate for the purpose will do. Among other things, such a mechanism will also ensure that all writes to shared objects performed by thread Ti before it releases lock L are visible to all other threads Tj after they subsequently acquire lock L. And it ensures that in terms of the C semantics of the program, notwithstanding any questions of compiler optimization, register usage, CPU instruction reordering, or the like.

When such a locking mechanism is used, volatile does not provide any additional benefit for making threads' writes to shared objects be visible to each other. When such a locking mechanism is not used, volatile does not provide a complete substitute.

C's built-in (since C11) mutexes provide a suitable locking mechanism, at least when using C's built-in threads. So do pthreads mutexes, Sys V and POSIX semaphores, and various other, similar synchronization objects available in various environments, each with respect to corresponding multithreading systems. These semantics are pretty consistent across C-like multithreading implementations, extending at least as far as Java. The semantic requirements for C's built-in multithreading are described in section 5.1.2.4 of the current (C17) language spec.

volatile is for indicating that an object might be accessed outside the scope of the C semantics of the program. That may happen to produce properties that interact with multithreaded execution in a way that is taken to be desirable, but that is not the purpose or intended use of volatile. If it were, or if volatile were sufficient for such purposes, then we would not also need _Atomic objects and operations.


The previous remarks focus on language-level semantics, and that is sufficient to answer the question. However, inasmuch as the question asks specifically about accessing variables' values from registers, I observe that compilers don't actually have to do anything much multithreading-specific in that area as long as acquiring and releasing locks requires calling functions.

In particular, if an execution E of function f writes to an object o that is visible to other functions or other executions of f, then the C implementation must ensure that that write is actually performed on memory before E evaluates any subsequent function call (such as is needed to release a lock). This is necessary because the value written must be visible to the execution of the called function, regardless of any other threads.

Similarly, if E uses the value of o after return from a function call (such as is needed to acquire a lock) then it must load that value from memory to ensure that it sees the effect of any write that the function may have performed.

The only thing special to multithreading in this regard is that the implementation must ensure that interprocedural analysis optimizations or similar do not subvert the needed memory reads and writes around the lock and unlock functions. In practice, this rarely requires special attention.

John Bollinger
  • 160,171
  • 8
  • 81
  • 157
  • Re “C's built-in (since C11) mutexes provide a suitable locking mechanism”: You mean it provides a suitable locking mechanism for its mutex, which it takes as an argument and controls fully, yes? But the question is about some general resource detached from the locking mechanism. C’s mutex calls contain internally whatever synchronization they need. OP’s sample code acquires a lock, then does things with a resource the locking routines know nothing about, then releases the lock. What do you say is needed to ensure the accesses to that resource occur only during lock ownership? – Eric Postpischil Jul 17 '22 at 19:24
  • 3
    @EricPostpischil, I mean that if a C11 thread acquires a C11 mutex, modifies *any* object *o* via a write *w*, then releases the mutex, then any other C11 thread that subsequently acquires the same mutex and afterward reads *o* will see the effect of *w* or of a subsequent write to *o*. It is not necessary for *o* to be `volatile` for this to be assured, which was the question. Analogous guarantees apply to pthreads threads and mutexes, and to C++ threads and mutexes. – John Bollinger Jul 17 '22 at 20:17
  • @EricPostpischil, it is of course the programmer's responsibility to ensure that threads access shared resources only under protection of the appropriate lock. The assurances I describe are conditioned on them doing so, as I think is already conveyed by this answer. And the question stipulates that the program exercises such locking discipline. – John Bollinger Jul 17 '22 at 20:21
  • Re “if a C11 thread acquires a C11 mutex, modifies any object o via a write w, then releases the mutex, then any other C11 thread that subsequently acquires the same mutex and afterward reads o will see the effect of w or of a subsequent write to o”: Where does the standard say that or something that implies it? – Eric Postpischil Jul 17 '22 at 20:35
  • @EricPostpischil, this is covered, in slightly more general and much more detailed form, in C17 section 5.1.2.4. – John Bollinger Jul 18 '22 at 03:38
  • You should add a citation of 5.1.2.4 to the answer and possibly mention Note 2 in 5.1.2.4 6, which says “… Informally, performing a release operation on A forces prior side effects on other memory locations to become visible to other threads that later perform an acquire or consume operation on A…” (or, better, the normative text that implies that). I am working on comprehending the whole section. – Eric Postpischil Jul 18 '22 at 10:34
  • 1
    One concern I have is: Given a pure function `foo` (either declared with a pure attribute or with its source code visible to the compiler) that may be costly to compute, `int a = foo(r); baz(); return a + foo(r);` should be optimized to `baz(); return 2*foo(r);` if the external function `baz` cannot change `r`. Is there a case where `baz` can change `r` in a multi-threaded program when it cannot in a single-threaded program? If so, the thread semantics would seem to impair optimization, which would make me skeptical about this broad interpretation of 5.1.2.4 6. But maybe it cannot not happen? – Eric Postpischil Jul 18 '22 at 10:38
  • @EricPostpischil, I have added a reference to section 5.1.2.4. The normative text is dense and technical enough that I think few people who come upon this answer would gain any benefit from having it quoted to them. – John Bollinger Jul 18 '22 at 15:32
  • 1
    @EricPostpischil, with regard to your optimization scenario, if `r` has external linkage; if it has internal linkage and `baz()` is defined in the same translation unit; if it has no linkage and static storage duration, and the call to `baz` is recursive; or if a pointer to `r` has been published or is accessible via `baz()`'s arguments, then it is possible for `baz()` to modify `r`. If not, then not (discounting UB). Multithreading might make an analysis of those conditions more difficult, but as far as I am aware, it does not introduce any new avenues for r to be modified. – John Bollinger Jul 18 '22 at 15:51
  • 1
    Of course, it gets much more difficult if `r` is or provides access to a pointer, and one wants to know about whether the data it points to might be modified. I'm not sure that's a solvable problem in the general case even for serial code. – John Bollinger Jul 18 '22 at 15:55
  • @JohnBollinger, thanks for the detailed answer. I just want to make sure that, in general, it's the programmer's responsibility to ensure that after releasing the lock, the memory operations performed in the critical section are visible to other threads, and that this guarantee can be provided either by a general C library or by some specific memory barrier operations. Also, threads may need to access that shared resource in a volatile way after taking the lock to avoid reading the value from registers. Is my thought right? – Haohao Chang Jul 23 '22 at 03:23
  • No, @HaohaoChang, that sounds quite different from what this answer says. The general case is that the programmer's responsibility extends to choosing a suitable locking mechanism (C11 mutex, pthreads mutex, *etc*.) and exercising correct locking discipline around accesses to shared objects. **That is sufficient by itself** to ensure that each thread's writes to shared objects are visible to other threads. The programmer does not need anything else for that purpose, and in particular, declaring the shared objects `volatile` does not confer any additional advantage in this regard. – John Bollinger Jul 23 '22 at 05:00
  • @JohnBollinger, if there is no such pre-implemented locking mechanism, such as a C11 mutex or pthreads mutex, and only some CPU-specific HW locking methods are provided, that means programmers have to implement those things themselves, right? – Haohao Chang Jul 24 '22 at 07:18
  • @HaohaoChang, you have asked a question about C. In practice, every widely used threading implementation for C has appropriate locking mechanisms associated with it already. However, if, contrary to fact, there were no appropriate locking mechanism available then yes, you would need to implement one yourself that ensures not just mutual exclusion but also the appropriate memory barrier behavior. It is impossible to say whether `volatile` would help in that counter-factual world, with that hypothetical self-rolled locking mechanism, but it *shouldn't*. – John Bollinger Jul 24 '22 at 13:04
2

The answer is no; volatile is not necessary (assuming the critical-section functions you are using were implemented correctly, and you are using them correctly, of course). Any proper critical-section API's implementation will include the memory barriers necessary to handle flushing registers, etc., and therefore avoid the need for the volatile keyword.

Jeremy Friesner
  • 70,199
  • 15
  • 131
  • 234
  • 1
    If the compiler decided to cache the shared resource in a register, there is nothing the routines `take_lock` and `release_lock` can do to write it to memory or read it from memory. They would have no information about which register the calling routine has cached it in or what resource has been cached, so they have no way to write/read the correct register to/from memory. This is entirely a problem of the code the compiler generates in the calling routine and must be dealt with by telling the compiler it must actually access the resource, not cache it. That is what `volatile` is for. – Eric Postpischil Jul 17 '22 at 07:06
  • take_lock() would need to flush all registers. You’re right that this would probably require some support from the compiler to implement reliably; however modern compilers do provide the necessary support for that (otherwise critical sections wouldn’t work reliably) – Jeremy Friesner Jul 17 '22 at 13:15
  • So the key point is that we can tell the compiler it should not use the cached value? I think maybe the programmer can enforce some memory barrier operations before releasing the lock to make sure all dirty data is flushed, and after taking the lock, the thread can access the shared resource in a volatile way, using something like `READ_ONCE` as implemented in the Linux Kernel. Is my thought reasonable? – Haohao Chang Jul 17 '22 at 13:41
  • 1
    @JeremyFriesner: Flush them to where? The routines `take_lock` and `release_lock` would have no information that, say, `r3` is associated with memory address 0x12345678. This answer is wrong. – Eric Postpischil Jul 17 '22 at 13:45
  • @HaohaoChang if you are writing your own threading package, then you would need to do that sort of thing; but it requires a very good understanding of both the compiler and the hardware to do it correctly, and the code may be non-portable, so almost everyone simply uses the threading package that came with their compiler and OS instead (eg pthreads on Linux) so that all the hairy details have already been dealt with for them. – Jeremy Friesner Jul 17 '22 at 13:46
  • @EricPostpischil the compiler knows at compile time the memory address a given register is representing; the threads package knows a directive to give to the compiler (again at compile-time) as part of its lock() implementation to tell the compiler that registers need to be flushed at that point, and the compiler can then insert the necessary code there. This requires close cooperation with the compiler, but it is doable and done; otherwise threading APIs could not make the data-safety guarantees that they make. – Jeremy Friesner Jul 17 '22 at 13:51
  • 1
    @JeremyFriesner: Show such a directive given by a “threads package” to the compiler or any authoritative documentation specifying there is such a directive or coordination between a “threads package” and the compiler. – Eric Postpischil Jul 17 '22 at 13:54
  • @EricPostpischil sure, have a look here: https://stackoverflow.com/questions/54380144/how-does-gcc-know-that-a-register-needs-to-be-flushed-to-memory-when-memory-clob – Jeremy Friesner Jul 17 '22 at 14:01
  • Since I am developing on an embedded system with an OS other than Linux, I think I need to do that kind of thing: flushing before releasing locks and getting the shared resource in a volatile way after taking the lock, haha. – Haohao Chang Jul 17 '22 at 14:04
  • 1
    @JeremyFriesner: That is about an inline assembly feature that GCC has. That does not show that a “threads package” uses it. Nor would it be useful unless `take_lock` were implemented as a macro that embedded the `asm` statement within it. Having it inside a called routine would not help, as its presence in an external routine could not affect the code generated in the calling routine. – Eric Postpischil Jul 17 '22 at 14:06
  • @EricPostpischil you seem wedded to your position, so I won't argue the subject any further. I'll only point out that if volatile is required inside critical sections, then all the programs out there that use critical sections but don't also use the volatile keyword must be buggy, and given the widespread use of critical sections without volatile, that seems unlikely. – Jeremy Friesner Jul 17 '22 at 17:45
  • 1
    @JeremyFriesner: What is correct is not decided by who is wedded to what position but by documentation. I have cited relevant passages of the C standard. Further, my answer explains why a failure to mark a resource as volatile will not manifest errors in many situations: The use of external routines to manage the lock and the inability of the compiler to know those external routines do not access the resource prevent the compiler from reordering the accesses to the resource relative to the lock routine calls. However, that is not always the case… – Eric Postpischil Jul 17 '22 at 19:27
  • … As my answer shows, in some circumstances, a compiler may conclude the external routines cannot access the resource in any way defined by the C standard, in which case the compiler can reorder the calls. And that is why not all improperly written code fails, but some will.… – Eric Postpischil Jul 17 '22 at 19:30
  • … Further, while you posit some “full service API” that manages the resource being locked as well as the lock, this is not what the question asks about. As John Bollinger’s answer notes, the C standard mutex routines may provide the necessary locking and manipulation of a mutex, but they can do that because they are passed the mutex itself to operate on… – Eric Postpischil Jul 17 '22 at 19:32
  • … In contrast, the code in the question shows the lock routines being called with no argument that would give them any information about or control over the shared resource. The code seeks to use a lock to gate access to an unrelated general resource. The lock routines cannot do this with the code as shown. Something needs to tie the lock routine calls to the resource access. – Eric Postpischil Jul 17 '22 at 19:34
  • @EricPostpischil the code in the question is *pseudocode*, meant to be taken as a stand-in for some other unspecified API. – Jeremy Friesner Jul 17 '22 at 20:15
  • 1
    @JeremyFriesner: Yes, and so why would you narrow it down to a specific all-in-one case of just a mutex instead of the general case of some lock and some resource? – Eric Postpischil Jul 17 '22 at 20:29
  • @EricPostpischil again, I'm done discussing the issue with you. If you want more information on the subject, see John Bollinger's excellent answer to this question. – Jeremy Friesner Jul 17 '22 at 22:04
0

volatile is normally used to inform the compiler that the data might be changed by something else (an interrupt, DMA, another CPU, ...) to prevent unexpected compiler optimizations. So in your case you may or may not need it:

  • If the thread does not have some loop waiting for a value change in the shared resource, you don't really need volatile.
  • If you have a wait like `while (shareVal == 0)` in the source code, you need to tell the compiler explicitly with the volatile qualifier.

For the case of 2 CPUs, there is also a possible cache issue where a CPU reads a stale value from its cache memory. Please consider configuring the memory attributes properly for the shared resource.

ThongDT
  • 162
  • 7