I am curious about whether volatile is necessary for the resources used in a critical section. Suppose I have two threads executing on two CPUs, and they are competing for a shared resource. I know I need a locking mechanism to make sure only one thread is performing operations on that shared resource.
Making sure that only one thread accesses a shared resource at a time is only part of what a locking mechanism adequate for the purpose will do. Among other things, such a mechanism will also ensure that all writes to shared objects performed by thread Ti before it releases lock L are visible to all other threads Tj after they subsequently acquire lock L. It guarantees this in terms of the C semantics of the program, notwithstanding any questions of compiler optimization, register usage, CPU instruction reordering, or similar.
When such a locking mechanism is used, volatile does not provide any additional benefit for making threads' writes to shared objects be visible to each other. When such a locking mechanism is not used, volatile does not provide a complete substitute.
C's built-in (since C11) mutexes provide a suitable locking mechanism, at least when using C's built-in threads. So do pthreads mutexes, Sys V and POSIX semaphores, and various other, similar synchronization objects available in various environments, each with respect to corresponding multithreading systems. These semantics are pretty consistent across C-like multithreading implementations, extending at least as far as Java. The semantic requirements for C's built-in multithreading are described in section 5.1.2.4 of the current (C17) language spec.
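As a minimal sketch of the point about locks, assuming a POSIX system with pthreads: the counter below is a plain, non-volatile object, and the mutex alone is what makes each thread's updates visible to the other. The names shared_counter, counter_lock, increment_worker, and run_locked_counter_demo are made up for this illustration.

```c
#include <pthread.h>
#include <stddef.h>

#define INCREMENTS_PER_THREAD 100000

/* Shared state: an ordinary (non-volatile) counter guarded by a mutex. */
static long shared_counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker holds the lock for every update, so increments never overlap,
 * and each unlock publishes the write to whichever thread locks next. */
static void *increment_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++) {
        pthread_mutex_lock(&counter_lock);
        shared_counter++;               /* plain access, ordered by the lock */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs two workers and returns the final count, or -1 if a thread
 * could not be started. */
long run_locked_counter_demo(void)
{
    pthread_t workers[2];
    int started = 0;

    shared_counter = 0;
    for (; started < 2; started++) {
        if (pthread_create(&workers[started], NULL, increment_worker, NULL) != 0)
            break;
    }
    /* Joining a thread also synchronizes with everything that thread did. */
    for (int i = 0; i < started; i++)
        pthread_join(workers[i], NULL);

    return started == 2 ? shared_counter : -1;
}
```

Calling run_locked_counter_demo() from main (building with -pthread where the toolchain requires it) should return 2 * INCREMENTS_PER_THREAD on every run. Declaring shared_counter volatile would change nothing about that guarantee.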
volatile is for indicating that an object might be accessed outside the scope of the C semantics of the program. That may happen to produce properties that interact with multithreaded execution in a way that is taken to be desirable, but that is not the purpose or intended use of volatile. If it were, or if volatile were sufficient for such purposes, then we would not also need _Atomic objects and operations.
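For comparison, here is a hedged sketch of the kind of lock-free sharing that _Atomic objects exist for, assuming C11 atomics from stdatomic.h and pthreads for the threads. The names atomic_counter, atomic_worker, and run_atomic_counter_demo are hypothetical. If the counter were merely a volatile long, the concurrent increments would be a data race and updates could be lost.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define ADDS_PER_THREAD 100000

/* An atomic counter: concurrent read-modify-write operations on it are
 * well defined without any lock. */
static atomic_long atomic_counter;

static void *atomic_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ADDS_PER_THREAD; i++) {
        /* Relaxed ordering suffices here: only the count itself matters,
         * and no other data is being published through it. */
        atomic_fetch_add_explicit(&atomic_counter, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Runs two workers and returns the final count, or -1 if a thread
 * could not be started. */
long run_atomic_counter_demo(void)
{
    pthread_t workers[2];
    int started = 0;

    atomic_store(&atomic_counter, 0);
    for (; started < 2; started++) {
        if (pthread_create(&workers[started], NULL, atomic_worker, NULL) != 0)
            break;
    }
    for (int i = 0; i < started; i++)
        pthread_join(workers[i], NULL);

    return started == 2 ? atomic_load(&atomic_counter) : -1;
}
```

The atomic read-modify-write is what makes the final count exact. volatile offers no comparable operation, which is one concrete sense in which it is not a substitute.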
The previous remarks focus on language-level semantics, and that is sufficient to answer the question. However, inasmuch as the question asks specifically about accessing variables' values from registers, I observe that compilers don't actually have to do anything much multithreading-specific in that area as long as acquiring and releasing locks requires calling functions.
In particular, if an execution E of function f writes to an object o that is visible to other functions or other executions of f, then the C implementation must ensure that that write is actually performed on memory before E evaluates any subsequent function call (such as is needed to release a lock). This is necessary because the value written must be visible to the execution of the called function, regardless of any other threads.
Similarly, if E uses the value of o after return from a function call (such as is needed to acquire a lock) then it must load that value from memory to ensure that it sees the effect of any write that the function may have performed.
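The single-threaded core of that argument can be sketched as follows. The names o and f follow the text; callee and value_seen_by_callee are made up for the illustration, with callee standing in for any called function such as a lock or unlock routine. The sketch shows the behavior the implementation must preserve, not the machine code it must emit: when the compiler can see the callee's body, it may still inline or keep values in registers, provided the results are the same.

```c
/* An object visible to more than one function, like o in the text. */
int o;

/* Records what the called function observed, for inspection afterwards. */
int value_seen_by_callee;

/* Stand-in for any function f might call: it both reads and writes o. */
static void callee(void)
{
    value_seen_by_callee = o;   /* must observe f's earlier write */
    o = o + 1;                  /* f must observe this write after the call */
}

int f(void)
{
    o = 41;       /* this write must be in effect before the call below */
    callee();
    return o;     /* must reflect callee's write; a stale 41 would be wrong */
}
```

Under these assumptions, f() returns 42 and value_seen_by_callee ends up as 41. When the callee is opaque to the compiler, as library lock functions typically are, the simplest way to guarantee this is to store o to memory before the call and reload it afterwards.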
The only thing special to multithreading in this regard is that the implementation must ensure that interprocedural analysis optimizations or similar do not subvert the needed memory reads and writes around the lock and unlock functions. In practice, this rarely requires special attention.