
There are many debates on the Internet about the use of the volatile keyword in parallel programming, sometimes with contradictory arguments.

One of the more trustworthy discussions of this topic seems to be this article by Arch Robison. The example he uses is the task of passing a value from one thread to another:

Thread 1 computes a matrix product and hands it to Thread 2, which does something else with it. The matrix is the variable M and the flag is a volatile pointer R.

  1. Thread 1 computes a matrix product M and atomically sets R to point to M.
  2. Thread 2 waits until R!=NULL and then uses M as a factor to compute another matrix product.

In other words, M is a message and R is a ready flag.

The author claims that while declaring R as volatile will solve the issue of propagating the change from Thread 1 to Thread 2, it makes no guarantee about what the value of M will be when that happens, because the assignments to R and M can be reordered. So we need to make both M and R volatile, or use some synchronization mechanism from a library such as pthreads.

My question is how to do the following in C:

1) How to share a single flag between two threads: how to atomically assign to it, make sure the other thread will see the change, and test for the change in the other thread. Is the use of volatile legitimate in this case? Or can some library provide a conceptually better or faster way, perhaps involving memory barriers?

2) How to do Robison's example right, i.e. how to send the matrix M from one thread to the other and do it safely (and preferably portably, with pthreads)?

user7610

4 Answers


volatile gives you zero ordering guarantees. At compile time (and at run-time on a weakly-ordered ISA), it's similar to _Atomic with memory_order_relaxed (assuming the variable is small enough and aligned enough to be naturally atomic).

Of course with a bool only 1 byte of it ever changes, so seeing anything other than 0 or 1 is impossible.

At run-time on strongly-ordered x86, asm loads and stores already have acquire/release ordering, so if the compiler happens not to reorder anything around the volatile accesses, the result is "safe" for that particular build.

See also When to use volatile with multi threading? (Never: use _Atomic with memory_order_relaxed if that's what you want.)
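
As a rough sketch of that comparison (flag and payload here are just illustrative names, not from the example above), a relaxed atomic store has exactly the ordering problem that makes a plain volatile flag unsuitable for publishing data:

#include <stdatomic.h>
#include <stdbool.h>

_Atomic bool flag = false;
int payload;

void publish_relaxed(int value) {
    payload = value;                                            // plain store
    atomic_store_explicit(&flag, true, memory_order_relaxed);   // roughly what volatile gives you
    // Nothing stops the compiler, or a weakly-ordered CPU, from letting the
    // flag store become visible before the payload store.
}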


For a "data ready" flag, you actually need release / acquire semantics. https://preshing.com/20120913/acquire-and-release-semantics/

How to share a single flag between two threads - How to atomically assign to it, make sure the other thread will see the change and test for the change in the other thread.

#include <stdatomic.h>
#include <stdbool.h>   // for bool
#define N 1024         // matrix dimension; any compile-time constant
// shared:
_Atomic bool data_ready = false;
float shared_matrix[N][N];

In the producer:

   write_matrix( &shared_matrix );  // loop that fills a buffer
   atomic_store_explicit(&data_ready, true, memory_order_release);
   // data_ready = true, but with only release (not seq_cst) for efficiency

In the consumer:

#include <immintrin.h>   // x86 only, e.g. guard with #ifdef __x86_64__

void consumer() {
   while(!atomic_load_explicit(&data_ready, memory_order_acquire)) {
       _mm_pause();   // for x86 spin loops
   }
   // now safe to read matrix
}
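
A minimal sketch of how these pieces might be wired up with pthreads (write_matrix and use_matrix are hypothetical helpers standing in for the real work):

#include <pthread.h>

// hypothetical helpers, defined elsewhere:
void write_matrix(float (*m)[N][N]);
void use_matrix(float (*m)[N][N]);

void *producer_thread(void *arg) {
    write_matrix(&shared_matrix);                                   // fill the matrix
    atomic_store_explicit(&data_ready, true, memory_order_release);
    return NULL;
}

void *consumer_thread(void *arg) {
    while (!atomic_load_explicit(&data_ready, memory_order_acquire))
        ;                                                           // spin (or _mm_pause() on x86)
    use_matrix(&shared_matrix);                                     // safe to read the matrix now
    return NULL;
}

int main(void) {
    pthread_t prod, cons;
    pthread_create(&cons, NULL, consumer_thread, NULL);
    pthread_create(&prod, NULL, producer_thread, NULL);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
}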
Peter Cordes

"volatile" is a hint for the compiler not to optimize the memory access, i.e., do not assume that a value in memory is unchanged since the last (local) write. Without this hint, a compiler could assume that a value of a register, where the variable is copied from, is still valid. Thus, while it is rather unlikely that a matrix is kept within a register, in general both variables should be volatile, or more precisly, volatile for the receiver.

In real-life multithreading, one would rather use a semaphore or something similar for the signaling, avoiding busy waiting on the receiver side.
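
For example, a minimal sketch with a POSIX semaphore (the matrix variable and the fill/read routines are placeholders); sem_wait blocks instead of spinning, and sem_post/sem_wait also synchronize memory between the two threads:

#include <pthread.h>
#include <semaphore.h>

// hypothetical helpers, defined elsewhere:
void fill_matrix(double m[64][64]);
void read_matrix(double m[64][64]);

static sem_t ready;              // initialize with sem_init(&ready, 0, 0) before starting the threads
static double matrix[64][64];    // placeholder shared data

void *sender(void *arg) {
    fill_matrix(matrix);         // compute the matrix product
    sem_post(&ready);            // signal the receiver; no busy waiting involved
    return NULL;
}

void *receiver(void *arg) {
    sem_wait(&ready);            // blocks until the sender posts
    read_matrix(matrix);         // the matrix is visible and complete here
    return NULL;
}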

Matthias
  • In response to the last sentence in your first paragraph, Arch Robison says both variables should be volatile for *both* sides, because volatile would also prevent the reordering of the assignments to the flag and to the matrix. But I assume you left that reordering issue out for simplicity. – user7610 Feb 28 '12 at 12:54
  • A matrix, especially in a multithreaded app where it is expected to be communicated between threads, would probably be dynamically allocated and so accessed via its pointer. Pointers are usually register-sized :( – Martin James Feb 28 '12 at 13:01
  • You need a volatile on the sender side only if the sender reuses the matrix or the pointer. The reordering is an independent matter. As Necrolis stated, in C11 you can use _Atomic. – Matthias Feb 28 '12 at 13:11
  • ad Matthias Werner: According to the quotation from the C99 standard (5.1.2.3.1, first bullet point), "At sequence points, volatile objects are stable in the sense that previous accesses are complete and subsequent accesses have not yet occurred." It is quoted in Arch Robison's article too. So that seems to imply that the compiler needs to insert a memory fence after each expression involving a volatile variable, which would take care of the reordering too. – user7610 Mar 01 '12 at 09:16
  • @user7610: Sequence points refer to one program, or rather one thread. Thus, the order seen from some other thread is not defined, except by that thread's own sequence points. In theory, memory accesses can pass each other *as seen from another thread*. In reality, this really does occur sometimes, depending on the hardware. – Matthias Mar 01 '12 at 15:40

Under architectures like x86, a properly aligned (and properly sized) variable such as a pointer will by default be read from and written to atomically. What still needs to happen is a serialization of the memory reads/writes to prevent reordering in the CPU pipeline (via an explicit memory fence or a bus-locking operation), as well as the use of volatile to prevent the compiler from reordering the code it generates.

The easiest way to do this is to use CAS. Most CAS intrinsics provide a full memory barrier at both the compiler and the CPU/memory-bus level. Under MSVC you can use the Interlocked* functions; BTS, BTR, Inc, Dec, Exchange and Add would all work for a flag. For GCC you'd use the __sync_* based variants.

For more portable options you could use a pthread_mutex or pthread_cond. If you can use C11, you can also look into the _Atomic keyword.
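
For instance, a minimal sketch of the flag using the GCC __sync builtins mentioned above (variable and function names are just illustrative):

/* Flag shared between the two threads; the __sync builtins act as full barriers. */
static int flag = 0;

void producer_publish(void)
{
    /* ... write the matrix first ... */
    __sync_bool_compare_and_swap(&flag, 0, 1);   /* set the flag, with a full barrier */
}

void consumer_wait(void)
{
    /* atomic read with a full barrier: add 0 and look at the returned old value */
    while (__sync_fetch_and_add(&flag, 0) == 0)
        ;   /* busy wait */
    /* ... the matrix written before the flag is now visible ... */
}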

Necrolis

The 'classic' way is for Thread 1 to push the pointer to the dynamically-allocated matrix onto a producer-consumer queue upon which Thread 2 is waiting. Once pushed, Thread 1 can allocate another M and start working on it, if it so wishes.

Fiddling around with volatile flags etc. as an optimization may be premature if the overall performance is dominated by operations on large matrices.
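
For illustration, a minimal single-slot sketch of such a producer-consumer queue using a pthread mutex and condition variable (names are illustrative; a real queue would usually hold more than one slot):

#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t q_lock     = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_nonempty = PTHREAD_COND_INITIALIZER;
static void *q_item = NULL;               // single slot holding the matrix pointer

// Thread 1: push the pointer to the freshly computed matrix.
void queue_push(void *matrix)
{
    pthread_mutex_lock(&q_lock);
    q_item = matrix;
    pthread_cond_signal(&q_nonempty);
    pthread_mutex_unlock(&q_lock);
}

// Thread 2: wait until a matrix pointer is available and take it.
void *queue_pop(void)
{
    pthread_mutex_lock(&q_lock);
    while (q_item == NULL)
        pthread_cond_wait(&q_nonempty, &q_lock);
    void *matrix = q_item;
    q_item = NULL;
    pthread_mutex_unlock(&q_lock);
    return matrix;
}

The mutex lock/unlock pair also provides the required memory ordering, so Thread 2 sees the completed matrix contents, not just the pointer.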

Martin James