
Consider the following example. The goal is to use two threads, one to "compute" a value and one that consumes and uses the computed value (I've tried to simplify this). The computing thread signals to the other thread that the value has been computed and is ready by using a condition variable, after which the waiting thread consumes the value.

// Hopefully this is free from errors, if not, please point them out so I can fix
// them and we can focus on the main question
#include <pthread.h>
#include <stdio.h>

// The data passed to each thread. These could just be global variables.
typedef struct ThreadData
{
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  int spaceHit;
} ThreadData;

// The "computing" thread... just asks you to press space and checks if you did or not
void* getValue(void* td)
{
  ThreadData* data = td;

  pthread_mutex_lock(&data->mutex);

  printf("Please hit space and press enter\n");
  data->spaceHit = getchar() == ' ';
  pthread_cond_signal(&data->cond);

  pthread_mutex_unlock(&data->mutex);

  return NULL;
}

// The "consuming" thread... waits for the value to be set and then uses it
void* watchValue(void* td)
{
  ThreadData* data = td;

  pthread_mutex_lock(&data->mutex);
  if (!data->spaceHit)
      pthread_cond_wait(&data->cond, &data->mutex);
  pthread_mutex_unlock(&data->mutex);

  if (data->spaceHit)
      printf("You hit space!\n");
  else
    printf("You did NOT hit space!\n");

  return NULL;
}

int main()
{
  // Boring main function. Just initializes things and starts the two threads.
  pthread_t threads[2];
  pthread_attr_t attr;
  ThreadData data;
  data.spaceHit = 0;

  pthread_mutex_init(&data.mutex, NULL);
  pthread_cond_init(&data.cond, NULL);

  pthread_attr_init(&attr);
  pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
  pthread_create(&threads[0], &attr, watchValue, &data);
  pthread_create(&threads[1], &attr, getValue, &data);

  pthread_join(threads[0], NULL);
  pthread_join(threads[1], NULL);

  pthread_attr_destroy(&attr);
  pthread_mutex_destroy(&data.mutex);
  pthread_cond_destroy(&data.cond);

  return 0;
}

My main question has to do with potential optimizations done by the compiler. Is the compiler allowed to do tricky optimizations and "optimize" the program flow such that the following happens:

void* watchValue(void* td)
{
  ThreadData* data = td;

  pthread_mutex_lock(&data->mutex);
  if (!data->spaceHit) // Here, it might remember the result of data->spaceHit
      pthread_cond_wait(&data->cond, &data->mutex);
  pthread_mutex_unlock(&data->mutex);

  if (remember the old result of data->spaceHit without re-getting it)
      printf("You hit space!\n");
  else
    printf("You did NOT hit space!\n");
  // The above if statement now may not execute correctly because it didn't
  // re-get the value of data->spaceHit, but "remembered" the old result
  // from the if statement a few lines above

  return NULL;
}

I'm a bit paranoid that the compiler's static analysis might determine that data->spaceHit does not change between the two if statements, and thus justify using the old value of data->spaceHit instead of re-getting the new value. I don't know enough about threading and compiler optimizations to know if this code is safe or not. Is it?


Note: I've written this in C, and tagged this as C and C++. I'm using this in a C++ library, but since I'm using C threading APIs (pthreads and Win32 threads) and have the option to embed C in this portion of the C++ library, I've tagged this as both C and C++.

Cornstalks

3 Answers


No, the compiler is not allowed to cache the value of data->spaceHit across the calls to pthread_cond_wait() or pthread_mutex_unlock(). POSIX specifically calls both out as "functions [which] synchronize memory with respect to other threads", so they must necessarily act as compiler barriers.

For a compiler to be part of a conforming pthreads implementation, it must not perform that optimisation in the case you've given.

caf

Generally speaking, there are not only compiler optimization issues with sharing data between threads, but also hardware optimization issues when those threads run on different processors, which can execute instructions out of order.

However, the pthread_mutex_lock and pthread_mutex_unlock functions must defeat not only compiler caching optimizations, but also any hardware reordering optimizations. If thread A prepares some shared data and then "publishes" it by performing an unlock, this must appear consistent to other threads. For example, it cannot appear on another processor that the lock is released but the updates of the shared variables have not yet completed. So the functions have to execute any necessary memory barriers. All of that would be for naught if the compiler could move data accesses around the calls to these functions, or cache things at the register level such that coherency is broken.

So the code you have is safe from that perspective. However, it has other issues. The pthread_cond_wait function should always be called in a loop which re-tests the variable, because of the possibility of spurious wakeups.
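The usual idiom looks something like this (a minimal sketch; it waits on a hypothetical dataReady flag rather than on spaceHit itself, for reasons explained in the next paragraph):

// Sketch only: re-test the predicate in a loop, so a spurious wakeup simply
// results in another wait. "dataReady" is a hypothetical flag that the
// producer would set (under the same mutex) before signaling.
pthread_mutex_lock(&data->mutex);
while (!data->dataReady)
    pthread_cond_wait(&data->cond, &data->mutex);
pthread_mutex_unlock(&data->mutex);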

The signaling of a condition is stateless, so the waiting thread can block forever. Just because you call pthread_cond_signal unconditionally in the getValue input thread doesn't mean that watchValue will fall through the wait. It's possible that getValue executes first and spaceHit is not set (the user did not hit space). Then watchValue locks the mutex, sees that spaceHit is false, and executes a wait which may be indefinite. (The only thing that will save it is a spurious wakeup, ironically, because there is no loop.)

Basically the logic you seem to be looking for is a simple semaphore:

// Consumer:
wait(data_ready_semaphore);
use(data);

// Producer:
data = produce();
signal(data_ready_semaphore);

In this style of interaction, we do not need a mutex, which is hinted at by the unprotected use of data->spaceHit in your watchValue. More concretely, with POSIX semaphore syntax:

// "watchValue" consumer
sem_wait(&ready_semaphore);
if (data->spaceHit)
  printf("You hit space!\n");
else
  printf("You did NOT hit space!\n");

// "getValue" producer
data->spaceHit = getchar() == ' ';
sem_post(&ready_semaphore);

Perhaps the real code that you simplified to the example can just use semaphores.
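For illustration only, here is a minimal self-contained sketch of that approach (the SemData struct and the member name ready are mine, not taken from your code):

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

typedef struct
{
  sem_t ready;   // posted once by the producer when spaceHit has been published
  int spaceHit;
} SemData;

// "getValue" producer
void* getValue(void* p)
{
  SemData* data = p;
  printf("Please hit space and press enter\n");
  data->spaceHit = getchar() == ' ';
  sem_post(&data->ready);   // sem_post synchronizes memory, so spaceHit is visible
  return NULL;
}

// "watchValue" consumer
void* watchValue(void* p)
{
  SemData* data = p;
  sem_wait(&data->ready);   // block until the producer posts
  if (data->spaceHit)
    printf("You hit space!\n");
  else
    printf("You did NOT hit space!\n");
  return NULL;
}

int main(void)
{
  pthread_t threads[2];
  SemData data;
  data.spaceHit = 0;
  sem_init(&data.ready, 0, 0);   // not shared across processes, initial count 0

  pthread_create(&threads[0], NULL, watchValue, &data);
  pthread_create(&threads[1], NULL, getValue, &data);
  pthread_join(threads[0], NULL);
  pthread_join(threads[1], NULL);

  sem_destroy(&data.ready);
  return 0;
}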

P.S. Also, pthread_cond_signal need not be called inside the mutex. It potentially calls into the operating system, so a mutex-protected region that only needs to be a handful of machine instructions long just to protect the shared variables can blow up to hundreds of machine cycles because it contains the signaling call.
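A sketch of what that would look like (not a drop-in replacement for your getValue; the input prompt is also moved out of the critical section):

// Sketch only: do the slow work outside the lock, update the shared flag
// under the mutex, then signal after unlocking.
printf("Please hit space and press enter\n");
int hit = getchar() == ' ';

pthread_mutex_lock(&data->mutex);
data->spaceHit = hit;
pthread_mutex_unlock(&data->mutex);

pthread_cond_signal(&data->cond);   // no mutex held while signaling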

Kaz
  • er, that's not how SMP works. The underlying hardware will ensure that all caches that are caching the same memory will be invalidated if any of them gets updated by a thread. Intel's QPI has to work very hard to bring that about. Without that it would be a NUMA system, in which case you wouldn't be able to use shared memory in the first place. The whole point of volatile and memory fences (compiler and hardware) is to ensure that the compiler doesn't store the variable in a register and that the CPU executes instructions in the right order. After that SMP sorts out the rest. – bazza May 29 '13 at 04:01
  • @bazza You're right; cache coherency is commonly taken care of, so why mix it into the answer. Why a lock might appear to be unlocked before the shared data is settled would primarily be because of out of order execution. Just the first paragraph hinted at this, originally; I fixed it. – Kaz May 29 '13 at 05:04
  • 1
    It's generally a good idea to keep `pthread_cond_signal` inside the mutex. It keeps you from racing with the wait. – Hasturkun May 29 '13 at 12:43
  • Why is accessing the `data->spaceHit` variable outside of the lock an issue? It's already received the signal at that point so it's sure it's not going to be modified by the other thread (because it's already been modified and set). Or am I not seeing something? – Cornstalks May 29 '13 at 14:59
  • @CornStalks It isn't in fact an issue because `watchValue` will not reach that statement unless `data->spaceHit` is true, which means that the only other thread which exists has finished its one and only interaction with the data. But there is another problem. I have updated the answer. – Kaz May 29 '13 at 15:45
  • 1
    @Hasturkun Placing `pthread_cond_signal` in the mutex is not required to solve any race condition problem related to the wait, because `pthread_cond_wait` releases the mutex and waits in such a way that the wakeup cannot be lost. To avoid racing with the wait (causing a lost wakeup), the signaling thread must lock the mutex before signaling, but the order between unlocking and signaling doesn't matter. – Kaz May 29 '13 at 15:51
  • I'd suggest looking at http://stackoverflow.com/a/4544494 and http://stackoverflow.com/a/2298471 Though for the most part, so long as one doesn't forget to set the condition inside the mutex, things are probably good. – Hasturkun May 29 '13 at 16:10

(Later Edit: It looks like the other answers address this question better. I'll leave this answer here as a reference to an approach that does not answer the question as well. Please advise should you recommend a different approach.)

The type ThreadData itself is not volatile.

The instantiation of it as 'data' in main() is effectively volatile: another thread can change it at any time. Likewise, the pointers 'data' in getValue() and watchValue() point to a 'ThreadData' object that should be treated as volatile.

Although I like the first answer for its tightness, rewriting

ThreadData data;  // main()
ThreadData* data; // getValue(), watchValue()

to

volatile ThreadData data;  // main()
volatile ThreadData* data; // getValue(), watchValue()
                           // Pointer `data` is not volatile, what it points to is volatile.

may be better. It will ensure any access to the members of ThreadData is always re-read and not optimized away. Should you add additional fields to ThreadData, they are likewise protected.

chux - Reinstate Monica
  • Either I'm not understanding you correctly, or you're mistaken. The instantiation of `data` in `main()` is not `volatile`. The `td` and `data` pointers in the thread functions are not marked `volatile` either. Can you clarify? – Cornstalks May 29 '13 at 02:22
  • I am recommending that `data` (in `main`) be labeled as `volatile`. This will prevent the compiler from using any optimizations on `data`, for it can change at any time. – chux - Reinstate Monica May 29 '13 at 02:26
  • 1
    Only `ThreadData::spaceHit` needs to be marked volatile (directly in the `struct` definition). Sure marking `data` volatile makes all its members volatile too but it's better to just mark `ThreadData::spaceHit` volatile because then you don't risk forgetting it (otherwise you'd have to mark every `ThreadData` instance as volatile and make sure all your pointers keep the qualifier, which is quite error-prone). – syam May 29 '13 at 02:27
  • @syam You bring up a good point about the need to mark every occurrence as `volatile`. But if we had another problem, where we needed a `const` pointer, we would be obliged to use `const` as we passed it around. – chux - Reinstate Monica May 29 '13 at 02:34
  • 1
    @chux: Good point too. However, I'd argue that constness is often a matter of instances or aliases (some need it, some don't) while in this case volatileness is a requirement due to the multithreaded usage of that particular data member. – syam May 29 '13 at 02:37
  • 1
    At least if compiling C concurrent access to a (shared) variable in a threaded environment needs to be protected by a mutex (lock), **there is no need to declare it `volatile`**, the mutex and the compiler shall do everything to guarantee the variable's value is consistent when being read and stays consistent when being written. – alk May 29 '13 at 06:08
  • As indicated by other answers, using `volatile` is not **required**. As indicated by the intended meaning of the C/C++ `volatile` specifier, using `volatile` by itself is not **sufficient**, because hardware optimizations could still affect the access on multicore machines (and there might still be issues resulting from instruction reordering optimizations by the compiler). – FooF Jul 31 '13 at 03:52
  • See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2016.html or http://stackoverflow.com/questions/2484980/why-is-volatile-not-considered-useful-in-multithreaded-c-or-c-programming or https://www.kernel.org/doc/Documentation/volatile-considered-harmful.txt for details. – FooF Jul 31 '13 at 03:52