
I'm using C (specifically C11 with gcc) to develop low-latency software for x64 (specifically, Intel CPUs only). I don't care about portability or about other architectures in general.

I know that volatile is generally not the first choice for data synchronization. However, these three facts seem to be true:

  • volatile forces every access to actually write to or read from memory (so the value may not be "cached" in a register, which also rules out some compiler optimizations)
  • volatile accesses must not be reordered by the compiler relative to other volatile accesses
  • naturally aligned 4-byte (or even 8-byte) values are always written atomically on x64 (the same is true for reading)
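The third point only holds for naturally aligned accesses. As a minimal sketch, assuming gcc targeting x86-64, C11 lets you check the relevant properties at compile time, and the alignment of a particular object at run time. The helper `is_naturally_aligned` is hypothetical, not part of any library:

```c
#include <stdatomic.h>  /* ATOMIC_INT_LOCK_FREE, ATOMIC_LLONG_LOCK_FREE */
#include <stdbool.h>
#include <stddef.h>     /* size_t */
#include <stdint.h>     /* uintptr_t */

/* Compile-time checks; assumption: gcc targeting x86-64. */
_Static_assert(sizeof(int) == 4, "expected a 4-byte int");
_Static_assert(_Alignof(int) == 4, "int should be naturally aligned");
/* 2 means "always lock-free" for every object of that type. */
_Static_assert(ATOMIC_INT_LOCK_FREE == 2, "atomic int should be lock-free");
_Static_assert(ATOMIC_LLONG_LOCK_FREE == 2, "8-byte atomics should be lock-free");

/* Hypothetical helper: true if p is a multiple of size.
   size must be a non-zero power of two (e.g. 4 or 8). The x86 tearing
   guarantee relied on above is for naturally aligned accesses. */
static bool is_naturally_aligned(const void *p, size_t size)
{
    return ((uintptr_t)p % size) == 0;
}
```

Alignment only rules out torn values; it says nothing about ordering between `data` and `data_ready`.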

Now I have this code:

#include <stdbool.h> // for true

typedef struct {
    double some_data;
    double more_data;
    char even_more_data[123];
} Data;

static volatile Data data;
static volatile int data_ready = 0;

void thread1()
{
    while (true) {
        while (data_ready) ;

        const Data x = f(...); // prepare some data
        data         = x;      // write it
        data_ready   = 1;      // signal that the data is ready  
    }
}

void thread2()
{
    while (true) {
        while (!data_ready) ;

        const Data x = data; // copy data
        data_ready   = 0;    // signal that data is copied
        g(x);                // process data
    }
}

thread1 is a producer of Data and thread2 is a consumer of Data. Note that I rely on these facts:

  • data is written before data_ready. So when thread2 reads data_ready and sees 1, we know that data is also available (guaranteed by the ordering of volatile accesses)
  • thread2 first reads and copies data and only then sets data_ready to 0, so thread1 can then produce new data and store it.
  • data_ready cannot be observed in a torn state, because reading and writing a 4-byte int is atomic on x64

This was the fastest option I found. Note that both threads are pinned to isolated cores. They busy-poll on data_ready because it's important for me to process the data as fast as possible.
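Since both threads busy-poll, the spin loops can issue the x86 `PAUSE` hint through `_mm_pause()` from `<immintrin.h>`, as the comments below suggest. A minimal sketch, assuming gcc on x86-64; `spin_until` is a hypothetical helper, and it uses a C11 acquire load instead of `volatile` (on x86 an acquire load compiles to a plain `mov`):

```c
#include <immintrin.h>  /* _mm_pause (SSE2, baseline on x86-64) */
#include <stdatomic.h>

/* Hypothetical helper: spin until *flag == want.
   PAUSE marks this as a spin-wait loop, which saves power and avoids
   a memory-order mis-speculation penalty when the loop exits.
   The acquire load makes writes published by a matching release store
   visible to the caller once the loop returns. */
static inline void spin_until(atomic_int *flag, int want)
{
    while (atomic_load_explicit(flag, memory_order_acquire) != want)
        _mm_pause();
}
```

On some Intel microarchitectures (Skylake-based cores in particular) `PAUSE` takes far longer than on older ones, so it is worth measuring whether it costs latency in this setup.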

Atomics and mutexes were slower, so I used this implementation.

My question is: is it possible that this does not behave as I expect? I can't find anything wrong with the logic shown, but I know that volatile is a tricky beast.

Thanks a lot

Kevin Meier
  • "Atomics and mutexes were slower, so I used this implementation." - Relaxed atomic operations are as fast as volatile accesses. Not sure why you want to avoid true C11 concepts (atomics) and program with volatile, whose threading correctness depends on the compiler. – Tsyvarev Jul 14 '23 at 19:58
  • `atomic_store_explicit(&foo, newval, memory_order_relaxed)` compiles to the same asm as a `volatile` assignment except for rare cases of missed optimizations (e.g. with `_Atomic double` last I checked); there's literally no benefit to `volatile` for normal use-cases, and you can easily get acquire / release sync for free on x86 with more readable source than with `volatile`. Same as with C++11 - [When to use volatile with multi threading?](https://stackoverflow.com/a/58535118) – Peter Cordes Jul 14 '23 at 20:04
  • But does the threading correctness in this example even depend on the compiler? – Kevin Meier Jul 14 '23 at 20:04
  • Yes, any sane compiler for x86 should make working asm for this. (I don't think StoreLoad reordering is a problem here; the effectively-acquire spin-wait (inefficient without `_mm_pause()`) ends up waiting for the store to be visible to the other thread since that thread won't toggle it back until after it becomes globally visible. It would probably be a lot more efficient to use a lock-free queue with a few entries, less time lost to inter-thread latency.) – Peter Cordes Jul 14 '23 at 20:14
  • But anyway, if you used `_Atomic` release/acquire you'd only need `data_ready` to be atomic, not the struct. With `volatile` that is necessary to prevent compile-time reordering, so you might be costing performance in copying a large struct, depending on how `volatile` struct-assignment optimizes vs. plain. – Peter Cordes Jul 14 '23 at 20:14
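Following the last comment, here is a minimal sketch of the same handoff with only `data_ready` as an atomic int using release/acquire, and `data` left as a plain, non-volatile struct. Assumptions: POSIX threads and gcc on x86-64; the bounded `ROUNDS` loop and the names `producer`, `consumer`, `run_handoff`, `checksum` and `mismatches` are hypothetical, the two assignments to `data` stand in for `f(...)`, and the consistency check stands in for `g(x)`:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>  /* NULL */

typedef struct {
    double some_data;
    double more_data;
    char even_more_data[123];
} Data;

static Data data;                  /* plain object, ordered by the flag */
static atomic_int data_ready = 0;  /* the only shared atomic */

enum { ROUNDS = 1000 };            /* hypothetical: bounded so it terminates */

static long long checksum;         /* hypothetical stand-in for g(x) */
static int mismatches;

static void *producer(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; ++i) {
        /* Acquire: consumer's copy of data happens-before our writes. */
        while (atomic_load_explicit(&data_ready, memory_order_acquire))
            ;
        data.some_data = i;        /* stands in for f(...) */
        data.more_data = 2.0 * i;
        /* Release: publishes the writes to data above. */
        atomic_store_explicit(&data_ready, 1, memory_order_release);
    }
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; ++i) {
        while (!atomic_load_explicit(&data_ready, memory_order_acquire))
            ;
        const Data x = data;       /* ordinary struct copy */
        atomic_store_explicit(&data_ready, 0, memory_order_release);
        if (x.some_data != i || x.more_data != 2.0 * x.some_data)
            ++mismatches;
        checksum += (long long)x.some_data;
    }
    return NULL;
}

/* Runs one producer and one consumer to completion (no error checks). */
static void run_handoff(void)
{
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
}
```

Compile with `gcc -std=c11 -O2 -pthread`. On x86 both the acquire loads and the release stores compile to plain `mov` instructions, so the flag costs the same as the `volatile` version, while the struct copy can be optimized like any ordinary copy. After `run_handoff()` returns, `mismatches` is 0 and `checksum` equals the sum of 0 through `ROUNDS - 1`.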
