
Example pseudocode:

struct Values{
   AtomicInt counter;
   Int needed_counter;
   // We update this.
   SomeType unsynchronized_value;
}

fn update_array(Values[] values){
   for(Values& val : values) {
      // Let's assume that overflow is impossible.
      if (val.counter.fetch_add(1, relaxed) != val.needed_counter) { // CONDITION
        continue;
      }
      // This is a no-op for the CPU but prevents the compiler
      // from moving operations across it.
      compiler_fence(acquire_release);

      UpdateUnsynchonisedValue(&mut val.unsynchronized_value);
   }
   
   // Flush all changes made in current thread at once.
   memory_fence(release);
}

On the one hand, CONDITION can be false for only one thread, so it seems that no synchronization is needed.

On the other hand, modern CPUs perform out-of-order and speculative execution, so I wonder whether this can cause a data race by starting to execute the UpdateUnsynchonisedValue function before CONDITION is checked.

Since I put a compiler_fence there, the compiler should not reorder instructions from UpdateUnsynchonisedValue before the check of CONDITION, so the question is only about CPU behaviour and the memory models of languages like C++ and Rust.

YSC

1 Answer


This code is safe in C++ without the additional compiler_fence.

fetch_add(1, relaxed) will yield a unique index for each thread, because read-modify-write (RMW) operations are always required to use the latest value of an atomic, even with std::memory_order::relaxed.

UpdateUnsynchonisedValue(&mut val.unsynchronized_value); cannot be reordered before val.counter.fetch_add(...), because control flow might not reach this function call, depending on the fetched value. If this reordering were possible, you wouldn't be able to write the same single-threaded code either. It would be insane.

In general, code cannot be reordered before a condition, if the code depends on the condition.

Speculative execution can result in code being executed before a branch has been taken, but this cannot change the observable effect of the program's instructions. If a speculative memory write were able to cause a data race, then it would be the responsibility of the compiler to insert a fence which prevents this. In practice, this isn't necessary, because speculative writes are buffered, and will be discarded if the branch isn't taken, before anything is written to global memory.

Remember: the code you write is targeting the C++ abstract machine, and on this machine, val.unsynchronized_value won't be written to if the if-statement isn't entered. Compilers must emit code that behaves as if this was the case, regardless of implementation details like speculative execution.

Jan Schultke
  • What about speculative execution on modern CPUs? Single-threaded code has fewer concerns: speculative execution cannot cause a data race there, because it would not access data used by other threads. – Angelicos Phosphoros Jul 13 '23 at 13:48
  • @AngelicosPhosphoros the memory and execution model of C++ guarantees that this code cannot produce a data race. Either the compiler, or the architecture must ensure that speculative memory writes cannot produce additional data races. It's not the responsibility of the developer to do that. – Jan Schultke Jul 13 '23 at 13:54
  • 1
    @AngelicosPhosphoros: CPU hardware handles speculative stores with the store buffer: they can only commit from the store buffer to L1d cache (and become globally visible) after the store instruction has retired from out-of-order exec (thus is known to be non-speculative). We call this a "graduated" store, and the store buffer commits them to L1d as fast as it can, depending on cache misses and memory-ordering rules (e.g. x86 requires that stores commit in program order.) See [Can a speculatively executed CPU branch contain opcodes that access RAM?](https://stackoverflow.com/q/64141366) – Peter Cordes Jul 13 '23 at 18:04