I have a size_t variable which is updated by a std::thread and read by another std::thread.

I know that I can protect the reads and writes with a mutex. But would it be equivalent, or even beneficial, to make the variable a std::atomic<size_t> instead?

curiousguy
AdeleGoldberg
    IMHO atomics are nice, but you have to be careful using them the right way. E.g. Increment/Decrement must be implemented the right way. Take a look at that: https://stackoverflow.com/questions/15056237/which-is-more-efficient-basic-mutex-lock-or-atomic-integer – haronaut Jan 18 '20 at 18:28
    and that: https://stackoverflow.com/questions/31978324/what-exactly-is-stdatomic – haronaut Jan 18 '20 at 18:35

1 Answer

Yes, it is worth it. In fact, it is mandatory either to use std::atomic or to synchronize access to a non-atomic variable whenever multiple threads use the same variable and at least one of them writes to it. Breaking this rule is a data race, which is undefined behavior.

Depending on how you use the std::size_t, the compiler can assume that non-atomic and otherwise non-synchronized variables will not be changed by other threads, and optimize the code accordingly. This can cause Bad Things™ to happen.
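As a rough sketch of the two valid options for the size_t from the question, the block below puts a mutex-guarded variable next to a std::atomic<std::size_t>. All names (guarded_value, atomic_value, run_both, and so on) are hypothetical and chosen only for illustration; the driver simply runs one writer and one reader thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>

// Option 1 (hypothetical names): a plain size_t guarded by a mutex.
std::size_t guarded_value = 0;
std::mutex guarded_mutex;

void write_guarded(std::size_t v)
{
    std::lock_guard<std::mutex> lock(guarded_mutex);
    guarded_value = v;
}

std::size_t read_guarded()
{
    std::lock_guard<std::mutex> lock(guarded_mutex);
    return guarded_value;
}

// Option 2: the same value as an atomic, no mutex needed.
std::atomic<std::size_t> atomic_value{0};

void write_atomic(std::size_t v) { atomic_value.store(v); }
std::size_t read_atomic() { return atomic_value.load(); }

// Illustrative driver: one writer counts up to `steps` in both variables,
// one reader polls both until it has seen the final value in each.
bool run_both(std::size_t steps)
{
    write_guarded(0);
    write_atomic(0);
    std::size_t last_guarded = 0;
    std::size_t last_atomic = 0;

    std::thread writer([steps] {
        for (std::size_t i = 1; i <= steps; ++i) {
            write_guarded(i);
            write_atomic(i);
        }
    });
    std::thread reader([&] {
        while (last_guarded < steps || last_atomic < steps) {
            last_guarded = read_guarded();
            last_atomic = read_atomic();
            std::this_thread::yield();
        }
    });

    writer.join();
    reader.join();
    return last_guarded == steps && last_atomic == steps;
}
```

Both variants are free of data races; the atomic one just avoids the separate lock object.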

My usual example for this is a loop where a non-atomic boolean is used:

// make keepRunning an std::atomic<bool> to avoid endless loop
bool keepRunning {true};
unsigned number = 0;

void stop()
{
    keepRunning = false;
}

void loop()
{
    while(keepRunning) {
        number += 1;
    }
}

When this code is compiled with optimizations enabled, GCC and Clang both check keepRunning only once and then enter an endless loop. See https://godbolt.org/z/GYMiLE for the generated assembly output.

That is, they optimize it into if (keepRunning) infinite_loop;, hoisting the load out of the loop. Because the variable is non-atomic, they are allowed to assume that no other thread can be writing it. See "Multithreading program stuck in optimized mode but runs normally in -O0" for a more detailed look at the same problem.

Note that this example only shows the error if the loop body is sufficiently simple. However the undefined behaviour is still present and should be avoided by using std::atomic or synchronization.
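For comparison, here is a sketch of the same example with the one-line fix that the code comment suggests. The function run_loop_briefly and the 10 ms sleep are assumptions added only so the example can be driven from a second thread.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// keepRunning is now atomic, so the loop must re-read it on every iteration.
std::atomic<bool> keepRunning{true};
unsigned number = 0;

void stop()
{
    keepRunning = false;
}

void loop()
{
    while (keepRunning) {
        number += 1;
    }
}

// Hypothetical driver: lets the loop spin for a moment, then stops it.
// Returns only because loop() actually observes the store made by stop().
unsigned run_loop_briefly()
{
    keepRunning = true;
    number = 0;
    std::thread worker(loop);
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    stop();
    worker.join();
    return number;
}
```

Here number stays non-atomic on purpose: it is only touched by the worker while the thread runs, and the main thread reads it after join().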


In this case you can use std::atomic<bool> with std::memory_order_relaxed because you don't need any synchronization or ordering wrt. other operations in either the writing or reading thread. That will give you atomicity (no tearing) and the assumption that the value can change asynchronously, without making the compiler use any asm barrier instructions to create more ordering wrt. other operations.
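The same idea applies to the size_t from the question. The following is a minimal sketch, assuming a single writer and a single reader, that uses std::memory_order_relaxed for every access; counter, produce, observe_until and run_relaxed_demo are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical shared counter, accessed only with relaxed operations.
std::atomic<std::size_t> counter{0};

// Writer: bumps the counter `steps` times without any ordering guarantees.
void produce(std::size_t steps)
{
    for (std::size_t i = 0; i < steps; ++i) {
        counter.fetch_add(1, std::memory_order_relaxed);
    }
}

// Reader: polls until `target` is reached. Returns true if the observed
// values never went backwards, which relaxed atomics still guarantee
// for a single variable.
bool observe_until(std::size_t target)
{
    std::size_t previous = 0;
    bool monotonic = true;
    for (;;) {
        const std::size_t current = counter.load(std::memory_order_relaxed);
        if (current < previous) {
            monotonic = false;
        }
        previous = current;
        if (current >= target) {
            break;
        }
        std::this_thread::yield();
    }
    return monotonic;
}

// Illustrative driver for one writer and one reader thread.
bool run_relaxed_demo(std::size_t steps)
{
    counter.store(0, std::memory_order_relaxed);
    bool ok = false;
    std::thread reader([&] { ok = observe_until(steps); });
    std::thread writer(produce, steps);
    writer.join();
    reader.join();
    return ok && counter.load(std::memory_order_relaxed) == steps;
}
```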

So it's possible and safe to use atomics without any locking, and even without creating synchronization between the writer and reader the way seq_cst or acquire/release loads and stores do. That kind of synchronization is what you would use to safely share a non-atomic variable or array, e.g. with an atomic<int*> buffer that the reader only dereferences once the pointer is non-NULL.
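A sketch of that pointer-publishing pattern, under the assumption of one writer and one reader: the writer fills a non-atomic array and then stores the pointer with release ordering, and the reader waits with acquire loads until the pointer is non-null. The names storage, publish, consume and run_publish_demo are made up for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int storage[4];                      // non-atomic payload
std::atomic<int*> buffer{nullptr};   // stays null until the payload is ready

// Writer: fill the payload first, then publish the pointer with release.
void publish()
{
    for (int i = 0; i < 4; ++i) {
        storage[i] = i * 10;
    }
    buffer.store(storage, std::memory_order_release);
}

// Reader: the acquire load pairs with the release store, so once a
// non-null pointer is seen, the writes to storage are visible too.
int consume()
{
    int* p = nullptr;
    while ((p = buffer.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();
    }
    int sum = 0;
    for (int i = 0; i < 4; ++i) {
        sum += p[i];
    }
    return sum;
}

// Illustrative driver: returns the sum the reader computed.
int run_publish_demo()
{
    buffer.store(nullptr, std::memory_order_relaxed);
    int sum = 0;
    std::thread reader([&] { sum = consume(); });
    std::thread writer(publish);
    writer.join();
    reader.join();
    return sum;
}
```

With relaxed ordering on the pointer instead, the reader could see the pointer without being guaranteed to see the array contents, which is why this case needs acquire/release.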

But if only the atomic variable itself is shared, you can just have readers read the current value, not caring about synchronization. You may want to read it into a local temporary if you don't need to re-read every iteration of a short loop, only once per function call.
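Reading into a local temporary could look roughly like this. The shared variable limit and the function count_below_limit are hypothetical; the point is only that the atomic is loaded once per call rather than once per loop iteration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical shared value that another thread may update at any time.
std::atomic<std::size_t> limit{0};

// Counts the entries below the shared limit. The atomic is loaded once
// into a local snapshot, so the whole call works with one consistent value.
std::size_t count_below_limit(const std::vector<std::size_t>& values)
{
    const std::size_t snapshot = limit.load(std::memory_order_relaxed);
    std::size_t count = 0;
    for (std::size_t v : values) {
        if (v < snapshot) {
            ++count;
        }
    }
    return count;
}
```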

Peter Cordes
Finn
    It's not mandatory to *synchronize*, you're allowed to use `memory_order_relaxed` so the reader simply reads the value that's currently there. And that doesn't require extra asm instructions on any architecture; the extra instructions are only to make the thread wait before/after reading to give ordering wrt. other loads or stores in the same thread. All ISAs only run std::thread across cores that are cache-coherent so if you don't need any ordering, just atomicity, you just need normal load/store instructions (to a small-enough aligned object). – Peter Cordes Jan 18 '20 at 20:22
    What do you mean by "atomic memory access"? What's non atomic access at the asm level? – curiousguy Jan 18 '20 at 21:02
    @curiousguy: e.g. you could have an 8-byte struct that's only 4 byte aligned. It can be (and often would be) accessed in one instruction on x86-64, but if that spans a cache-line boundary it won't be atomic even on Intel CPUs. And on AMD it's only guaranteed atomic if it doesn't cross an 8-byte boundary. – Peter Cordes Jan 18 '20 at 23:11