
How to use std::atomic<>

In the question above, we can obviously just use std::mutex to ensure thread safety. What I want to know is when to use which one.

#include <atomic>

class A
{
    std::atomic<int> x;

public:
    A()
    {
        x=0;
    }

    void Add()
    {
        x++;
    }

    void Sub()
    {
        x--;
    }     
};

and

#include <mutex>

std::mutex mtx;
class A
{
    int x;

public:
    A()
    {
        x=0;
    }

    void Add()
    {
        std::lock_guard<std::mutex> guard(mtx);
        x++;
    }

    void Sub()
    {
        std::lock_guard<std::mutex> guard(mtx);
        x--;
    }     
};
Yves
  • `x` is an instance variable. You can get fine-grained locking by making the mutex a class-member instead of having one big lock for all threads modifying all instances of class A. (That of course increases the size of each A object.) – Peter Cordes Sep 21 '16 at 19:36
  • Don't forget that even a read-only accessor function also needs to take the lock, at least in theory to avoid C++ UB. (This is a huge advantage for std::atomic: read-only access is much cheaper). – Peter Cordes Sep 21 '16 at 19:36
  • @PeterCordes You could use both: a mutex for accessing all components of an object in a well-defined state, and atomic subparts for each property of the object whose value makes sense alone, so accessing a single component doesn't go through the mutex (but updates and accesses of all parts do). – curiousguy Nov 02 '19 at 01:34

1 Answer


As a rule of thumb, use std::atomic for POD types, where the underlying specialisation can use something clever like a bus lock on the CPU (which costs you no more than a pipeline dump), or even a spin lock. On some systems, an int might already be atomic, so std::atomic<int> will effectively specialise down to a plain int.

Use std::mutex for non-POD types, bearing in mind that acquiring a mutex is at least an order of magnitude slower than a bus lock.

If you're still unsure, measure the performance.

Bathsheba
  • `int` loads and `int` stores are usually atomic (e.g. they are on x86), but [`my_int++` is never atomic on multi-core systems](http://stackoverflow.com/questions/39393850/can-num-be-atomic-for-int-num/39396999#39396999). I'd agree with your overall point that std::atomic primitive types are probably useful, and anything else is likely to just do less efficient locking behind the scenes. – Peter Cordes Sep 21 '16 at 19:29
  • `std::atomic` *may* be useful for objects that fit in 16 bytes, but only if you know *exactly* what you're doing, and are targeting a platform that you know has something like x86-64 `lock cmpxchg16b`, and you build with `-mcx16` (since cmpxchg16b is an extension, unfortunately, not part of baseline x86-64 because it was missing from the first gen AMD64 CPUs.) See [my answer here](http://stackoverflow.com/questions/38984153/implement-aba-counter-with-c11-cas) about compare-and-swap on an object the size of two pointers. – Peter Cordes Sep 21 '16 at 19:31
  • Just to be clear, even if `int` is narrow enough that the compiler doesn't have to do any extra work to get atomicity for `atomic`, **you still need `atomic` for thread-safety**. My previous comment may have given a false impression. You can use `std::atomic` with `std::memory_order_relaxed` if you don't want any extra ordering, just forcing access to cache-coherent memory (rather than holding a variable's value in a register): see [MCU programming - C++ O2 optimization breaks while loop](https://electronics.stackexchange.com/a/387478) – Peter Cordes Jun 14 '22 at 08:59
  • A *bus* lock would be very expensive, blocking memory access from all cores even to unrelated cache lines. But you only get that from misaligned atomic RMWs on x86. Compilers don't do that, they use `alignas(sizeof(T))` for `atomic`, so CPUs can just use a cache lock. And so pure-load and pure-store can be atomic as well, not just atomic RMWs. A cache lock doesn't even block out-of-order exec of ALU instructions on that core, although on x86 it is a full barrier, blocking later loads until the store buffer is drained. – Peter Cordes Jun 14 '22 at 09:03