Do I use atomic<> over shared memory correctly here

Question

Suppose we have two processes.

Both use same mmap-ed memory region in order to pass some information, in this case uint16_t.

I read all kind of opinions why std::atomic<> should work, if it uses is_always_lock_free.

So I did this code. Code works as expected, but is this code thread / inter-process safe:

#include <cstdint>
#include <atomic>
#include <iostream>

#include <sys/mman.h>
#include <fcntl.h>

#include <unistd.h> // sleep
#include <time.h>

using I  = uint16_t;
using AI = std::atomic<I>;

static_assert(std::atomic<I>::is_always_lock_free);

constexpr size_t SIZE = sizeof(AI);
constexpr const char *NAME = "nmmm";

constexpr I STOP = 12345;

void error(const char *msg){
    std::cout << msg << '\n';
    exit(10);
}

int main(int argc, const char **argv){
    std::cout << sizeof(I) << ' ' << sizeof(AI) << '\n';

    int fd = shm_open(NAME,  O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        error("shm_open");

    int t = ftruncate(fd, SIZE);
    if (t < 0)
        error("ftruncate");

    void *vmem = mmap(nullptr, SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
    if (vmem == MAP_FAILED)
        error("mmap");

    std::cout << "All set up!" << ' ' << vmem << '\n';

    AI *ai = reinterpret_cast<AI *>(vmem);


    if (argc > 1){
        switch(argv[1][0]){
        case 'g':
        case 'G':
            ai->store(0, std::memory_order_relaxed);

            while(true){
                auto x = ai->load(std::memory_order_relaxed);

                std::cout << x << '\n';

                if (x == STOP)
                    break;

                sleep(1);
            }

        case 's':
        case 'S':
            ai->store(STOP, std::memory_order_relaxed);
            break;

        default:
            {
                srand(time(nullptr));

                I const x = rand() & 0xFFFF;

                std::cout << "set val to " << x << '\n';

                ai->store(x , std::memory_order_relaxed);
            }
            break;
        }
    }

    munmap(vmem, SIZE);
//  shm_unlink(NAME);
}

Excuse my mixing of C and C++ like using rand.

What you suppose to do it -

# console 1
./a.out g # start "Getter"

# console 2
./a.out x # set random value
# or
./a.out s # set "Stop" value, so process in console 1 stops.

What I am worrying:

no placement new, just reinterpret cast + initial value set.
not a d-tor either.
why sizeof(uint16_t) is same as sizeof(std::atomic<uint16_t>) - is it always like this if is_always_lock_free ? Is std::atomic just a fancy Assembly instruction?

Update

It appears, std::atomic<uint16_t> is POD. So casting is not UB (undefined behavour)

static_assert(std::is_pod_v<std::atomic<uint16_t> >);

Also it appears std::atomic<uint16_t> operations compiles to single assembly instructions:

https://gcc.godbolt.org/z/d6bK9jfza

`AI *ai = reinterpret_cast(vmem);` leads to undefined beahvior. There is no `AI` object at the location `vmem` so treating it as one is UB. You have to use placement new to create an `AI` in the location `vmem` points to. — NathanOliver, Aug 24 '22 at 15:52
For you last bullet, see: https://stackoverflow.com/questions/36624881/why-is-integer-assignment-on-a-naturally-aligned-variable-atomic-on-x86 — NathanOliver, Aug 24 '22 at 15:53
If it would be safe in a single process, it'll be safe in shared memory on real implementations, as long as `is_lock_free` is true. No init is needed in practice. Yes, `std::atomic` is just a wrapper for GNU C `__atomic_whatever(uint16_t *, etc.)`, or equivalent on other compilers. https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html. Look at the asm yourself ([How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116)) for simple functions like `return x.load(relaxed)` — Peter Cordes, Aug 24 '22 at 15:54
@NathanOliver - I know exactly what do you mean. However this seems to be correct way if you do in virtual memory, since uint16_t is "pod" (or supported type of the CPU) — Nick, Aug 24 '22 at 15:57
@PeterCordes - so code is correct? Can you do it as official answer, so others may comment under it? — Nick, Aug 24 '22 at 15:58
`std::atomic` isn't POD, so in theory there's UB. In practice on real implementations it's fine. Your argument would hold if you were using `atomic_ref` on a POD `uint16_t` I think. — Peter Cordes, Aug 24 '22 at 15:58
Thank you @PeterCordes . [This comment](https://stackoverflow.com/questions/73476133/do-i-use-atomic-over-shared-memory-correctly-here#comment129754647_73476133) just helped me out with something that I've been working on. — user4581301, Aug 24 '22 at 16:00
IDK if your algorithm is correct, and I didn't really look at your code. What I said is true in general: *if* it's safe between threads of a single process, it's also safe in shared memory. All mainstream C++ implementations for platforms that can do shared memory have this property, AFAIK. And `atomic` is almost always the same size as `T`, although the standard doesn't require or guarantee that. — Peter Cordes, Aug 24 '22 at 16:02
@user4581301: Which part of it was helpful? I think all this has been discussed before on various SO questions, so this is just a matter of finding duplicates. Or maybe parts of it is just common knowledge / obvious to those of us who understand assembly / CPUs / compilers and how `std::atomic<>` is normally implemented, but hasn't been written up in an SO answer. If you can help identify which part wasn't obvious, I could put something in an answer. — Peter Cordes, Aug 24 '22 at 16:04
@Nick You don't have a `uint16_t`, you have a `std::atomic`. That is not required to be POD and so to avoid UB you have to use placement new to create one. — NathanOliver, Aug 24 '22 at 16:04
@PeterCordes - this is what I said. uint16_t is pod. here is assembly btw, is single instruction. https://gcc.godbolt.org/z/hxW7zaj4r . it turns out std::atomic is also pod. I will update — Nick, Aug 24 '22 at 16:05
@Nick: `uint16_t` is POD, but you're `using AI = std::atomic;` so there is no bare `uint16_t` object in shared memory, except as a member of non-POD `atomic`. So it's technically UB to assume that whatever's in memory there is a valid bit-pattern for `atomic`. It's only in *practice* on real implementations that's it's fine, because the object representation is the same as a plain `uint16_t` on all mainstream C++ implementations, or at worst has some padding. — Peter Cordes, Aug 24 '22 at 16:10
But the C++ standard doesn't guarantee that directly, it just has initialization rules (e.g. that an all-zero-initialized `atomic` has to work, probably for compat with C11 _Atomic that doesn't do non-constant init). That would be rather hard to satisfy if `atomic` had any other members, especially a lock. — Peter Cordes, Aug 24 '22 at 16:11
@PeterCordes sadly this is correct. this is why this can not be use in production, at least not with confidence... — Nick, Aug 24 '22 at 16:17
@PeterCordes It's pretty much always just a matter of finding the duplicate, but now I have a better idea of what and the terminology to look for. The bit about atomic_builtins. I'm vendor locked to GCC anyway and if that changes, I get a compiler error. — user4581301, Aug 24 '22 at 16:19
@PeterCordes problems arise if programs use dufferent runtime implementations. In reality, some interprocess synchronization would be needed if that is possible and no syncronization stuctures in shared memory would be required... just POD data. — Swift - Friday Pie, Aug 24 '22 at 16:20
@Swift-FridayPie - yes, low tech solution, create a file, then rename it. I personally will do that, but is very low tech :) — Nick, Aug 24 '22 at 16:30
@Swift-FridayPie: Not sure what your point is. Why do you say "synchronization structures" wouldn't be required in shared memory? You just said some algorithms might require synchronization. In which case you'd use a memory-order stronger than `relaxed` on a `std::atomic<>` object in shared memory, to create synchronization (a happens-before relationship) between release and acquire operations in separate programs. Making it safe to read non-atomic POD data written by the other program, not just single atomic objects. This works for lock-free atomics on impls where they're address free. — Peter Cordes, Aug 24 '22 at 16:43
Why use atomic here? Wasn't `volatile` designed for such case? — ALX23z, Aug 24 '22 at 16:46
@Swift-FridayPie: I don't think C++ guarantees that `std::shared_mutex` works in shared memory between processes, but the same standard that provides `shm_open` and `mmap` (POSIX) also provides pthreads which may have some guarantees. Or on some C++ implementations, you might get lucky, e.g. [g++ `std::shared_mutex` happens to work on Linux between processes](https://stackoverflow.com/questions/46662935/stdmutex-in-shared-memory-not-working) — Peter Cordes, Aug 24 '22 at 16:47
@ALX23z: Lol, no. It used to be the only option for rolling your own atomics before they became part of the language, which is why it mostly works in practice as similar to `mo_relaxed` atomics, at least on some compilers, but the standards guarantee nothing. Concurrent access to `volatile` has data-race UB, just like non-volatile. [There's no good reason to use `volatile` over `std::atomic`, even when it does work.](https://stackoverflow.com/questions/4557979/when-to-use-volatile-with-multi-threading/58535118#58535118) — Peter Cordes, Aug 24 '22 at 16:50
@PeterCordes on Windows mutex by definition shared, but other platforms which do not implement that part of POSIX (also it's usually root-protected part).. with shared memory there is little guarantee — Swift - Friday Pie, Aug 24 '22 at 17:59
@Swift-FridayPie: Little guarantee of what? Of ISO C++ `std::shared_mutex` working? Yeah, that's why I said it might happen to work. On a POSIX system you'd use `PTHREAD_PROCESS_SHARED` with `pthread_mutex` for guaranteed behaviour, as in [c++11 interprocess atomics and mutexes](https://stackoverflow.com/a/19908903) — Peter Cordes, Aug 24 '22 at 18:03
@user4581301 check my answer. Probably will be interesting for you. — Nick, Aug 25 '22 at 09:29

score 1 · Answer 1 · answered Aug 24 '22 at 17:13

What I am worrying:

no placement new, just reinterpret cast + initial value set.

not a d-tor either.

why sizeof(uint16_t) is same as sizeof(std::atomic<uint16_t>) - is it always like this if is_always_lock_free ? Is std::atomic just a fancy Assembly instruction?

These assumtions are likely to be true, but formally these are leading to UB.

To avoid running into UB, use C++20 atomic_ref. Share plain uint16_t and use atomic_ref to it in processes.

With atomic_ref it is still not formally guaranteed to work in an interprocess way (no interprocess in the standard C++ at all), but you will not be running into the mentioned UB, so no concerns about lifetime or incorrect aliasing.

Note that you have to keep is_always_lock_free assertion. Without it, atomic_ref would use a locking mechanism to provide atomic semantic, this mechanism is likely to be private to a program / process.

You can instead placement-`new` a `std::atomic` object, if it's more convenient to have every access to it be atomic automatically, with no way to do non-atomic access to it. (Other than via `memcpy`). You'd have to construct it in exactly one process, not both, if you do it this way; constructing another `std::atomic` object over top of a `std::atomic` object another process was already using in shared memory seems even worse, although in practice it would just be a non-atomic store of a `uint16_t` on normal systems. — Peter Cordes, Aug 24 '22 at 18:17

score 1 · Answer 2 · answered Aug 25 '22 at 09:28

Because C++ atomic are not very "clear" to use, you can use stdatomic.h.

https://en.cppreference.com/w/c/atomic/atomic_load

However I failed to compile example with these on gcc.

This is why I decided to use gcc builtin atomics:

https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

These works on both gcc and clang on Linux. Did not checked on MacOS yes.

Here is the code, almost similar to what I posted before. Once again excuse C style code.

In order code to be similar to what stdatomic.h offers, I did two functions that call the builtin(s):

template<typename T>
void atomic_store(T *a, T val){
    return __atomic_store_n(a, val, __ATOMIC_RELAXED);
}

template<typename T>
void atomic_store(T &a, T val){
    return atomic_store(& a, val);
}

template<typename T>
T atomic_load(T *a){
    return __atomic_load_n(a, __ATOMIC_RELAXED);
}

template<typename T>
T atomic_load(T &a){
    return atomic_load(& a);
}

Those suppose to be good with all integral types such int, short, char, unsigned etc.

Here is the full code in case you decide to "dig" future.

#include <cstdio>

#include <sys/mman.h>
#include <fcntl.h>

#include <unistd.h> // sleep
#include <cstdlib>  // exit
#include <time.h>

using MyAtomicInt = int;

constexpr size_t SIZE = sizeof(MyAtomicInt);
constexpr const char *NAME = "nmmm";

constexpr MyAtomicInt STOP = 12345;

void error(const char *msg){
    printf("%s\n", msg);
    exit(10);
}



template<typename T>
void atomic_store(T *a, T val){
    return __atomic_store_n(a, val, __ATOMIC_RELAXED);
}

template<typename T>
void atomic_store(T &a, T val){
    return atomic_store(& a, val);
}

template<typename T>
T atomic_load(T *a){
    return __atomic_load_n(a, __ATOMIC_RELAXED);
}

template<typename T>
T atomic_load(T &a){
    return atomic_load(& a);
}



int main(int argc, const char **argv){
    int fd = shm_open(NAME,  O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        error("shm_open");

    int t = ftruncate(fd, SIZE);
    if (t < 0)
        error("ftruncate");

    void *vmem = mmap(nullptr, SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
    if (vmem == MAP_FAILED)
        error("mmap");

    printf("All set up! %p\n", vmem);

    MyAtomicInt *ai = reinterpret_cast<MyAtomicInt *>(vmem);


    if (argc > 1){
        switch(argv[1][0]){
        case 'g':
        case 'G':
            atomic_store(ai, 0);

            while(true){
                auto x = atomic_load(ai);

                printf("%d\n", x);

                if (x == STOP)
                    break;

                sleep(1);
            }

        case 's':
        case 'S':
            atomic_store(ai, STOP);
            break;

        default:
            {
                srand(time(nullptr));

                MyAtomicInt const x = rand() & 0xFFFF;

                printf("set val to %d\n", x);

                atomic_store(ai, x);
            }
            break;
        }
    }

    munmap(vmem, SIZE);
//  shm_unlink(NAME);
}

Do I use atomic<> over shared memory correctly here

2 Answers2