
I want to add network control of a handful of parameters used by a service (daemon) running on an embedded Linux system. There's no need for procedure calls; each parameter can be polled in a very natural way. Shared memory seems like a nice way to keep networking code out of the daemon and to limit shared access to a carefully controlled set of variables.

Since I don't want partial writes to cause visibility of values never written, I was thinking of using std::atomic<bool> and std::atomic<int>. However, I'm worried that std::atomic<T> might be implemented in a way that only works with C++11 threads and not with multiple processes (potentially, not even with OS threads). Specifically, if the implementation uses any data structures stored outside the shared memory block, in a multi-process scenario this would fail.
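For concreteness, here is roughly the shape of what I have in mind. All the names (`ParamBlock`, the shared-memory name, the field names) are placeholders, and this is just a sketch of the layout, not something I'm claiming is guaranteed to work:

```cpp
#include <atomic>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// The carefully controlled set of shared variables.
struct ParamBlock {
    std::atomic<bool> enabled;
    std::atomic<int>  poll_interval_ms;
};

// Map (creating if necessary) the shared parameter block.
// Returns nullptr on failure.
ParamBlock* map_param_block(const char* name) {
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0) return nullptr;
    if (ftruncate(fd, sizeof(ParamBlock)) != 0) { close(fd); return nullptr; }
    void* p = mmap(nullptr, sizeof(ParamBlock),
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? nullptr : static_cast<ParamBlock*>(p);
}
```

The daemon would map the block once at startup and poll the fields with `load()`; the network-facing process maps the same name and calls `store()`. The question is whether those loads and stores are actually guaranteed to synchronize across the two processes.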

I do see some requirements that suggest to me that std::atomic won't hold an embedded lock object or a pointer to additional data:

The atomic integral specializations and the specialization atomic<bool> shall have standard layout. They shall each have a trivial default constructor and a trivial destructor. They shall each support aggregate initialization syntax.

There shall be pointer partial specializations of the atomic class template. These specializations shall have standard layout, trivial default constructors, and trivial destructors. They shall each support aggregate initialization syntax.

Trivial default construction and destruction seems to me to exclude associated per-object data, whether stored inside the object, via a pointer member variable, or via an external mapping.

However, I see nothing that excludes an implementation from using a single global mutex / critical section (or even a global collection, as long as the collection elements aren't associated with individual atomic objects -- something along the lines of a cache association scheme could be used to reduce false conflicts). Obviously, access from multiple processes would fail on an implementation using a global mutex, because the users would have independent mutexes and not actually synchronize with each other.

Is an implementation of atomic<T> allowed to do things that are incompatible with inter-process shared memory, or are there other rules that make it safe?


I just noticed that trivial default construction leaves the object in a not-ready state, and a call to atomic_init is required. And the Standard mentions initialization of locks. If these are stored inside the object (and dynamic memory allocation seems impossible, since the destructor remains trivial) then they would be shared between processes. But I'm still concerned about the possibility of a global mutex.

In any case, guaranteeing a single call to atomic_init for each variable in a shared region seems difficult... so I suppose I'll have to steer away from the C++11 atomic types.
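The best scheme I can think of for the single-initialization problem is to let exactly one process win an `O_EXCL` race and run `atomic_init` before anyone else touches the variables. A sketch (names invented; a real version would also need a readiness handshake, since another process could attach between the creator's `shm_open` and its `atomic_init`):

```cpp
#include <atomic>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

struct Params { std::atomic<int> value; };

// Attach to the shared block, initializing it if we are the creator.
Params* attach(const char* name) {
    // O_EXCL means exactly one caller wins the creation race.
    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    bool creator = (fd >= 0);
    if (!creator) fd = shm_open(name, O_RDWR, 0600);
    if (fd < 0) return nullptr;
    if (creator && ftruncate(fd, sizeof(Params)) != 0) { close(fd); return nullptr; }
    void* p = mmap(nullptr, sizeof(Params),
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED) return nullptr;
    Params* params = static_cast<Params*>(p);
    if (creator)
        std::atomic_init(&params->value, 0);  // runs exactly once, in the creator
    return params;
}
```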

Ben Voigt
  • As an addendum, [people have been recommending use of atomic operations with shared memory](http://stackoverflow.com/questions/4668592/ipc-via-mmaped-file-should-atomics-and-or-volatile-be-used), although it isn't clear if they meant to include or exclude `std::atomic` or whether other APIs are guaranteed to work. – Ben Voigt Aug 19 '13 at 19:50
  • I would expect that a reasonable system would not make use of external data structures for `atomic` variables; it would defeat the point of atomics in the first place... – user541686 Aug 19 '13 at 20:28
  • @Mehrdad: I don't see how taking a global lock would defeat the purpose any more than taking a local lock, and the Standard specifically talks about implementations which do the latter. – Ben Voigt Aug 19 '13 at 20:45
  • I meant performance-wise. The whole point of an atomic is to be fast, right? Otherwise you might as well have used a lock... – user541686 Aug 19 '13 at 21:09
  • @Mehrdad Speed has very little to do with why one would use atomic. The point of atomic is consistency. – Andre Kostur Aug 19 '13 at 21:36
  • @AndreKostur: Wouldn't locks achieve the same purpose? Why would you use an atomic instead of a lock then? – user541686 Aug 19 '13 at 21:36
  • @Mehrdad Assuming you're using some sort of interprocess locking mechanism, yes. But, I would suspect that part of the reason the OP wished to use `std::atomic` is that it provides a nice interface where you don't need to remember to acquire and release locks. It will do whatever is necessary to make the variable access atomic, within that well-formed C++ program. But since the standard doesn't talk about inter-process issues, the synchronization mechanisms used by `std::atomic` may not work across processes. – Andre Kostur Aug 19 '13 at 21:40
  • @Mehrdad: Anyone writing code to work on multiple platforms is very glad there's a standard library type that maps to fast lockless compare-and-swap instructions on platforms that have them, and to locks otherwise. Besides, writing code to explicitly use locks would make the code longer and hurt readability. – Ben Voigt Aug 19 '13 at 21:41

2 Answers


I'm two months late, but I'm having the exact same problem right now and I think I've found some sort of an answer. The short version is that it should work, but I'm not sure if I'd depend on it.

Here's what I found:

  • The C++11 standard defines a new memory model, but it has no notion of OS-level "process", so anything multiprocessing-related is non-standard.

  • However, section 29.4 "Lock-free property" of the standard (or at least the draft I have, N3337) ends with this note:

    [ Note: Operations that are lock-free should also be address-free. That is, atomic operations on the same memory location via two different addresses will communicate atomically. The implementation should not depend on any per-process state. This restriction enables communication by memory that is mapped into a process more than once and by memory that is shared between two processes. — end note ]

    This sounds very promising. :)

  • That note appears to come from N2427, which is even more explicit:

    To facilitate inter-process communication via shared memory, it is our intent that lock-free operations also be address-free. That is, atomic operations on the same memory location via two different addresses will communicate atomically. The implementation shall not depend on any per-process state. While such a definition is beyond the scope of the standard, a clear statement of our intent will enable a portable expression of class of a programs already extant.

    So it appears that yes, all lock-free operations are supposed to work in this exact scenario.

  • Now, operations on std::atomic<type> are atomic, but they may or may not be lock-free for a particular type, depending on the capabilities of the platform. We can check any variable x by calling x.is_lock_free().

  • So why did I write that I would not depend on this? I can't find any kind of documentation for gcc, llvm or anyone else that's explicit about this.
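As a sanity check, you can query both the per-object predicate and the compile-time `ATOMIC_XXX_LOCK_FREE` macros (0 = never lock-free, 1 = sometimes, 2 = always on that implementation). The function name here is just for the demo:

```cpp
#include <atomic>
#include <cstdio>

// Report whether the specializations we want to put in shared
// memory are lock-free on this implementation.
int lock_free_report() {
    std::atomic<bool> b{false};
    std::atomic<int>  i{0};
    std::printf("atomic<bool> lock-free: %d\n", static_cast<int>(b.is_lock_free()));
    std::printf("atomic<int>  lock-free: %d\n", static_cast<int>(i.is_lock_free()));
    // The macro answers at compile time, for all objects of the type.
    return ATOMIC_INT_LOCK_FREE;
}
```

If the macro is 2, the address-free note above should apply to every `atomic<int>`; if it is 1, only a runtime `is_lock_free()` check on the actual object tells you.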

phaker
  • On any "normal" architecture, like x86, ARM, PowerPC, MIPS, etc., gcc and llvm (and other sane compilers) implement lock-free atomics using asm instructions that are address-free, and only care about the physical address. i.e. they Just Work even when two separate processes have the same physical page mapped to different virtual addresses with `mmap`. – Peter Cordes Feb 06 '19 at 06:01
  • Atomic read-modify-write operations stop other observers from reading or writing the same cache line in the middle of a RMW by making sure this core is the only core with a valid copy of the line for the duration of the RMW operation. All modern systems use some variant of MESI for cache coherency, where Modified state means no other core can have it in any state except Invalid. On x86, the core takes a "cache lock" (not responding to requests to share the cache line until the RMW is done). On most others with LL/SC, the SC store-conditional aborts the "transaction" if the line didn't stay in M. – Peter Cordes Feb 06 '19 at 06:04
  • See also [Can num++ be atomic for 'int num'?](//stackoverflow.com/q/39393850) for more details about how this works. But the key is that multi-core systems already maintain coherent caches, so atomic RMW builds on top of that. – Peter Cordes Feb 06 '19 at 06:06

Until C++11, the standard did not specify how multiple threads share memory, so we wrote programs with multiple threads that relied on implementation-specific behavior. The standard still doesn't specify how processes with shared memory - or if you prefer, threads that only partially share memory - interact. Whatever you end up doing, you will be relying on implementation-specific guarantees.

That said, I think an implementation that supports process-shared memory will try to make its thread synchronization mechanisms like atomics usable in process-shared memory for process synchronization. At the very least, I think it would be hard to devise a lock-free implementation of a std::atomic specialization that does not work correctly cross-process.
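As an illustration of that last point (a smoke test, not a guarantee), on a typical Linux implementation you can hammer a lock-free atomic in a `MAP_SHARED` mapping from two processes and check that no increments are lost. Everything here is invented for the demo:

```cpp
#include <atomic>
#include <new>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Increment a shared atomic counter from a parent and a forked child;
// returns the final count (2000 if the RMWs were atomic across processes).
int shared_counter_demo() {
    void* mem = mmap(nullptr, sizeof(std::atomic<int>),
                     PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    auto* counter = new (mem) std::atomic<int>(0);  // placement-new into the shared page

    pid_t pid = fork();
    if (pid == 0) {                                 // child increments...
        for (int i = 0; i < 1000; ++i)
            counter->fetch_add(1, std::memory_order_relaxed);
        _exit(0);
    }
    for (int i = 0; i < 1000; ++i)                  // ...and so does the parent
        counter->fetch_add(1, std::memory_order_relaxed);
    waitpid(pid, nullptr, 0);
    int total = counter->load();
    munmap(mem, sizeof(std::atomic<int>));
    return total;
}
```

Passing such a test shows the implementation behaves as hoped on that platform; it is evidence, not a specification.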

Casey
  • I agree, but the Standard explicitly does not require `std::atomic` to be lock-free. – Ben Voigt Aug 19 '13 at 20:24
  • @BenVoigt True - but I would consider it poor QoI to the point of being a bug if a C++ implementation didn't support lock-free 64-bit atomics on X64, for example. There's a lot of gray area in the standard specification for undefined behavior, but in many cases the behavior is realistically constrained by our expectations of what is allowable for an implementation of reasonable quality. If I dereference a NULL pointer, demons won't fly out of my nose - I'll get a SIGSEGV because my implementation isn't a piece of junk. – Casey Aug 19 '13 at 20:34
  • Not every implementation with shared memory will support `std::atomic` in shared memory - but you are not likely to use one that doesn't. You've already made the decision to rely on non-standard behavior by using process-shared memory in the first place, further requiring `std::atomic` to work in that shared memory won't realistically restrict your potential portability by much if at all. – Casey Aug 19 '13 at 20:35
  • Good point... are you aware of any implementations that document such additional guarantees (non-portably of course)? I'm especially interested in G++ 4.7.x in a Linux/ARM environment, but any example of what such a guarantee looks like would be awesome. – Ben Voigt Aug 19 '13 at 20:44
  • @BenVoigt Sadly, no I am not. There is a bit of discoverability in the form of the [`ATOMIC_XXX_LOCK_FREE` macros](http://en.cppreference.com/w/cpp/atomic/atomic_is_lock_free) in that a type `XXX` is guaranteed to always be lockfree if `ATOMIC_XXX_LOCK_FREE` is defined to 2. Since I can't imagine a lockfree implementation that doesn't use an atomic CPU instruction to directly update memory, I would call that "good enough" for a basic portability requirement. Beyond that, you're probably going to have to examine implementations directly. – Casey Aug 19 '13 at 20:51
  • @Casey dereference a NULL pointer can still cause demon to fly form your nose: http://pdos.csail.mit.edu/~xi/papers/stack-sosp13.pdf case of linux kernel. – Yankes Aug 19 '13 at 20:54
  • @Yankes I find no occurrences of the words "demon" or "nasal" in the cited document; your argument is invalid. Kidding aside, very interesting paper. – Casey Aug 19 '13 at 21:24
  • I too enjoyed the MIT research paper greatly. – Ben Voigt Aug 19 '13 at 21:42
  • That paper is outstanding. – WhozCraig Aug 20 '13 at 01:43
  • @Yankes Was that supposed to be "Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior"? URL are not stable in the long term :( ... https://pdos.csail.mit.edu/papers/stack:sosp13.pdf – curiousguy May 08 '19 at 04:08
  • @curiousguy Yes, this was this paper – Yankes May 11 '19 at 14:32