
Assume that I have two processes that both share a memory block using shm_open and mmap and there exists a shared synchronization primitive - let's say a semaphore - that ensures exclusive access to the memory. I.e. no race conditions.

My understanding is that the pointer returned from mmap must still be marked as volatile to prevent cached reads.

Now, how does one write e.g. a std::uint64_t into any aligned position in the memory?

Naturally, I would simply use std::memcpy but it does not work with pointers to volatile memory.

First attempt

// Pointer to the shared memory, assume it is aligned correctly.
volatile unsigned char* ptr;

// Value to store, initialize "randomly" to prevent compiler
// optimization, for testing purposes.
std::uint64_t value = *reinterpret_cast<volatile std::uint64_t*>(nullptr);

// Store byte-by-byte
unsigned char* src = reinterpret_cast<unsigned char*>(&value);
for (std::size_t i = 0; i < sizeof(value); ++i)
    ptr[i] = src[i];

Godbolt.

I strongly believe this solution is correct, but even with -O3 the compiler emits 8 one-byte transfers. That is really not optimal.

Second Attempt

Since I know no one is going to change the memory while I have it locked, maybe the volatile is unnecessary after all?

// Pointer to the shared memory, assume it is aligned correctly.
volatile unsigned char* ptr;

// Value to store, initialize "randomly" to prevent compiler
// optimization for testing purposes.
std::uint64_t value = *reinterpret_cast<volatile std::uint64_t*>(0xAA);
unsigned char* src = reinterpret_cast<unsigned char*>(&value);

//Obscure enough?
auto* real_ptr = reinterpret_cast<unsigned char*>(reinterpret_cast<std::uintptr_t>(ptr));

std::memcpy(real_ptr,src,sizeof(value));

Godbolt.

But this does not seem to work: the compiler sees through the cast and does nothing. Clang generates a ud2 instruction; not sure why, is there UB in my code? Apart from the value initialization.

Third attempt

This one comes from this answer. But I think it breaks the strict aliasing rule, does it not?

// Pointer to the shared memory, assume it is aligned correctly.
volatile unsigned char* ptr;

// Value to store, initialize "randomly" to prevent compiler
// optimization for testing purposes.
std::uint64_t value = *reinterpret_cast<volatile std::uint64_t*>(0xAA);
unsigned char* src = reinterpret_cast<unsigned char*>(&value);

volatile std::uint64_t* dest = reinterpret_cast<volatile std::uint64_t*>(ptr);
*dest=value;

Godbolt.

GCC actually does what I want - a single instruction to copy the 64-bit value. But that is useless if it is UB.

One way I could go about fixing it is to actually create a std::uint64_t object at that location. But apparently placement new does not work with volatile pointers either.

Questions

  • So, is there a better (safe) way than the byte-by-byte copy?
  • I would also like to copy even larger blocks of raw bytes. Can this be done better than byte by byte?
  • Is there any way to force memcpy to do the right thing?
  • Am I needlessly worrying about performance, and should I just go with the loop?
  • The examples I have seen (mostly C) do not use volatile at all; should I do that too? Is the mmaped pointer treated differently already? How?

Thanks for any suggestions.

EDIT:

Both processes run on the same system. Also, please assume the values can be copied byte by byte - I am not talking about complex virtual classes storing pointers elsewhere. All integers and no floats would be just fine.

Quimby
  • Are the multiple bytes numbers? Remember, ordering is important for multi-byte numbers. Search the internet for "endianness". – Thomas Matthews Nov 19 '20 at 17:55
  • @ThomasMatthews It is between two processes on the same system, I kinda hope the endianness is the same. – Quimby Nov 19 '20 at 17:56
  • 1
    BTW, there is no guarantee that `memcpy` will perform byte-by-byte transfers. It could be optimized to use registers for copying (4 bytes at a time) or using block transfer instructions (if supported by the processor). To guarantee transfer unit sizes, you'll have to write your own code. (Been there, done that, with embedded systems) – Thomas Matthews Nov 19 '20 at 17:59
  • @ThomasMatthews Thank you. That is exactly the reason why I would prefer a solution in which I can use it because it will do the optimal thing. But I guess that is also a reason why it does not support `volatile`. – Quimby Nov 19 '20 at 18:02
  • Why do you need volatile? Use it when it really maps to device registers. Also why do you need the form of char pointer? – Louis Go Nov 19 '20 at 18:07
  • @LouisGo I use char pointer since it can alias other types. I think I need volatile since the mapped memory can change without C++ compiler seeing any writes, thus perhaps optimizing out reads incorrectly. Am I incorrect? – Quimby Nov 19 '20 at 18:09
  • This [post](https://stackoverflow.com/q/48803929/4123703) might help. Semaphore is required for locking. – Louis Go Nov 19 '20 at 18:14
  • 2
    @Quimby You are incorrect. If you were correct, you would still be screwed because the CPU and other platform hardware can also optimize out reads, so stopping the compiler from doing so would be insufficient. The `volatile` keyword has no defined cross-platform semantics for threads. – David Schwartz Nov 19 '20 at 18:15
  • Oh you may update your post with why you need volatile and `char*`. It would help others to know your intent ( and correct it ). – Louis Go Nov 19 '20 at 18:15
  • 1
    https://stackoverflow.com/questions/2484980/why-is-volatile-not-considered-useful-in-multithreaded-c-or-c-programming – n. m. could be an AI Nov 19 '20 at 18:36

2 Answers


My understanding is that the pointer returned from mmap must still be marked as volatile to prevent cached reads.

Your understanding is wrong. Don't use volatile for controlling memory visibility - that isn't what it is for. It will either be unnecessarily expensive, or insufficiently strict, or both.

Consider, for example, the GCC documentation on volatile, which says:

Accesses to non-volatile objects are not ordered with respect to volatile accesses. You cannot use a volatile object as a memory barrier to order a sequence of writes to non-volatile memory.

If you just want to avoid tearing, caching, and reordering, use <atomic> instead. For example, if you have an existing shared uint64_t (and it is correctly aligned), just access it via a std::atomic_ref<uint64_t> (C++20). You can use acquire, release, or CAS directly with this.

If you need normal synchronization, then your existing semaphore will be fine. As below, it already supplies whatever fences are necessary, and prevents reordering across the wait/post calls. It doesn't prevent reordering or other optimizations between them, but that's generally fine.


As for

Any examples(mostly C) do not use volatile at all, should I do that too? Is mmaped pointer treated differently already? How?

the answer is that whatever synchronization is used is required to also apply appropriate fences.

POSIX lists these functions as "synchronizing memory", which means they must both emit any required memory fences, and prevent inappropriate compiler reordering. So, for example, your implementation must avoid moving memory accesses across pthread_mutex_*lock() or sem_wait()/sem_post() calls in order to be POSIX-compliant, even where it would otherwise be legal C or C++.

When you use C++'s built-in thread or atomic support, the correct semantics are part of the language standard itself rather than a platform extension (though shared memory itself is outside the standard's scope).

Useless
  • Thank you for answering. I use `volatile` to avoid optimized-out reads/writes, is that incorrect? Is the C++ compiler aware that mapped memory can change without it seeing any writes? That would be ideal I guess. I thought about using `<atomic>` but I did not find anything w.r.t. inter-process communication instead of inter-thread. – Quimby Nov 19 '20 at 18:08
  • Inter-thread communication _is_ effectively inter-process communication, unless you're using single-threaded co-operative multi-tasking. – Useless Nov 19 '20 at 18:10
  • As for the edit, I use POSIX semaphores using `sem_open`, I did not find anything in the documentation about memory barriers in them – Quimby Nov 19 '20 at 18:13
  • In general - using `volatile` for _anything_ apart from memory-mapped I/O and communicating with signal handlers, is very likely to be wrong. All [these](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_12) functions are defined as "synchronizing memory". That means both memory fences at runtime and compiler fences to prevent reordering at compile time. – Useless Nov 19 '20 at 18:17
  • @LouisGo Thank you very much. @ Useless Thank you, that clears it up a lot. – Quimby Nov 19 '20 at 18:19
  • My answer still feels a bit woolly because the question covers so much. `memcpy` is fine. Direct pointer deref is also fine if properly aligned. Just don't mix accesses to the same memory via different (non-char) pointers. – Useless Nov 19 '20 at 18:35
  • I know, sorry about that. Both answers are really good, I will accept yours since it offers the most "options". – Quimby Nov 20 '20 at 07:16

Assume that I have two processes that both share a memory block using shm_open and mmap and there exists a shared synchronization primitive - let's say a semaphore - that ensures exclusive access to the memory. I.e. no race conditions.

You need more than just exclusive access to memory. You need to synchronize memory. Every semaphore I've ever seen already does that. If yours doesn't, it's the wrong synchronization primitive. Switch to a different one.

My understanding is that the pointer returned from mmap must still be marked as volatile to prevent cached reads.

Well, volatile doesn't prevent cached reads, but almost all semaphores, mutexes, and other synchronization primitives act as if they prevented cached reads and writes across them. Otherwise, they would be nearly impossible to use.

What semaphore are you using? If it doesn't synchronize memory, it's the wrong tool for the job.

David Schwartz
  • I use `sem_open` - a POSIX semaphore, that is good, right? So, if every access is protected by the semaphore, I can disregard the volatile keyword and use `memcpy`? – Quimby Nov 19 '20 at 18:15
  • That's correct. Search [this page](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11) for `sem_wait` and `sem_post`. You'll see "The following functions synchronize memory with respect to other threads:" and then a list that includes those two functions. – David Schwartz Nov 19 '20 at 19:09