Assume that I have two processes that share a memory block created with `shm_open` and `mmap`, and that a shared synchronization primitive (let's say a semaphore) ensures exclusive access to the memory, i.e. there are no race conditions.
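For concreteness, here is a minimal sketch of that setup, assuming a POSIX system. The region is created with `shm_open`, sized with `ftruncate` and mapped with `mmap`. Its first bytes hold an unnamed semaphore initialised with `pshared = 1`, so both processes use the same lock. The `SharedBlock` layout, the 4096-byte payload and the `map_shared_block` helper are hypothetical names chosen for illustration. On glibc older than 2.34 you may need to link with `-pthread` and `-lrt`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>      // O_CREAT, O_RDWR
#include <semaphore.h>  // sem_t, sem_init, sem_wait, sem_post
#include <sys/mman.h>   // shm_open, shm_unlink, mmap, munmap
#include <unistd.h>     // ftruncate, close

// Hypothetical layout of the shared region: the lock lives inside it,
// so every process that maps the region sees the same semaphore.
struct SharedBlock {
    sem_t lock;                          // process-shared semaphore
    alignas(8) unsigned char data[4096]; // payload, 8-byte aligned
};

// Opens (and, if `create` is true, creates and initialises) the block.
// Returns nullptr on any failure.
SharedBlock* map_shared_block(const char* name, bool create) {
    int fd = shm_open(name, O_RDWR | (create ? O_CREAT : 0), 0600);
    if (fd == -1) return nullptr;
    if (create && ftruncate(fd, sizeof(SharedBlock)) == -1) {
        close(fd);
        return nullptr;
    }
    void* p = mmap(nullptr, sizeof(SharedBlock), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd); // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return nullptr;

    auto* block = static_cast<SharedBlock*>(p);
    // Only the creating process initialises the semaphore (initial value 1).
    if (create && sem_init(&block->lock, /*pshared=*/1, /*value=*/1) == -1) {
        munmap(p, sizeof(SharedBlock));
        return nullptr;
    }
    assert(reinterpret_cast<std::uintptr_t>(block->data) % 8 == 0);
    return block;
}
```

The second process would call `map_shared_block` with the same name and `create = false`, then bracket every access with `sem_wait(&block->lock)` / `sem_post(&block->lock)`.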
My understanding is that the pointer returned from `mmap` must still be marked as `volatile` to prevent cached reads.
Now, how does one write, e.g., a `std::uint64_t` to an arbitrary aligned position in that memory? Naturally, I would simply use `std::memcpy`, but it does not accept pointers to volatile memory.
First attempt
```cpp
// Pointer to the shared memory, assume it is aligned correctly.
volatile unsigned char* ptr;
// Value to store, initialize "randomly" to prevent compiler
// optimization, for testing purposes.
std::uint64_t value = *reinterpret_cast<volatile std::uint64_t*>(nullptr);
// Store byte-by-byte
unsigned char* src = reinterpret_cast<unsigned char*>(&value);
for (std::size_t i = 0; i < sizeof(value); ++i)
    ptr[i] = src[i];
```
I strongly believe this solution is correct, but even with `-O3` it compiles to eight 1-byte stores. That is really not optimal.
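The loop can at least be packaged as a reusable pair of helpers. This is only a sketch, and `volatile_store` / `volatile_load` are hypothetical names. It is restricted to trivially copyable types, which matches the integer-only assumption in the EDIT. Because each `volatile` access is a separate observable side effect, compilers generally do not merge these byte stores into a wider one, so this removes the repetition but not the eight 1-byte stores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Copies the object representation of `value` into volatile memory,
// one byte at a time. Each store is its own volatile access.
template <typename T>
void volatile_store(volatile unsigned char* dst, const T& value) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "only byte-copyable types are supported");
    assert(dst != nullptr);
    const auto* src = reinterpret_cast<const unsigned char*>(&value);
    for (std::size_t i = 0; i < sizeof(T); ++i)
        dst[i] = src[i];
}

// Reads a T back from volatile memory, again byte by byte.
// Assumes T is default-constructible (true for the integers in question).
template <typename T>
T volatile_load(const volatile unsigned char* src) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "only byte-copyable types are supported");
    T result{};
    auto* dst = reinterpret_cast<unsigned char*>(&result);
    for (std::size_t i = 0; i < sizeof(T); ++i)
        dst[i] = src[i];
    return result;
}
```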
Second attempt
Since I know no one is going to change the memory while I hold the lock, maybe `volatile` is unnecessary after all?
```cpp
// Pointer to the shared memory, assume it is aligned correctly.
volatile unsigned char* ptr;
// Value to store, initialize "randomly" to prevent compiler
// optimization for testing purposes.
std::uint64_t value = *reinterpret_cast<volatile std::uint64_t*>(0xAA);
unsigned char* src = reinterpret_cast<unsigned char*>(&value);
// Obscure enough?
auto* real_ptr = reinterpret_cast<unsigned char*>(reinterpret_cast<std::uintptr_t>(ptr));
std::memcpy(real_ptr, src, sizeof(value));
```
But this does not seem to work: the compiler sees through the cast and does nothing. Clang even generates a `ud2` instruction, and I am not sure why. Is there UB in my code, apart from the `value` initialization?
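If `volatile` is dropped anyway, the `std::uintptr_t` round trip is not needed: `const_cast` removes the qualifier directly. As far as I understand [dcl.type.cv], accessing memory through a non-volatile glvalue is only undefined when the object was *defined* with a volatile-qualified type, which a fresh `mmap` mapping is not. The sketch below relies on that reading. `store_u64` and `load_u64` are hypothetical helper names, and whether the semaphore alone is enough to keep the data from being cached is the last question below.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Writes `value` to an aligned position in the mapped region.
// const_cast strips the volatile qualifier so std::memcpy can be used;
// this assumes the mapped bytes were never defined as a volatile object.
void store_u64(volatile unsigned char* ptr, std::uint64_t value) {
    assert(reinterpret_cast<std::uintptr_t>(ptr) % alignof(std::uint64_t) == 0);
    auto* dst = const_cast<unsigned char*>(ptr);
    std::memcpy(dst, &value, sizeof value);
}

// Reads an 8-byte value back the same way.
std::uint64_t load_u64(const volatile unsigned char* ptr) {
    std::uint64_t value;
    std::memcpy(&value, const_cast<const unsigned char*>(ptr), sizeof value);
    return value;
}
```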
Third attempt
This one comes from this answer, but I think it breaks the strict aliasing rule, doesn't it?
```cpp
// Pointer to the shared memory, assume it is aligned correctly.
volatile unsigned char* ptr;
// Value to store, initialize "randomly" to prevent compiler
// optimization for testing purposes.
std::uint64_t value = *reinterpret_cast<volatile std::uint64_t*>(0xAA);
unsigned char* src = reinterpret_cast<unsigned char*>(&value);
volatile std::uint64_t* dest = reinterpret_cast<volatile std::uint64_t*>(ptr);
*dest = value;
```
GCC actually does what I want: a single instruction that copies the 64-bit value. But that is useless if it is UB.
One way to fix it would be to actually create a `std::uint64_t` object at that location, but apparently placement new does not work with `volatile` pointers either.
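Placement new rejects the pointer because its `void*` parameter cannot bind a `volatile`-qualified pointer without a cast. Casting the qualifier away first lets you create a `std::uint64_t` object in the mapped bytes. After that, a `volatile std::uint64_t*` refers to an actual `std::uint64_t` object, so the store from the third attempt no longer goes through an unrelated type. This is a sketch under the same assumption as before (the mapping is not a volatile-defined object), and `emplace_u64` is a hypothetical name. The standard says nothing about the other process, so this only settles the aliasing question on the side that creates the object.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Creates a std::uint64_t object at `ptr` and returns a volatile view of it.
// The const_cast is needed because placement new takes a plain void*.
volatile std::uint64_t* emplace_u64(volatile unsigned char* ptr,
                                    std::uint64_t initial) {
    assert(reinterpret_cast<std::uintptr_t>(ptr) % alignof(std::uint64_t) == 0);
    void* raw = const_cast<unsigned char*>(ptr);
    std::uint64_t* obj = ::new (raw) std::uint64_t(initial);
    return obj; // implicit conversion adds the volatile qualifier
}
```

Usage would be `volatile std::uint64_t* dest = emplace_u64(ptr, 0);` once, followed by plain `*dest = value;` stores.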
Questions
- So, is there a better (safe) way than a byte-by-byte copy?
- I would also like to copy even larger blocks of raw bytes. Can this be done better than byte by byte?
- Is there any way to force `memcpy` to do the right thing?
- Am I needlessly worried about performance, and should I just go with the loop?
- Most examples (mostly C) do not use `volatile` at all; should I do the same? Is an `mmap`ed pointer already treated differently? How?
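A commonly given answer to the last two questions is that no `volatile` is needed. POSIX lists `sem_wait` and `sem_post` among the functions that synchronize memory (Base Definitions, "Memory Synchronization"), and the compiler has to assume an opaque library call may read or write the shared bytes, so they are generally understood not to stay cached in a register across it. Assuming that guarantee also covers a process-shared semaphore, plain `std::memcpy` works for single values and large blocks alike. To stay self-contained, the sketch replaces `shm_open` with an anonymous `MAP_SHARED` mapping plus `fork`. `Shared` and `demo_plain_memcpy` are hypothetical names.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <semaphore.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical layout; no member is volatile.
struct Shared {
    sem_t ready;                         // posted by the writer when done
    alignas(8) unsigned char data[4096]; // plain shared bytes
};

// Child writes a large block and a 64-bit value with ordinary std::memcpy;
// parent reads them after sem_wait. Returns true if the parent saw the data.
bool demo_plain_memcpy() {
    void* p = mmap(nullptr, sizeof(Shared), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return false;
    auto* shm = static_cast<Shared*>(p);
    if (sem_init(&shm->ready, /*pshared=*/1, /*value=*/0) == -1) {
        munmap(p, sizeof(Shared));
        return false;
    }

    pid_t pid = fork();
    if (pid == -1) {
        sem_destroy(&shm->ready);
        munmap(p, sizeof(Shared));
        return false;
    }
    if (pid == 0) {                                   // writer process
        unsigned char block[4096];
        std::memset(block, 0xAB, sizeof block);
        std::memcpy(shm->data, block, sizeof block);  // bulk copy
        std::uint64_t value = 0x0123456789ABCDEFull;
        std::memcpy(shm->data, &value, sizeof value); // first 8 bytes
        sem_post(&shm->ready);                        // publish the writes
        _exit(0);
    }

    // Reader: retry if sem_wait is interrupted by a signal.
    while (sem_wait(&shm->ready) == -1 && errno == EINTR) {}
    std::uint64_t seen;
    std::memcpy(&seen, shm->data, sizeof seen);
    bool ok = seen == 0x0123456789ABCDEFull && shm->data[4095] == 0xAB;

    waitpid(pid, nullptr, 0);
    sem_destroy(&shm->ready);
    munmap(p, sizeof(Shared));
    return ok;
}
```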
Thanks for any suggestions.
EDIT:
Both processes run on the same system. Also, please assume the values can be copied byte by byte; I am not talking about complex virtual classes storing pointers elsewhere. Integers only (no floats) would be just fine.