The following code implements some lock-free (and atomic-free!) inter-thread communication. It requires store and load memory barriers, but C++11 release-acquire semantics are neither appropriate here nor sufficient to guarantee correctness. In fact the algorithm exposes the need for a kind of inversion of release-acquire semantics: it has to signal that some operation did not take place, rather than that it did.
#include <cstdint>
#include <cstring>

volatile bool valid = true;
volatile uint8_t blob[1024] = {/*some values*/};

void zero_blob() {
    valid = false;                  // invalidate first ...
    STORE_BARRIER;
    memset(const_cast<uint8_t*>(blob), 0, 1024);   // ... then clear the data (cast drops volatile for memset)
}

int32_t try_get_sum(size_t index_1, size_t index_2) {
    uint8_t res = blob[index_1] + blob[index_2];
    LOAD_BARRIER;
    return valid ? res : -1;        // read the data first, check the flag last
}
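To make the intended guarantee concrete, here is a minimal two-thread usage sketch (the thread setup and the printf are mine, purely for illustration):

#include <cstdio>
#include <thread>

int main() {
    std::thread invalidator([] { zero_blob(); });
    std::thread reader([] {
        int32_t sum = try_get_sum(1, 2);
        // Either the reader lost the race (sum == -1) or it got a sum computed
        // from the original, not-yet-zeroed values. What the two barriers are
        // meant to rule out is returning a sum built from already zeroed bytes
        // while 'valid' still reads as true.
        std::printf("sum=%d\n", sum);
    });
    invalidator.join();
    reader.join();
}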
I can make this code correct on every hardware architecture simply by using native memory barriers, e.g. on Intel no barriers are needed at all, on SPARC (RMO) membar #StoreStore and membar #LoadLoad do the job, and on PowerPC lwsync works for both. So it is no big deal, and the code is a typical example of using store and load barriers. Now, which C++11 construct should I use to make the code correct, assuming I don't want to convert 'blob' into an array of std::atomic objects? That would make 'blob' the guard object and the variable 'valid' the guarded one, whereas it is the other way around.
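For reference, this is roughly how I would define those barrier macros (GCC-style inline assembly; a sketch only, not part of the algorithm itself):

#if defined(__x86_64__) || defined(__i386__)
    // x86/x64 (TSO): stores are not reordered with stores, loads not with
    // loads, so a compiler barrier is all that is needed.
    #define STORE_BARRIER __asm__ __volatile__("" ::: "memory")
    #define LOAD_BARRIER  __asm__ __volatile__("" ::: "memory")
#elif defined(__sparc__)
    #define STORE_BARRIER __asm__ __volatile__("membar #StoreStore" ::: "memory")
    #define LOAD_BARRIER  __asm__ __volatile__("membar #LoadLoad" ::: "memory")
#elif defined(__powerpc__)
    #define STORE_BARRIER __asm__ __volatile__("lwsync" ::: "memory")
    #define LOAD_BARRIER  __asm__ __volatile__("lwsync" ::: "memory")
#endif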
Converting the variable 'valid' to a std::atomic object is fine with me, but by itself it provides no barriers that guarantee correctness. To make this clear, consider the following code:
#include <atomic>

volatile std::atomic<bool> valid{true};
volatile uint8_t blob[1024] = {/*some values*/};

void zero_blob() {
    valid.store(false, std::memory_order_release);
    memset(const_cast<uint8_t*>(blob), 0, 1024);
}

int32_t try_get_sum(size_t index_1, size_t index_2) {
    uint8_t res = blob[index_1] + blob[index_2];
    return valid.load(std::memory_order_acquire) ? res : -1;
}
This code is incorrect because the barriers sit on the wrong side of the accesses: the write to 'blob' may still be reordered before the write to 'valid', and/or the load of 'valid' may be reordered before the loads from 'blob'. I thought that C++11 provided std::atomic_thread_fence precisely for such constructions, and that the code should be:
volatile std::atomic<bool> valid{true};
volatile uint8_t blob[1024] = {/*some values*/};

void zero_blob() {
    valid.store(false, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    memset(const_cast<uint8_t*>(blob), 0, 1024);
}

int32_t try_get_sum(size_t index_1, size_t index_2) {
    uint8_t res = blob[index_1] + blob[index_2];
    std::atomic_thread_fence(std::memory_order_acquire);
    return valid.load(std::memory_order_relaxed) ? res : -1;
}
Unfortunately C++11 says:
A release fence A synchronizes with an acquire fence B if there exist atomic operations X and Y, both operating on some atomic object M, such that A is sequenced before X, X modifies M, Y is sequenced before B, and Y reads the value written by X or a value written by any side effect in the hypothetical release sequence X would head if it were a release operation.
which clearly states that the std::atomic_thread_fence calls have to sit on the opposite sides of the operations on the atomic object from where my code puts them: the release fence must be sequenced before the store, and the load must be sequenced before the acquire fence.
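For contrast, this is the pattern the quoted wording does cover (a minimal sketch, the names are mine): the fences end up on the mirror-image sides compared to what my code needs.

#include <atomic>
#include <cstdint>

int32_t payload = 0;                 // plain, non-atomic data
std::atomic<bool> flag{false};

void publisher() {
    payload = 42;                                             // plain write
    std::atomic_thread_fence(std::memory_order_release);      // fence A ...
    flag.store(true, std::memory_order_relaxed);              // ... sequenced before store X
}

int32_t consumer() {
    if (flag.load(std::memory_order_relaxed)) {               // load Y ...
        std::atomic_thread_fence(std::memory_order_acquire);  // ... sequenced before fence B
        return payload;                                       // guaranteed to read 42
    }
    return -1;
}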
LATER EDIT
Below is a much more practical example:
volatile uint64_t clock = 1;        // odd: data stable, even: update in progress
volatile uint8_t blob[1024] = {/*some values*/};

void update_blob(uint8_t vals[1024]) {
    clock++;                        // counter becomes even: update started
    STORE_BARRIER;
    memcpy(const_cast<uint8_t*>(blob), vals, 1024);
    STORE_BARRIER;
    clock++;                        // counter becomes odd again: update finished
}

int32_t try_get_sum(size_t index_1, size_t index_2) {
    uint64_t snapshot = clock;
    if (snapshot & 0x1) {           // no update was in progress when we started
        LOAD_BARRIER;
        uint8_t res = blob[index_1] + blob[index_2];
        LOAD_BARRIER;
        if (snapshot == clock)      // and none ran while we were reading
            return res;
    }
    return -1;
}
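A minimal usage sketch for this version (illustration only, the thread setup and loop counts are arbitrary): the reader simply treats -1 as "an update was in flight, try again".

#include <cstdio>
#include <thread>

int main() {
    std::thread writer([] {
        uint8_t vals[1024];
        for (int i = 0; i < 1000; ++i) {
            for (auto &v : vals) v = static_cast<uint8_t>(i);
            update_blob(vals);
        }
    });
    std::thread reader([] {
        for (int i = 0; i < 1000; ++i) {
            int32_t sum = try_get_sum(0, 1);
            if (sum >= 0)           // -1 means a concurrent update was observed
                std::printf("sum=%d\n", sum);
        }
    });
    writer.join();
    reader.join();
}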