
Consider the following C++ memory operations, using atomics, targeted at a variable x.

std::atomic<int> x;
char cache[1024];

Thread 1:

memset(cache, 0, 1024);
x.store(20, std::memory_order::release);

Thread 2:

int z = x.load(std::memory_order::acquire);
char c = cache[20];

In this scenario, where the cache itself is not atomic, can we ensure that when thread 2 reads from cache, it will obtain the value written by thread 1?

Evg
pippo
  • If `cache` is shared between 2 threads, it's orthogonal to `std::atomic x`. You cannot use atomic as a synchronization mechanism for others. It's atomic itself but not related to anything else. – Louis Go Aug 26 '23 at 07:27
  • If (and only if) thread 2 reads `z == 20` (with the prior condition that it was initialized to something else before), then yes, it is guaranteed that the memset is visible. That's the point of an acquire-release pair: to make non-atomics "visible" to other threads. – Homer512 Aug 26 '23 at 07:47
  • [This blog post](https://preshing.com/20130823/the-synchronizes-with-relation/) by Preshing describes pretty much what you are doing. – Homer512 Aug 26 '23 at 08:01

2 Answers

2

You need a spin-wait loop or something in case z != 20. (Hopefully involving C++20 x.wait(z, acquire) to avoid spinning for too long while waiting for x to change from its old value (cppreference).) The load might not see the value from that store if the load happens too early.

You only get a happens-before between the reader and the thread that stored the value you loaded¹, not some thread that hasn't yet done its store.

But yes, if z == 20, then release/acquire semantics are sufficient for the reader to safely read the non-atomic data the writer wrote. (As long as it or other writer threads haven't been making further modifications.)
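
A minimal sketch of what that fixed reader/writer pair could look like, assuming C++20 for x.wait and that x starts at 0 so that 20 unambiguously means "the data has been published" (the names writer/reader are mine, not from the answer):

#include <atomic>
#include <cstring>
#include <thread>

std::atomic<int> x{0};   // 0 means "cache not published yet"
char cache[1024];

void writer()
{
    memset(cache, 0, sizeof cache);          // non-atomic writes
    x.store(20, std::memory_order_release);  // publish them
}

void reader()
{
    int z = x.load(std::memory_order_acquire);
    while (z != 20) {
        x.wait(z, std::memory_order_acquire);   // C++20: block until x changes from z
        z = x.load(std::memory_order_acquire);
    }
    char c = cache[20];   // safe: the acquire load that returned 20
                          // synchronizes-with the release store
    (void)c;
}

int main()
{
    std::thread t1(writer), t2(reader);
    t1.join();
    t2.join();
}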


Footnote 1: In practice, on almost all real hardware, an acquire load also synchronizes-with all previous writers of x that used release or stronger. I think PowerPC can probably violate that assumption, though, unless the writes were atomic RMWs; those form a release sequence. But on paper you only synchronize-with the last pure store (and the release sequence that follows it), not with everything earlier in the modification order.
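
To illustrate the release-sequence point, here is a hedged sketch of my own (not from the answer): if a later modification of x is an atomic RMW, an acquire load that reads the RMW's result still synchronizes-with the original release store, because the RMW continues that store's release sequence.

#include <atomic>
#include <thread>

std::atomic<int> x{0};
int payload = 0;   // non-atomic data published by thread A

void thread_a()
{
    payload = 42;
    x.store(1, std::memory_order_release);       // head of the release sequence
}

void thread_b()
{
    x.fetch_add(1, std::memory_order_relaxed);   // atomic RMW: continues A's release sequence
}

void thread_c()
{
    if (x.load(std::memory_order_acquire) == 2) {   // x can only be 2 if A's store came first
        int r = payload;   // guaranteed 42: the acquire load synchronizes-with A's release store
        (void)r;
    }
}

int main()
{
    std::thread a(thread_a), b(thread_b), c(thread_c);
    a.join(); b.join(); c.join();
}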

Peter Cordes
-3

Yes. The atomic operations on x using memory orders release and acquire ensure proper synchronization and ordering of the non-atomic operations on `cache`.

#include <iostream>
#include <thread>
#include <atomic>
#include <cstring>

#include <gtest/gtest.h>   // the TEST macro below is from GoogleTest

std::atomic<int> x;
char cache[1024];

void thread1Func()
{
    memset(cache, 0, 1024);                   // non-atomic writes
    x.store(20, std::memory_order_release);   // publish
}

void thread2Func()
{
    int z = x.load(std::memory_order_acquire);
    char c = cache[z];

    std::cout << "Value loaded by thread2: " << z << ", cache[z]: " << static_cast<int>(c) << std::endl;
}

TEST(TestCaseName, TestName)
{
    std::thread thread1(thread1Func);
    std::thread thread2(thread2Func);

    thread1.join();
    thread2.join();
}

Value loaded by thread2: 20, cache[z]: 0

  • What if we got the `212` scenario? – Louis Go Aug 26 '23 at 14:24
  • The 212 scenario isn't possible due to the acquire-release guarantee. T2 will see the memset done by T1. Maybe it would be good for others to explain what the 212 scenario is... – Boris Radonic Aug 26 '23 at 14:55
  • If a thread T1 reads a value while another thread T2 changes it, T1 continues its operation on the old value. – Boris Radonic Aug 26 '23 at 14:57
  • You can call it ABA (T2 = A and T2 = B). That said, if T2 got `z == 0`, the `cache` isn't `memset`ed. Without checking `z`, it is impossible to tell, per my understanding. – Louis Go Aug 26 '23 at 15:24
  • Try to reproduce that.... Sorry, but I am right. – Boris Radonic Aug 27 '23 at 20:55
  • Then why don't you post your tests? It should be easy to create a test on godbolt. – Louis Go Aug 28 '23 at 03:18
  • People must first convince themselves that they are not aware of their unawareness. Everything else is a pure waste of time and energy. – Boris Radonic Aug 28 '23 at 08:29
  • `char cache[1024];` is already statically initialized to `0`, so your `memset` isn't actually changing its value. Still, running it repeatedly (`while ./a.out; do :; done` in bash), I do see some executions where `Value loaded by thread2:` is `0`, so the load did *not* see the store. (I also changed it to memset the value `1` (https://godbolt.org/z/j9jETq3nM) so I could tell whether we saw the value or not.) All executions where `z == 20` have `cache[z]: 1`, but there are executions where `z == 0` and `cache[z]:` is still `0`. (And some where `z == 0` and `cache[z] == 1`.) – Peter Cordes Aug 28 '23 at 19:32
  • So there's no guarantee that `cache[z]` has been written, unless we see `z == 20`. Your answer incorrectly implies that the load will always see `z == 20`, and your test is only testing that, not the pointed-to value, which doesn't change. (I was testing on a quad-core Skylake running Arch GNU/Linux, kernel 6.4.9.) – Peter Cordes Aug 28 '23 at 19:34
  • (Note that your `cache[z]` instead of the OP's `cache[20]` introduces dependency ordering [in practice](https://stackoverflow.com/a/59832012); even compiling for ARM or something other than DEC Alpha, a `relaxed` load would work the same as `acquire` here. ISO C++ doesn't guarantee it without `std::memory_order_consume`, but that's currently deprecated and gets promoted to `acquire`. But still, the asm will use the load result as part of the address for the next load when you use `z` instead of `20`, removing one possible source of memory reordering. You'd still need `release` stores.) – Peter Cordes Aug 28 '23 at 19:41
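
For reference, a rough standalone reconstruction (not the actual Godbolt code linked in the comments above) of the kind of test those comments describe: memset the value 1 instead of 0 so the reader can tell whether it saw the writer's data, and run the thread pair many times.

#include <atomic>
#include <cstdio>
#include <cstring>
#include <thread>

std::atomic<int> x;
char cache[1024];

int main()
{
    for (int i = 0; i < 100000; ++i) {
        x.store(0, std::memory_order_relaxed);
        memset(cache, 0, sizeof cache);          // reset; safe because of the joins at the end of each iteration

        std::thread t1([] {
            memset(cache, 1, sizeof cache);      // write 1 so the reader can detect whether it saw this
            x.store(20, std::memory_order_release);
        });
        std::thread t2([] {
            int z = x.load(std::memory_order_acquire);
            char c = cache[20];                  // note: races with t1's memset when z != 20 (UB in ISO C++),
                                                 // which is exactly the problem under discussion
            if (z != 20 || c != 1)               // reader ran too early, or the guarantee was violated
                std::printf("z=%d cache[20]=%d\n", z, (int)c);
        });

        t1.join();
        t2.join();
    }
}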