In my application, I have a block of shared memory that one thread periodically writes to and another thread periodically reads (and then resets to 0).

Thread 1:

@onevent:
__atomic_store_n(addr, val, __ATOMIC_SEQ_CST);

Thread 2:

while((val = __atomic_exchange_n(addr, 0, __ATOMIC_SEQ_CST)) == 0);
... work on val

I find that thread 2 occasionally spins forever. In addition, if I add any kind of debugging statement (say, printing the value at addr after the atomic store or after each atomic exchange), everything works fine, so it looks like some kind of race condition.

I'm really stuck, since I tried this same pattern in a separate, isolated program and it seems to work fine there. Any help would be much appreciated. For reference, I am running on a high-core-count dual-socket node.

Mihir Shah
  • @SupportUkraine sorry I'm not understanding how a store between exchange operations can cause problems. Say the value is currently 0. An exchange happens with 0, so nothing changes. Then the store happens, and the value becomes V. Then the next exchange happens, and thread 2 gets V, breaks the loop, and resets the value to 0. All are atomic, so I'm not understanding how something can go wrong here. – Mihir Shah Jun 16 '22 at 04:23
  • The code you've shown is fine (although inefficient, since you don't fall back to a read-only spin with `_mm_pause()`, instead just hammering on the line. See [this answer](https://stackoverflow.com/questions/37241553/locks-around-memory-manipulation-via-inline-assembly/37246263#37246263)). Perhaps your real code has some mistaken assumptions, like that two stores will always result in two wakeups of waiter threads? If the 2nd store comes before an xchg has reset `*addr` to zero, you lost that update. – Peter Cordes Jun 16 '22 at 04:47
  • Since you're doing expensive SEQ_CST stores anyway for some strange reason, on x86 you could equivalently have the stores also be `exchange`, and check if the value was zero (if that's what your program depends on). Maybe then try to exchange it back in, until you see a `0`. For modern x86, compilers will compile `atomic_store(val, SEQ_CST)` to an `xchg` instruction anyway, so it doesn't even cost extra. (But means you can't make it cheaper by using a release store.) And makes it more expensive for other ISAs, especially AArch64 which has very cheap SC stores. – Peter Cordes Jun 16 '22 at 04:51
  • Oh ok, I'll look through my other code again, thank you for this other advice as well! – Mihir Shah Jun 16 '22 at 05:03
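
The two suggestions in the comments above (a read-only spin before the exchange, and a store done as an exchange so that an overwritten value can be detected) might look roughly like the sketch below. It is only an illustration, not the asker's real code: `publish`, `consume`, and `cpu_relax` are hypothetical names, the slot is assumed to be a `uint64_t`, and 0 is assumed to mean "empty" as in the question.

```c
#include <stdint.h>

#if defined(__x86_64__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()   /* spin-wait hint on x86-64 */
#else
#define cpu_relax() ((void)0)     /* assumption: no pause hint on other ISAs */
#endif

/* Hypothetical producer side: store val with an exchange instead of a
 * plain store. The return value is the previous contents of the slot:
 * 0 means the slot was empty, non-zero means that earlier value had not
 * been consumed yet and has just been overwritten (a lost update). */
static inline uint64_t publish(uint64_t *addr, uint64_t val)
{
    return __atomic_exchange_n(addr, val, __ATOMIC_SEQ_CST);
}

/* Hypothetical consumer side: spin with plain loads until the slot looks
 * non-zero, and only then attempt the exchange that claims the value and
 * resets the slot to 0. */
static inline uint64_t consume(uint64_t *addr)
{
    for (;;) {
        /* Read-only spin: does not request exclusive ownership of the
         * cache line on every iteration. */
        while (__atomic_load_n(addr, __ATOMIC_RELAXED) == 0)
            cpu_relax();

        /* Claim the value; keep spinning if the slot was 0 again by the
         * time the exchange ran. */
        uint64_t val = __atomic_exchange_n(addr, 0, __ATOMIC_SEQ_CST);
        if (val != 0)
            return val;
    }
}
```

If `publish` is called twice before `consume` runs, the second call returns the first value, so the overwrite becomes visible to the producer instead of silently dropping a wakeup. What to do in that case (retry, queue the value, count it) depends on what the real program needs.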
