Objective: I have a chunk of memory shared between several x86/x64 processes/threads. Inside it is a QWORD (64-bit) variable aligned on an 8-byte boundary. The variable is accessed simultaneously by one writer and several readers, and any mix of x86 and x64 readers and writers is possible. Say the variable is declared as QWORD qwSharedVar, and the compiler is MSVC with /volatile:ms behaviour.

Suppositions:

1. If the writer process is x64, it writes a new value like this:

*reinterpret_cast< volatile QWORD* >( &qwSharedVar ) = qwNewValue;

2. If the writer process is x86, it writes a new value like this:

_InterlockedCompareExchange64( reinterpret_cast< volatile __int64* >( &qwSharedVar ), qwNewValue, qwSharedVar );

3. If the reader process is x64, it reads the shared value like this:

QWORD qwValueNow = *reinterpret_cast< volatile QWORD* >( &qwSharedVar );

4. If the reader process is x86, it reads the shared value like this:

QWORD qwValueNow = _InterlockedCompareExchange64( reinterpret_cast< volatile __int64* >( &qwSharedVar ), 0, 0 );

5. If the reader process is x86 or x64 and wants to read only the lower DWORD of the shared value, then:

DWORD dwValueNow = *reinterpret_cast< volatile DWORD* >( &qwSharedVar );

Are these suppositions valid? I'm especially worried about point 5, where accesses through "lock cmpxchg8b" and "mov dword ptr" can be mixed.
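
For illustration, the CAS-based accesses used by the x86 writer and x86 reader above can be sketched in portable C++. This is only a sketch under assumptions: `std::atomic<std::uint64_t>` stands in for the MSVC `_InterlockedCompareExchange64` intrinsic, and `g_shared`, `cas_store` and `cas_load` are hypothetical names that do not appear in the original code. It also shows the retry loop that a general (multi-writer) CAS store would need.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for qwSharedVar. In the real setup the variable would
// live in the shared-memory mapping rather than in a global.
static std::atomic<std::uint64_t> g_shared{0};

// Atomic 64-bit store built from compare-exchange, as a 32-bit writer might do.
// When the exchange fails, `expected` is refreshed with the current value,
// so the loop retries until the swap lands.
inline void cas_store(std::atomic<std::uint64_t>& var, std::uint64_t new_value) {
    // A lock-based fallback would not be usable across processes.
    assert(var.is_lock_free());
    std::uint64_t expected = var.load(std::memory_order_relaxed);
    while (!var.compare_exchange_weak(expected, new_value,
                                      std::memory_order_release,
                                      std::memory_order_relaxed)) {
        // retry with the refreshed `expected`
    }
}

// Atomic 64-bit load via CAS(0, 0): if the variable is 0 it is rewritten
// with 0 (no visible change); otherwise the exchange fails and `expected`
// receives the current value. Either way `expected` ends up holding it.
inline std::uint64_t cas_load(std::atomic<std::uint64_t>& var) {
    std::uint64_t expected = 0;
    var.compare_exchange_strong(expected, 0,
                                std::memory_order_acquire,
                                std::memory_order_acquire);
    return expected;
}
```

With a single writer, the loop in `cas_store` would be expected to succeed on its first iteration, since nobody else changes the value between the initial load and the exchange.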

Peter Cordes
dj_alek
  • 3
    The x86 memory model is well-defined for all of these and works the way you'd expect / hope, including a mix of qword loads, stores, or CAS, combined with dword loads of either half. Related: [How can I implement ABA counter with c++11 CAS?](https://stackoverflow.com/q/38984153) is an equivalent problem of mixing qword access with `lock cmpxchg16b` in 64-bit mode. https://stackoverflow.com/tags/x86/info has some links to x86 formal memory model stuff, but basically a wide store or RMW can be reloaded by any narrower load, just as if it were a same-width reload. – Peter Cordes Sep 10 '20 at 17:41
  • 2
    For #2, you need to check that the cmpxchg succeeded; otherwise the value wasn’t written. – prl Sep 10 '20 at 18:19
  • 1
    Also note that you don't need to do any of this manually; use `std::atomic` (with std::memory_order_relaxed if you want cheap stores). You might need to check that it's lock_free for all the builds you care about; GCC can do lock-free 64-bit atomics in 32-bit mode (using SSE2 movq load/store). I don't know for sure that MSVC does that; it might want to fall back to locking for 64-bit atomics in 32-bit mode, which would be ABI-incompatible with lock-free access. – Peter Cordes Sep 10 '20 at 19:00
  • >> For #2, you need to check that the cmpxchg succeeded; otherwise the value wasn’t written. Yes, it's clear. I wrote simplified code so as not to distract from the main question. Furthermore, one-writer scheme doesn't require checks or till-succeeded cycles. Thanks! – dj_alek Sep 11 '20 at 09:43
  • Yes, I realized the “one writer” part after I had written the comment. – prl Sep 11 '20 at 10:19
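
The `std::atomic` suggestion from the comments could look roughly like the following. This is a hedged sketch, not a verified recipe for MSVC: `SharedCounter`, `usable_across_processes`, `publish`, `read_full` and `read_low_dword` are hypothetical names, and whether 64-bit atomics are lock-free in a given 32-bit build has to be checked per toolchain, as noted above.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical wrapper around the shared 64-bit value. In practice an object
// like this would be placed inside the shared-memory region.
struct SharedCounter {
    std::atomic<std::uint64_t> value{0};
};

// A lock-based atomic keeps its lock outside the object, so only a lock-free
// implementation can be expected to work between processes.
inline bool usable_across_processes(const SharedCounter& s) {
    return s.value.is_lock_free();
}

// Single writer: a relaxed store is the cheap option mentioned in the comments.
inline void publish(SharedCounter& s, std::uint64_t v) {
    assert(s.value.is_lock_free());
    s.value.store(v, std::memory_order_relaxed);
}

// Reader: atomic 64-bit load, never torn when the atomic is lock-free.
inline std::uint64_t read_full(const SharedCounter& s) {
    return s.value.load(std::memory_order_relaxed);
}

// Lower DWORD. Assumption: portable C++ has no defined mixed-size atomic
// access, so this does a full 64-bit load and truncates instead of issuing
// a narrower dword load.
inline std::uint32_t read_low_dword(const SharedCounter& s) {
    return static_cast<std::uint32_t>(read_full(s));
}
```

Each process would call `usable_across_processes` once at startup and refuse to use the shared variable if it returns false.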

0 Answers