Could the following code written for a Reentrant Lock be susceptible to a stale data error and provide locks to multiple threads?

Question

I recently came across the following code while learning about Reentrant Locks in Lock-Free Concurrency:

class ReentrantLock32
 {
  std::atomic<std::size_t> m_atomic;
  std::int32_t m_refCount;

public:
  ReentrantLock32() : m_atomic(0), m_refCount(0) {}

  void Acquire()
   {
    std::hash<std::thread::id> hasher;
    std::size_t tid = hasher(std::this_thread::get_id());

    if (m_atomic.load(std::memory_order_relaxed) != tid)
     {
       std::size_t unlockValue = 0;
       while (!m_atomic.compare_exchange_weak(
        unlockValue,
        tid,
        std::memory_order_relaxed,
        std::memory_order_relaxed))
       {
        unlockValue = 0;
        PAUSE();
       }
      }
      ++m_refCount;
      std::atomic_thread_fence(std::memory_order_acquire);
     }

  void Release() {
   std::atomic_thread_fence(std:memory_order_release);
   std::hash<std::thread::id> hasher;
   std::size_t tid = hasher(std::this_thread::get_id());
   std::size_t actual = m_atomic.load(std::memory_order_relaxed);
   assert(actual == tid);

   --m_refCount;
   if (m_refCount == 0)
    {
     m_atomic.store(0,std::memory_order_relaxed);
    }
 }
//...
}

However, it appears that there is a chance of stale data leading to multiple threads acquiring the lock, especially when thread contention is high.

!m_atomic.compare_exchange_weak(
        unlockValue,
        tid,
        std::memory_order_relaxed,
        std::memory_order_relaxed)

If two competing threads from different cores attempt to call compare_exchange_weak at the same time, isn't there a chance that the cache coherency protocol for the CPU could fail to invalidate the L1-cache before both threads acquire the lock?

The whole point of `std::atomic` is to avoid "stale reads". The implementation must ensure that all operations on `m_atomic` are performed in some total order, and each operation in that order observes the side effects of the previous one. — Igor Tandetnik, Jan 11 '23 at 06:01
This same code was part of a previous question, [Could the following code written for a Reentrant Lock be susceptible to an instruction reordering error?](https://stackoverflow.com/q/75066624) — Peter Cordes, Jan 11 '23 at 06:27
No, the CAS can't succeed in both threads at once, that's the whole point of being an atomic RMW. The whole load+compare+store is done as a single atomic transaction. However the CPU implements that in terms of coherency, it has to work. (In practice, cache coherency protocols work by establishing that one core has exclusive ownership of a cache line (MESI E or M state), and is thus allowed to modify it. All other copies are Invalid at that point, so other cores can't even read it. See also [Can num++ be atomic for 'int num'?](https://stackoverflow.com/q/39393850) — Peter Cordes, Jan 11 '23 at 06:30
Yes Peter Cordes, that was my understanding. However, I read that lock-free linked lists can suffer from stale reads if they employ similar techniques (memory operations with weak memory ordering and memory barriers instead of memory operations with strong memory ordering). The reason given was that an ICB dealing with high thread contention might not be able to relay the Interrupt before the erroneous memory operation, so I'm trying to determine why that's not the case here. — Josh Hardman, Jan 11 '23 at 06:38

score 1 · Accepted Answer · answered Jan 21 '23 at 11:08

If two competing threads from different cores attempt to call compare_exchange_weak at the same time, isn't there a chance that the cache coherency protocol for the CPU could fail to invalidate the L1-cache before both threads acquire the lock?

In short, no.

Compare-exchange (CAS) is a read-modify-write (RMW) operation. (Technically in C++ it's an RMW if it succeeds, but just a load if it fails). RMW operations virtually guarantee that staleness will not occur. You mentioned the coherency protocol...RMW operations require exclusive access to the memory location. This will invalidate it in other cores' caches. If the core failed to invalidate other cores when getting exclusive mode, that would be a bug, and I think we'd all be in big trouble.

x86 gives lock cmpxchg, which clearly exclusive access (it's what lock does).

Arm essentially does what is known as a load-locked/store-conditional (LL/SC). See this example Arm7 implementation and documentation.

Cmpxchg Relaxed (32 bit):   _loop: ldrex roldval, [rptr]; mov rres, 0; teq roldval, rold; strexeq rres, rnewval, [rptr]; teq rres, 0; bne _loop

Basically it loads the memory location, but it "keeps an eye on it." It checks the value and, if the condition holds, attempts to store. However, if another core had written to it between the load and the store, the store fails. This example has a loop; it is equivalent to compare_exchange_strong. Without the loop it is like compare_exchange_weak.

However, it appears that there is a chance of stale data leading to multiple threads acquiring the lock, especially when thread contention is high.

C++ guarantees cache coherency with respect to a single object for atomics. This means that stale reads are not allowed. (Stale means the old value in cache never gets updated.) A store to an atomic must eventually be visible to all other cores. However, a core can still read an old value, if it reads at the perfectly wrong time. An old read, like a stale read, could similarly lead to a race condition if not for the fact that compare_exchange_weak is an atomic operation.

High contention is an undesirable scenario, since it can cause various overheads to cascade into a major traffic jam.

Could the following code written for a Reentrant Lock be susceptible to a stale data error and provide locks to multiple threads?

1 Answers1

Linked