
I'm referring to this example. The authors use memory_order_release to decrement the counter, and they even state in the discussion section that using memory_order_acq_rel instead would be excessive. But couldn't the following scenario, in theory, lead to x never being deleted?

  • we have two threads on different CPUs
  • each of them owns an instance of a shared pointer; both pointers share ownership of the same control block, and no other pointers referring to that block exist
  • each thread has the counter in its cache and the counter is 2 for both of them
  • the first thread destroys its pointer; the counter in this thread is now 1
  • the second thread destroys its pointer, but the cache-invalidation signal from the first thread may still be queued, so it decrements the value in its own cache and also gets 1
  • neither thread deleted x, yet no shared pointers owning our control block remain => memory leak

The code sample from the link:

#include <boost/intrusive_ptr.hpp>
#include <boost/atomic.hpp>

class X {
public:
  typedef boost::intrusive_ptr<X> pointer;
  X() : refcount_(0) {}

private:
  mutable boost::atomic<int> refcount_;
  friend void intrusive_ptr_add_ref(const X * x)
  {
    x->refcount_.fetch_add(1, boost::memory_order_relaxed);
  }
  friend void intrusive_ptr_release(const X * x)
  {
    if (x->refcount_.fetch_sub(1, boost::memory_order_release) == 1) {
      boost::atomic_thread_fence(boost::memory_order_acquire);
      delete x;
    }
  }
};

The quote from discussion section:

It would be possible to use memory_order_acq_rel for the fetch_sub operation, but this results in unneeded "acquire" operations when the reference counter does not yet reach zero and may impose a performance penalty.

Leonid
  • Quote code that you're talking about right in your question, as well as linking to it for attribution. Stack Overflow questions must always be self-contained, and still meaningful even if the links die. – Peter Cordes May 05 '22 at 17:04

1 Answer


All modifications of a single atomic variable happen in a global modification order. It is not possible for two threads to disagree about this order.

The fetch_sub operation is an atomic read-modify-write operation, and an atomic RMW is required to read the value written by the modification immediately preceding its own in that modification order.

So it is not possible for the second thread to read 2 when the first thread's fetch_sub came first in the modification order. The implementation must ensure that such cache incoherence cannot happen, if necessary with the help of locks when the hardware doesn't support the atomic access natively. (That is what the is_lock_free and is_always_lock_free members of the atomic are there to check.)

This is all independent of the memory orders of the operations. The memory orders matter only for accesses to memory locations other than the atomic variable itself.

user17732522
  • @Leonid No, that is independent of the memory order. See https://www.eel.is/c++draft/basic#intro.races-4 and https://www.eel.is/c++draft/atomics#order-10, neither of which mention memory order. – user17732522 May 05 '22 at 16:59
  • sorry, removed the comment when saw your update to the post. will have a look – Leonid May 05 '22 at 17:00
  • but here https://en.cppreference.com/w/cpp/atomic/memory_order for memory_order_release "All writes in the current thread are visible in other threads that acquire the same atomic variable" - doesn't this mean that visibility is not guaranteed until the variable is acquired? – Leonid May 05 '22 at 17:04
  • @Leonid It guarantees that for _all_ writes to _all_ memory locations (not only the atomic). But for the atomic itself there are stronger guarantees. – user17732522 May 05 '22 at 17:05
  • @Leonid: That's talking about writes to *other* memory locations. `memory_order` controls ordering of atomic operations wrt. other operations on other objects by the same thread. There are separate rules about ordering for a single object. (Atomic RMWs always see the "latest value", which BTW [confuses some people into thinking that pure loads can see stale values indefinitely](https://stackoverflow.com/questions/71718224/is-a-memory-barrier-required-to-read-a-value-that-is-atomically-modified).) – Peter Cordes May 05 '22 at 17:06
  • @Leonid: Also [relaxed ordering and inter thread visibility](https://stackoverflow.com/q/56912520) mentions the fact that atomic RMWs are inherently serializing, so 100 calls to `.fetch_add(1, relaxed)` will always increment by 100 total, no lost updates. No two executions of fetch_add can see the same starting value (for a variable that's monotonically increasing.) – Peter Cordes May 05 '22 at 17:13