Boost provides a sample atomically reference-counted shared pointer.
Here is the relevant code snippet and the explanation for the various orderings used:
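#include <boost/atomic.hpp>
#include <boost/intrusive_ptr.hpp>

class X {
public:
    typedef boost::intrusive_ptr<X> pointer;
    X() : refcount_(0) {}
private:
    mutable boost::atomic<int> refcount_;
    // Called by intrusive_ptr when a new reference is created.
    friend void intrusive_ptr_add_ref(const X * x)
    {
        x->refcount_.fetch_add(1, boost::memory_order_relaxed);
    }
    // Called by intrusive_ptr when a reference is dropped.
    friend void intrusive_ptr_release(const X * x)
    {
        // fetch_sub returns the previous value: 1 means this was the last reference.
        if (x->refcount_.fetch_sub(1, boost::memory_order_release) == 1) {
            boost::atomic_thread_fence(boost::memory_order_acquire);
            delete x;
        }
    }
};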
Increasing the reference counter can always be done with memory_order_relaxed: New references to an object can only be formed from an existing reference, and passing an existing reference from one thread to another must already provide any required synchronization.
It is important to enforce any possible access to the object in one thread (through an existing reference) to happen before deleting the object in a different thread. This is achieved by a "release" operation after dropping a reference (any access to the object through this reference must obviously have happened before), and an "acquire" operation before deleting the object.
It would be possible to use memory_order_acq_rel for the fetch_sub operation, but this results in unneeded "acquire" operations when the reference counter does not yet reach zero and may impose a performance penalty.
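To make the two alternatives in that explanation concrete, here is a minimal sketch of both release strategies side by side. It uses std::atomic rather than boost::atomic (assuming the orderings behave the same), and Node, add_ref, release_with_fence, release_acq_rel and run_demo are hypothetical names made up for illustration; none of them come from the Boost example. It only shows the structure of the two variants, it does not demonstrate a reordering.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical reference-counted type (not from the Boost docs).
struct Node {
    std::atomic<int> refcount{1};        // the creator starts with one reference
    int payload = 0;                     // plain, non-atomic data
    static std::atomic<int> destroyed;   // counts destructor runs, for the demo only
    ~Node() { destroyed.fetch_add(1, std::memory_order_relaxed); }
};
std::atomic<int> Node::destroyed{0};

// Taking a reference needs no ordering: the caller already holds one.
inline void add_ref(Node* n) {
    n->refcount.fetch_add(1, std::memory_order_relaxed);
}

// Variant A: release on every decrement, acquire fence only in the
// thread that drops the last reference and is about to delete.
inline void release_with_fence(Node* n) {
    if (n->refcount.fetch_sub(1, std::memory_order_release) == 1) {
        std::atomic_thread_fence(std::memory_order_acquire);
        delete n;
    }
}

// Variant B: acq_rel on every decrement, so the acquire half is also
// paid by threads that do not end up deleting the object.
inline void release_acq_rel(Node* n) {
    if (n->refcount.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete n;
    }
}

// Gives each worker thread its own reference, lets every owner drop it,
// and returns how many times the destructor ran.
int run_demo(void (*release)(Node*), int workers) {
    assert(workers >= 0);
    Node::destroyed.store(0);
    Node* n = new Node;
    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i) {
        add_ref(n);  // reference formed from the creator's existing one
        threads.emplace_back([n, release] {
            int seen = n->payload;  // plain read through the worker's reference
            (void)seen;
            release(n);             // any thread may turn out to be the last owner
        });
    }
    release(n);  // the creator drops its own reference
    for (auto& t : threads) t.join();
    return Node::destroyed.load();
}
```

Calling run_demo(release_with_fence, 4) or run_demo(release_acq_rel, 4) deletes the object exactly once in both cases; the variants differ only in where the acquire ordering is paid.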
I am not able to understand why the memory_order_acquire barrier is necessary before the delete x operation. Specifically, how is it safe for the compiler/processor to reorder the memory operations of delete x before the fetch_sub and the test that its result == 1, without violating the single-threaded semantics?
EDIT: I guess my question wasn't very clear. Here is a rephrased version:
Will the control dependency between the read of the reference count (x->refcount_.fetch_sub(1, boost::memory_order_release) == 1) and the delete x operation provide any ordering guarantee at all? Even considering a single-threaded program, is it possible for the compiler/processor to reorder the instructions corresponding to the delete x operation before the fetch_sub and the comparison? It would be really helpful if the answer was as low-level as possible and included an example scenario where the delete operation gets reordered (without affecting the single-threaded semantics), thus illustrating the need to preserve ordering.