
This question is specifically about trivially destructible types within reference counted pointers. See the example from Boost's documentation on uses of atomics.

The decrement is as follows:

if (x->refcount_.fetch_sub(1, boost::memory_order_release) == 1) {
  // A
  boost::atomic_thread_fence(boost::memory_order_acquire);
  delete x;
}
  1. We know that, due to memory_order_release, all reads/writes of x are completed before the fetch_sub (see here). Thus, if we happen to reach point A then all uses of x are complete.

  2. At point A in the code, we are not guaranteed by the standard to see the latest value of x until after the memory_order_acquire fence...

So here is my question regarding the second statement about memory_order_acquire:

When x points to a trivially destructible type (for example, int, where x is an int * const), is the memory_order_acquire pointless? My rationale is that if *x is trivially destructible, then the latest changes to *x do not affect the deletion of x.

For example, whether the deleting thread's delete x; sees the latest value such that *x == 10, or an outdated value such that *x == 8, the destruction process is the same regardless (as long as the pointer x itself remains constant). Thanks to the release, it knows that no one is going to modify *x from that point on, so all it has to do is deallocate.

Is there another benefit of memory_order_acquire that I am missing here? Is my thinking correct, and if not, why do we need to see the latest value of *x on the deleting thread?

Peter Cordes
Saddie
    Not that I know the answer, but _For example, whether the deleting thread's delete x; sees the latest x such that *x = 10 or an outdated value such that *x = 8 the destruction process is always the same regardless_ Well, wouldn't this mean that you would allow `*x = 10` to happen after `delete x` (due to possible re-ordering of this code)? That sounds scary. – Scheff's Cat Nov 09 '19 at 07:32
    The acquire is necessary but it has nothing to do with a trivially destructible type. Without acquire, the `delete` could be reordered before the `fetch_sub` and invoke object destruction while still being accessed by other threads. Here is [my attempt](https://stackoverflow.com/questions/48124031/stdmemory-order-relaxed-atomicity-with-respect-to-the-same-atomic-variable/48148318#48148318) to explain this from a `shared_ptr` point of view. – LWimsey Nov 09 '19 at 22:49
  • @LWimsey What do you mean by reordered? As in source to source transformation? – curiousguy Nov 10 '19 at 10:09
  • Memory ordering is a broad topic.. a good start is [Jeff Preshing's blog](https://preshing.com/20120913/acquire-and-release-semantics/) – LWimsey Nov 10 '19 at 10:22
  • @LWimsey Can C++ MT semantic be defined in term of reordered code? – curiousguy Nov 10 '19 at 10:23
  • @LWimsey MT = multithreading – curiousguy Nov 10 '19 at 10:51
  • @LWimsey How can that code be "reordered": `if(x) y = 1;` – curiousguy Nov 10 '19 at 11:00
    _Can C++ MT semantic be defined in term of reordered code?_ - Technically not, or at least not by hard core language lawyers who think in terms of 'synchronize-with'. – LWimsey Nov 10 '19 at 11:02
  • @curiousguy _How can that code be "reordered"_ - #LoadStore reordering can happen on weaker CPU's. [This Q/A](https://stackoverflow.com/questions/52215031/how-is-load-store-reordering-possible-with-in-order-commit) has more details – LWimsey Nov 10 '19 at 11:11
  • "... whether the deleting thread's `delete x;` sees the latest `x` such that `*x = 10` or an outdated value such that `*x = 8` the destruction process is always the same regardless" - But **reusing** of de-allocated memory in new allocation definitely depends on absence of "delayed" writes into this memory. If `*y = 12` is performed after new allocation, this operation should come **after** any assignment to the de-allocated `x`. Also, some allocators may **poison** the memory on deletion. This poisoning should also be ordered **after** any assignment to de-allocated memory. – Tsyvarev Nov 10 '19 at 21:35
  • Is it specifically a C++11 question? – curiousguy Dec 21 '19 at 06:50
  • @Tsyvarev Exactly. That an hypothetical dtor might or might not see the correct value of the fields is completely beside the point. Another way to view it is: deallocation is actually the starting point of the process of re-using the memory in the current thread (or another one possibly) and that re-use needs to come after the use by another thread. – curiousguy Dec 21 '19 at 06:54

3 Answers


The standard is not written in terms of which interleavings of operations in racing threads might happen. I gather such a specification would be too strict: compilers need to reorder loads and stores, even across sync points, for speed.

Instead, the standard simply says what a data race is, and that they're undefined behavior.

Informally, a data race occurs when:

  • one thread accesses memory;
  • another thread writes to the same memory location; and
  • there's no synchronization between the two threads to impose an order on the two accesses.

Deleting an object counts as a write, even if the destructor is trivial. If two threads access *x and then decref it using your code, and one thread deletes x, we can see that all three requirements are fulfilled. There is a data race on *x.

The data race isn't on x->refcount_, since both threads access it with an atomic operation (an explicit exception, in the standard, to the sloppy definition of "data race" I gave above). But because the memory ordering is release on both threads, the two operations don't synchronize the threads with each other.


People often try to imagine what compiler shenanigans might lead to actual misbehavior in practice, to see if the race might be considered "benign", but I've given up on this. According to the standard, the lack of a release-acquire handoff makes the behavior undefined.

Jason Orendorff

No way!

You seem to believe that barriers are a tool for one particular pattern:

  • publish a data structure in a thread, typically with a release/store on an atomic pointer (analog of volatile reference in Java)
  • check that the publication occurred (load the atomic pointer and check the value) and read the data (load/acquire)

But that's just one example of the use of atomics and barriers.

In general, barriers associated with relaxed atomic operations are what make mutual exclusion well defined. A mutex is one mutual-exclusion device; a null-then-non-null atomic pointer is another; and a reference count is yet another.

A reference count functions like an RW lock, with:

  • RC increment = R lock
  • RC decrement = R unlock
  • observing (RC = 0) after a decrement = W lock

The (RC = 0) observation is the analog of a lock operation because it must be mutually exclusive with the property (RC > 0). Mutual exclusion translates to a release-acquire pair for each series of computations that needs exclusion. All users of the data controlled by the RC device need mutual exclusion with the memory release (not mutex release) operation.

curiousguy

Let's consider the following example:

Initialization

int * const x = new int{42};
std::atomic<int> refcount{2};

Thread A and Thread B

assert(*x == 42);
if (refcount.fetch_sub(1, std::memory_order_release) == 1) {
  // std::atomic_thread_fence(std::memory_order_acquire);
  delete x;
}

In this example the assert could fail, or worse, since it could access an already destroyed and deallocated object. The problem is that there is no happens-before relationship between accessing *x in one thread and its deletion in the other. The assert can get reordered after the decrement of refcount in the same thread even when memory_order_release is employed.

To form this happens-before relationship we need a synchronization point between the threads, and release-acquire does exactly that. That is why we need an acquire fence before deleting x. Alternatively, we could use memory_order_acq_rel instead of memory_order_release when decrementing refcount, and that would be enough as well.

dened
    Morally speaking, it's not that the assert may get reordered after the decrement (the release ordering would prevent that), but that the delete may get reordered before the decrement, as there is no acquire barrier to stop it. I agree that the overall effect is that there is a data race. – Nate Eldredge Feb 05 '22 at 20:43
  • @NateEldredge From purely theoretical standpoint, both could get *memory reordered* without full *release-acquire* synchronization, the standard doesn't prevent it if I'm not mistaken. But I agree that in practice it's unlikely for something to get reordered past a release operation. On the other hand, reordering of `delete` before the decrement also looks unlikely because it is executed conditionally. Still we both agree that there is a data race. – dened Feb 05 '22 at 22:44