
Consider the following code:

```cpp
#include <atomic>
#include <cstdio>

struct payload
{
    std::atomic< int > value;
};

std::atomic< payload* > pointer( nullptr );

void thread_a()
{
    payload* p = new payload();
    p->value.store( 10, std::memory_order_relaxed );
    std::atomic_thread_fence( std::memory_order_release );
    pointer.store( p, std::memory_order_relaxed );
}

void thread_b()
{
    payload* p = pointer.load( std::memory_order_consume );
    if ( p )
    {
        printf( "%d\n", p->value.load( std::memory_order_relaxed ) );
    }
}
```

Does C++ make any guarantees about the interaction of the fence in `thread_a` with the consume operation in `thread_b`?

I know that in this example case I can replace the fence + atomic store with a store-release and have it work. But my question is about this particular case using the fence.
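
For concreteness, the store-release variant I have in mind is something like this (the function name is just for illustration; `payload` and `pointer` are as above):

```cpp
void thread_a_release_store()
{
    payload* p = new payload();
    p->value.store( 10, std::memory_order_relaxed );
    // The release store itself carries the ordering; no separate fence needed.
    pointer.store( p, std::memory_order_release );
}
```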

Reading the standard text, I can find clauses about the interaction of a release fence with an acquire fence, and of a release fence with an acquire operation, but nothing about the interaction of a release fence with a consume operation.

Replacing the consume with an acquire would make the code standards-compliant, I think. But as far as I understand the memory-ordering constraints implemented by processors, I should only need the weaker consume ordering in `thread_b`: the memory barrier forces all stores in `thread_a` to be visible before the store to the pointer, and reading the payload is data-dependent on the read of the pointer.
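
In other words, the version I believe is unambiguously standards-compliant, but stronger than I should need, would be something like this (illustrative name):

```cpp
void thread_b_acquire()
{
    // An acquire load pairs with the release fence in thread_a, but it
    // orders *all* subsequent loads, not just the dependent ones.
    payload* p = pointer.load( std::memory_order_acquire );
    if ( p )
    {
        printf( "%d\n", p->value.load( std::memory_order_relaxed ) );
    }
}
```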

Does the standard agree?

curiousguy
Edmund
    "as the memory barrier forces all stores in thread a to be visible before the store to the pointer": if speaking about x86 (or TSO in general) - this seems to be correct, but for weaker models (such as SPARC RMO) - it isn't exactly a correct description. In general (in particular, outside of TSO world) memory barriers require a counterpart memory fence in reading thread, see https://www.kernel.org/doc/Documentation/memory-barriers.txt for details. TSO can be seen as a single per-CPU write buffer and flushing it with a memory fence does make things consistent, but in general it isn't guaranteed – No-Bugs Hare Jun 20 '15 at 08:18
  • @Edmund Kapusniak I was under the impression that a load tagged with `std::memory_order_consume` only gave you appropriate consume semantics if the corresponding store is tagged with either `release`, `acq_rel`, or `seq_cst`. So the `consume` load might have the same guarantees if it were instead tagged with `relaxed`, since the store to `pointer` is also `relaxed`. – Alejandro Jun 26 '15 at 03:59
  • are you developing a virus? (asking because of the payload pointer XD) – CoffeDeveloper Oct 23 '15 at 16:58
  • @Alejandro "_only gave you appropriate consume semantics if the corresponding store is tagged_" The principle of `std::atomic_thread_fence( std::memory_order_release )` is to generate a delayed "tag" for the previous last relaxed stores; IOW you can say that a release store is an immediate named store barrier, unlike the anonymous delayed barrier by a fence (a named barrier works on only that object, an anonymous applies to each one). – curiousguy May 25 '19 at 14:16
  • @No-BugsHare "_TSO can be seen as a single per-CPU write buffer and flushing it with a memory fence does make things consistent_" A fence on the writer side on TSO? How is that possible? Fence what WRT what? How do you "flush" a buffer? – curiousguy Dec 12 '19 at 02:03
  • @curiousguy Well, my description above is indeed quite frivolous (which it has to be without going into 20-page formalization), but I think my point still stands: my educated guess is that your logic does stand on TSO CPUs such as x64, but may fail on RMO CPUs such as Power or Arm. Another way to think about it is to realize that under TSO the only reordering which is allowed to happen, is "stores reordered after loads", with all the other reorderings being prohibited. – No-Bugs Hare Dec 13 '19 at 08:38

2 Answers


Your code works.

I know that in this example case I can replace the fence + atomic store with a store-release and have it work. But my question is about this particular case using the fence.

A fence combined with a relaxed atomic operation is stronger than the corresponding standalone atomic operation. E.g. (from http://en.cppreference.com/w/cpp/atomic/atomic_thread_fence, Notes):

While an atomic store-release operation prevents all preceding writes from moving past the store-release, an atomic_thread_fence with memory_order_release ordering prevents all preceding writes from moving past all subsequent stores.
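
For example, this lets a single fence publish data through several subsequent relaxed stores, which no single store-release could do. A minimal sketch (the variable names here are mine, not from the question):

```cpp
#include <atomic>
#include <cstdio>

std::atomic< int > data( 0 );
std::atomic< bool > flag_a( false );
std::atomic< bool > flag_b( false );

void writer()
{
    data.store( 42, std::memory_order_relaxed );
    std::atomic_thread_fence( std::memory_order_release );
    // The one fence orders the store to `data` before *both* relaxed
    // stores below, so a reader that acquires either flag sees 42.
    flag_a.store( true, std::memory_order_relaxed );
    flag_b.store( true, std::memory_order_relaxed );
}

void reader()
{
    if ( flag_b.load( std::memory_order_acquire ) )
    {
        printf( "%d\n", data.load( std::memory_order_relaxed ) );  // prints 42
    }
}
```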

Tsyvarev
  • "_all subsequent stores_" on atomics, not on normal objects! This IMO isn't clear enough in the quoted text. – curiousguy May 25 '19 at 14:18
  • @curiousguy: But that's the funny thing. The memory order stuff, release/store/etc, is not for stores on atomics. It's for *all* prior memory operations and all subsequent operations. That is, you use release/acquire when you're writing some data, then using an atomic variable to let some other thread know you've written to it. The ordering ensures visibility of the write to the reader; if the reader acquires the atomic and it has the value set by the writer, then they can read the non-atomic value without incurring a data race. – Nicol Bolas May 29 '19 at 21:55
  • @NicolBolas "_It's for all prior memory operations_" Yes. "_and all subsequent operations_" Which operations? – curiousguy May 29 '19 at 22:35

Although that's clearly the intent, the way the interaction of fences and atomic operations is specified means that only the explicitly listed combinations are officially supported. (That style of specification is not only verbose, difficult to read, and even more difficult to turn into a valid intuition; it is also easy to make incomplete.)

I see nothing in the standard that supports pairing a consume operation with a release fence, even though it would be impossible for a normal implementation not to support that pairing, short of a special effort during global program optimization to detect that particular use case and deliberately break it.
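
The combination the standard does spell out is a release fence paired with an acquire fence (or an acquire operation). So a reader that is unquestionably covered by the listed clauses would look something like this (a sketch; the function name is mine):

```cpp
void thread_b_fences()
{
    payload* p = pointer.load( std::memory_order_relaxed );
    if ( p )
    {
        // The acquire fence pairs with the release fence in thread_a
        // under the fence-to-fence clause, because the relaxed load above
        // read the value written by the relaxed store after that fence.
        std::atomic_thread_fence( std::memory_order_acquire );
        printf( "%d\n", p->value.load( std::memory_order_relaxed ) );
    }
}
```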

curiousguy