Would the compiler insert mfence for a function that takes a non-atomic pointer holding the value of an atomic pointer?

Question

I am reading the cppreference page on the carries_dependency attribute. As I understand it, the code snippet below (taken from that page) says that if the carries_dependency attribute is not added to the print2 function, the compiler would insert an mfence, because print2 is passed the non-atomic pointer int* local, which stores the value of the atomic pointer std::atomic<int*> p.

I tried to verify this on https://godbolt.org/z/TTE4bM9d6, which has exactly the same number of instructions as https://godbolt.org/z/K4Wd4sEG4, which means no mfence was inserted. I can understand that, given that I used x86-64 gcc. I then tried the same comparison with ARM gcc trunk, https://godbolt.org/z/K6KM4nssK against https://godbolt.org/z/jGYsae7dz, but I reached the same conclusion.

So my question is: is my understanding correct that, according to the snippet below, the cppreference page is saying the compiler should insert an mfence if the carries_dependency attribute is not added to print2? If so, why can't I see that in the tests above on Compiler Explorer? If an mfence should be inserted in this case, is it because local is a pointer, which the compiler treats as a reference? However, both the local pointer and the atomic pointer p point to the address of x, which is not an atomic variable. Why would the compiler still insert an mfence in this case?

PS: I understand that I am asking more than one question and that I am expected to ask only one question per post. However, all the questions above are closely related, and the background information would be redundant if I split them into separate posts.

#include <atomic>
#include <iostream>
 
void print(int* val)
{
    std::cout << *val << std::endl;
}
 
void print2(int* val [[carries_dependency]])
{
    std::cout << *val << std::endl;
}
 
int main()
{
    int x{42};
    std::atomic<int*> p = &x;
    int* local = p.load(std::memory_order_consume);
 
    if (local)
    {
        // The dependency is explicit, so the compiler knows that local is
        // dereferenced, and that it must ensure that the dependency chain
        // is preserved in order to avoid a fence (on some architectures).
        std::cout << *local << std::endl;
    }
 
    if (local)
    {
        // The definition of print is opaque (assuming it is not inlined),
        // so the compiler must issue a fence in order to ensure that
        // reading *p in print returns the correct value.
        print(local);
    }
 
    if (local)
    {
        // The compiler can assume that although print2 is also opaque then
        // the dependency from the parameter to the dereferenced value is
        // preserved in the instruction stream, and no fence is necessary (on
        // some architectures). Obviously, the definition of print2 must actually
        // preserve this dependency, so the attribute will also impact the
        // generated code for print2.
        print2(local);
    }
}
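
The snippet above is single-threaded, so it cannot show what the ordering is meant to protect. A minimal sketch of the multi-threaded pattern that consume targets might look like the following: one thread fills in an object and publishes its address, another thread loads the pointer and dereferences it. The names Payload, g_ptr, publish, read_published and run_demo are hypothetical and are not part of the cppreference example. The load uses memory_order_acquire rather than memory_order_consume, on the assumption (stated in the comments below) that current compilers strengthen consume to acquire anyway.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical payload type, for illustration only.
struct Payload { int value; };

// Shared pointer through which the writer publishes the payload.
std::atomic<Payload*> g_ptr{nullptr};

// Writer: initialise the payload first, then publish its address.
// The release store orders the write to p->value before the pointer store.
void publish(Payload* p)
{
    p->value = 42;
    g_ptr.store(p, std::memory_order_release);
}

// Reader: wait until the pointer becomes visible, then dereference it.
// memory_order_consume would express the intent (order only the accesses
// that depend on the loaded pointer); acquire is used here because that is
// what compilers are assumed to emit for consume in practice.
int read_published()
{
    Payload* local = nullptr;
    while ((local = g_ptr.load(std::memory_order_acquire)) == nullptr)
    {
        std::this_thread::yield();
    }
    assert(local != nullptr);
    return local->value; // data-dependent on the loaded pointer
}

// Runs one writer thread against a reader on the calling thread.
int run_demo()
{
    Payload payload{0};
    g_ptr.store(nullptr, std::memory_order_relaxed);
    std::thread writer(publish, &payload);
    int seen = read_published();
    writer.join();
    return seen;
}
```

Calling run_demo() returns the value written by the writer thread. With a relaxed load instead of acquire, the read of local->value would no longer be ordered after the write by the C++ memory model, even though most hardware would still honour the address dependency.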

@273K as mentioned above, I have tried both x86-64 gcc trunk & ARM gcc trunk compilers — cpp, Jul 27 '23 at 17:29
`consume` is temporarily deprecated until it can be reworked into something compilers can actually handle. In practice compilers treat `consume` as `acquire`, which x86 does for free. (See also [What does memory\_order\_consume really do?](https://stackoverflow.com/q/65336409) and more stuff linked from that, like https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0371r1.html). — Peter Cordes, Jul 27 '23 at 18:16
You never need `mfence` for `release`/`acquire` on x86, only for `seq_cst`. On 32-bit ARM, acquire loads will involve a `dmb ish` since there isn't a weaker barrier that's sufficient. (@273K). And if `consume` was actually implemented, then you'd see such a barrier before a call to a function that didn't promise it would respect `[[carries_dependency]]` — Peter Cordes, Jul 27 '23 at 18:17
`print2` doesn't need any barriers except on Alpha; even weakly-ordered ISAs other than Alpha do guarantee dependency-ordering from pointer to value for a load (and for ALU operations on load results so stuff like `arr[i * 3 + 4]` still has a data dependency on `i`); avoiding barriers was the whole point of `consume`. — Peter Cordes, Jul 27 '23 at 18:27
So in other words, in https://godbolt.org/z/rMa3PE55f, the `[[carries_dependency]]` means that the compiler *could* have omitted the `dmb ish` that appears at line 45 of the asm. As Peter explained, current compilers don't actually attempt to do this optimization. But without `[[carries_dependency]]` the barrier is mandatory. — Nate Eldredge, Jul 27 '23 at 20:55
(By the way, you probably want to be compiling with optimizations in your tests. The main purpose of the C++ memory model is to specify, indirectly, what optimizations are or are not allowed. If you are not optimizing to begin with, then you won't see much.) — Nate Eldredge, Jul 27 '23 at 20:57
Nothing involving `consume` (or even promoting it to `acquire`) requires flushing the store buffer. Just compiler ordering is sufficient on x86-64, no special asm instructions are ever needed for memory ordering other than `seq_cst`. (Or for RMW atomicity.) Or `sfence` if you use NT stores, but that's outside of what `std::atomic` does. — Peter Cordes, Aug 02 '23 at 07:59
Have you read the linked Q&As like [What does memory\_order\_consume really do?](https://stackoverflow.com/q/65336409) and stuff linked from it? [Memory order consume usage in C11](https://stackoverflow.com/q/55741148) explains the hardware feature that `consume` is designed to expose. [C++11: the difference between memory\_order\_relaxed and memory\_order\_consume](https://stackoverflow.com/a/59832012) links Paul McKenney's CppCon 2016 talk [C++ Atomics: The Sad Story of memory_order_consume: A Happy Ending At Last?](https://www.youtube.com/watch?v=ZrNQKpOypqU), also very helpful. — Peter Cordes, Aug 02 '23 at 08:02
If you don't understand why `acquire` doesn't need any special instructions on x86, see [C++ How is release-and-acquire achieved on x86 only using MOV?](https://stackoverflow.com/q/60314179) — Peter Cordes, Aug 02 '23 at 08:03
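
To make the point from the comments above concrete, here is a small sketch that can be pasted into Compiler Explorer with optimizations enabled (for example -O2) to compare the generated code per memory order. The variable g_flag and the function names are hypothetical. The instruction notes in the code comments describe what x86-64 compilers typically emit; they are expectations to check against the actual output, not guarantees.

```cpp
#include <atomic>

// Hypothetical shared variable, for illustration only.
std::atomic<int> g_flag{0};

// Acquire load: on x86-64 this is typically a plain mov.
int load_acquire()
{
    return g_flag.load(std::memory_order_acquire);
}

// Release store: on x86-64 this is typically a plain mov.
void store_release(int v)
{
    g_flag.store(v, std::memory_order_release);
}

// Sequentially consistent store: the one case where x86-64 needs a
// full barrier, typically xchg, or mov followed by mfence.
void store_seq_cst(int v)
{
    g_flag.store(v, std::memory_order_seq_cst);
}

// Sequentially consistent load: on x86-64 typically a plain mov,
// because the cost is paid on the store side.
int load_seq_cst()
{
    return g_flag.load(std::memory_order_seq_cst);
}
```

On 32-bit ARM the same functions should show a `dmb ish` around the acquire and release operations, which is the barrier the comments refer to.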
