
It is known that asm volatile ("" ::: "memory") can serve as a compiler barrier that prevents the compiler from reordering memory accesses across it. For example, this is mentioned in https://preshing.com/20120625/memory-ordering-at-compile-time/, section "Explicit Compiler Barriers".

However, all the articles I can find only state that asm volatile ("" ::: "memory") can serve as a compiler barrier, without explaining why the "memory" clobber effectively forms one. The GCC online documentation only says that the special clobber "memory" tells the compiler that the assembly code may perform memory reads or writes other than those specified in the operand lists. But how does that semantic stop the compiler from reordering memory accesses across the statement?

I tried to answer this myself but failed, so I ask here: why can asm volatile ("" ::: "memory") serve as a compiler barrier, based on the semantics of the "memory" clobber? Please note that I am asking about a "compiler barrier" (in effect at compile time), not the stronger "memory barrier" (in effect at run time). For convenience, I excerpt the semantics of the "memory" clobber from the GCC online documentation below:

The "memory" clobber tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input parameters). To ensure memory contains correct values, GCC may need to flush specific register values to memory before executing the asm. Further, the compiler does not assume that any values read from memory before an asm remain unchanged after that asm; it reloads them as needed. Using the "memory" clobber effectively forms a read/write memory barrier for the compiler.

Peter Cordes
zzzhhh
  • You cannot reorder an instruction if its effect is unknown. – n. m. could be an AI Jun 11 '21 at 21:23
  • IMO it tells the compiler to complete all memory operations (and consequently other operations as well) before the next statements, as in this example: https://godbolt.org/z/fKv6GaGET – 0___________ Jun 11 '21 at 23:20
  • I think you asked a very similar question a day or two ago, and as with it, I don't see why the quoted passage doesn't already answer the question. Can you give a specific example of a possible reordering, for which you are not sure whether or why the quoted text forbids it? Say, a snippet of C source together with assembly (pseudo)code that you are unsure whether the compiler could emit. Then someone can probably explain more concretely. – Nate Eldredge Jun 11 '21 at 23:31
  • @Nate Eldredge: You don't see it because you are an expert, and you think the two questions are very similar for the same reason. But I did not see the similarity between the reads-everything / writes-everything effect and the effect of a compiler barrier until just now: they are both "the desired effect of requiring memory to be in sync." It is the phrase "in sync" that reminded me of the analogy in https://preshing.com/20120710/memory-barriers-are-like-source-control-operations, and then all at once I understood the similarity you referred to in the comment. – zzzhhh Jun 12 '21 at 22:26

2 Answers

If a variable is potentially read or written, it matters what order that happens in. The point of a "memory" clobber is to make sure the reads and/or writes in an asm statement happen at the right point in the program's execution.

(Or more specifically, in this thread's execution, since a compiler barrier is like atomic_signal_fence, not atomic_thread_fence. The exception is ISAs like x86, where the hardware's run-time ordering is strong enough that acquire or release thread fences only need to block compile-time reordering. For example, asm("":::"memory") is a possible implementation of atomic_thread_fence(memory_order_release) on x86, but not on AArch64.)
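To make the atomic_signal_fence comparison concrete, here is a minimal sketch of a portable C11 spelling of the same idea. The names message, message_ready and post_message are hypothetical, made up for illustration. With GCC and Clang I'd expect the fence to emit no instructions and only constrain compile-time ordering, but check the generated asm for your own target.

```c
#include <stdatomic.h>

int message;                 /* plain (non-atomic) payload */
atomic_int message_ready;    /* flag, stored with relaxed ordering */

/* Hypothetical helper: publish a value for a signal handler
 * running in the same thread. */
void post_message(int value)
{
    message = value;
    /* Compile-time-only fence: the store to message may not be
     * moved after the flag store by the compiler. */
    atomic_signal_fence(memory_order_release);
    atomic_store_explicit(&message_ready, 1, memory_order_relaxed);
}
```

For cross-thread publication on a weakly-ordered ISA such as AArch64, this would not be enough; that is the job of atomic_thread_fence or a release store.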


Any read of a C variable's value that happens in the source after an asm statement must be after the memory-clobbering asm statement in the compiler-generated assembly output for the target machine, otherwise it might be reading a value before the asm statement would have changed it.

Any read of a C var in the source before an asm statement similarly must stay sequenced before, otherwise it might incorrectly read a modified value.

Similar reasoning applies to assignments to (writes of) C variables before/after any asm statement with a "memory" clobber. It's just like a call to an "opaque" function, one whose definition the compiler can't see.

No reads or writes can reorder (at compile time) with the barrier in either direction, therefore no operation before the barrier can reorder with any operation after the barrier, or vice versa.
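As a rough sketch of that reasoning (GNU C, so GCC or Clang is assumed; compiler_barrier, payload, ready and publish are hypothetical names for illustration): since neither store may cross the asm statement, the compiler has to emit them in source order.

```c
/* Hypothetical macro name; the asm statement itself is the GNU C
 * extension discussed in the question. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

int payload;
int ready;

void publish(int value)
{
    payload = value;     /* must be emitted before the asm statement */
    compiler_barrier();  /* the asm "might read or write" payload and ready */
    ready = 1;           /* must be emitted after the asm statement */
}
```

Without the barrier, the compiler would be free to emit the store to ready first, since the two plain stores are independent as far as the C abstract machine is concerned. This says nothing about what the CPU does at run time.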


Another way to look at it: the actual machine memory contents must match the C abstract machine at that point. The compiler-generated asm has to respect that, by storing any variable values from registers to memory before the start of an asm("":::"memory") statement, and afterwards it has to assume that any registers that had copies of variable values might not be up to date anymore. So they have to be reloaded if they're needed.
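A small sketch of the "store before, reload after" effect, again assuming GNU C; shared_counter, read_twice and write_twice are hypothetical names. Comparing the compiler output with and without the barrier (e.g. on Godbolt) should show the difference.

```c
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

int shared_counter;

int read_twice(void)
{
    int first = shared_counter;   /* load into a register */
    compiler_barrier();           /* register copy is now considered stale */
    int second = shared_counter;  /* has to be loaded again */
    return first + second;
}

void write_twice(void)
{
    shared_counter = 1;           /* can't be dropped as a dead store */
    compiler_barrier();           /* memory must hold 1 at this point */
    shared_counter = 2;
}
```

Without the barrier, a compiler could legally do one load in read_twice and double it, and could drop the first store in write_twice.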

This reads-everything / writes-everything assumption for the "memory" clobber is what keeps the asm statement from reordering at all at compile time wrt. all accesses, even non-volatile ones. The volatile is already implicit from being an asm() statement with no "=..." output operands, and is what stops it from being optimized away entirely (and with it the memory clobber).


Note that only potentially "reachable" C variables are affected. For example, escape analysis can still let the compiler keep a local int i in a register across a "memory" clobber, as long as the asm statement itself doesn't have the address as an input.

Just like a function call: for (int i=0;i<10;i++) {foobar("%d\n", i);} can keep the loop counter in a register, and just copy it to the 2nd arg-passing register for foobar every iteration. There's no way foobar can have a reference to i because its address hasn't been stored anywhere or passed anywhere.

(This is fine for the memory barrier use-case; no other thread could have its address either.)
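A sketch of both cases, assuming GNU C (the function names are hypothetical). In the first function the local's address never escapes, so the compiler is allowed to keep it in a register across the clobber; in the second, the address is an input operand, so the asm statement could reach the variable through it.

```c
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

int local_not_escaped(void)
{
    int total = 0;                /* address never taken: not reachable */
    for (int k = 0; k < 10; k++) {
        total += k;
        compiler_barrier();       /* total may stay in a register */
    }
    return total;
}

int local_escaped(void)
{
    int x = 1;
    /* The asm receives &x, so with the "memory" clobber the compiler
     * has to store x before the statement and reload it afterwards. */
    __asm__ __volatile__("" : : "r"(&x) : "memory");
    return x;
}
```

The asm template is empty in both cases, so the values themselves don't change; only what the compiler is allowed to assume does.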


Peter Cordes
  • I understand all at once when reading "Just like a function call to an "opaque" function". https://preshing.com/20120625/memory-ordering-at-compile-time/ gives the explanation I want, but for functions. Since it is in section "Implied Compiler Barriers", I failed to apply the same reasoning to `asm("":::"memory")` in section "Explicit Compiler Barriers". Now I think they are both implicit. Thank you for the detailed answer. PS, it's cool to point out `volatile` is implicit so we don't need to write it. – zzzhhh Jun 12 '21 at 02:05
  • PS, an example related to "reachable" can be found here: https://gcc.gnu.org/legacy-ml/gcc-help/2019-09/msg00016.html – zzzhhh Jun 12 '21 at 21:26
I'll add that the "memory" clobber is only a compiler directive. A processor that executes speculatively or out of order may still reorder memory accesses at run time. To prevent this, an explicit memory barrier call is necessary. See the Linux kernel documentation on memory barriers.
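A minimal sketch of the difference, using the C11 atomic_thread_fence as one possible run-time barrier (the Linux kernel has its own macros, which aren't shown here). slot_a, slot_b and the function names are hypothetical. On typical compilers the first function gets no extra instruction, while the second gets a real fence instruction on most targets.

```c
#include <stdatomic.h>

int slot_a;
int slot_b;

void store_compiler_barrier_only(void)
{
    slot_a = 1;
    __asm__ __volatile__("" ::: "memory"); /* compile-time ordering only */
    slot_b = 1;
}

void store_runtime_fence(void)
{
    slot_a = 2;
    atomic_thread_fence(memory_order_seq_cst); /* also orders at run time */
    slot_b = 2;
}
```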

dturvene
  • I edited my answer to phrase in a way that reminds readers of this fact. The question does specifically say it's asking about compiler barriers, not run-time barriers, but it's not a bad idea to clarify in the answer as well in case people don't read the whole question. Anyway, my edit basically makes this separate answer redundant, but this answer didn't answer the question so should probably just have been a comment in the first place. – Peter Cordes Sep 16 '22 at 19:52
  • I agree my response is redundant and should have been a comment. However, some of the responses discuss fences and speculation so I (quickly) added this for visibility; but it should have been a comment... – dturvene Sep 16 '22 at 23:56
  • I don't see any comments on this question or other responses here that discuss runtime reordering. And my answer was (previously) purely about compile-time ordering. If you mean on other Q&As, you could have responded there. – Peter Cordes Sep 17 '22 at 00:10