
When implementing lock-free data structures and timing code, it's often necessary to suppress the compiler's optimisations. Normally people do this using asm volatile with "memory" in the clobber list, but you sometimes see just asm volatile or just a plain asm clobbering memory.

What impact do these different statements have on code generation (particularly in GCC, as it's unlikely to be portable)?

Just for reference, these are the interesting variations:

asm ("");   // presumably this has no effect on code generation
asm volatile ("");
asm ("" ::: "memory");
asm volatile ("" ::: "memory");
jleahy
  • Someone seems to be messing around far too close to the metal :-) (And somewhere else, @Mysticial is typing away at a ridiculously detailed answer...) – Kerrek SB Jan 21 '13 at 23:39

3 Answers

See the "Extended Asm" page in the GCC documentation.

You can prevent an asm instruction from being deleted by writing the keyword volatile after the asm. [...] The volatile keyword indicates that the instruction has important side-effects. GCC will not delete a volatile asm if it is reachable.

and

An asm instruction without any output operands will be treated identically to a volatile asm instruction.

None of your examples have output operands specified, so the asm and asm volatile forms behave identically: they create a point in the code which may not be deleted (unless it is proved to be unreachable).

This is not quite the same as doing nothing. See this question for an example of a dummy asm which changes code generation - in that example, code that goes round a loop 1000 times gets vectorised into code which calculates 16 iterations of the loop at once; but the presence of an asm inside the loop inhibits the optimisation (the asm must be reached 1000 times).

The "memory" clobber makes GCC assume that any memory may be arbitrarily read or written by the asm block, so it will prevent the compiler from reordering loads or stores across it:

This will cause GCC to not keep memory values cached in registers across the assembler instruction and not optimize stores or loads to that memory.

(That does not prevent a CPU from reordering loads and stores with respect to another CPU, though; you need real memory barrier instructions for that.)
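To illustrate the distinction, a sketch of a single-producer publish pattern using GCC's `__atomic` builtins; `payload`, `ready`, `publish` and `try_consume` are hypothetical names. The fences order the accesses for other CPUs as well as for the compiler, whereas an `asm volatile ("" ::: "memory")` in the same places would restrain only the compiler.

```c
/* Hypothetical shared state: publish() writes payload, then sets ready. */
static int payload;
static int ready;

void publish(int value)
{
    payload = value;
    /* Release fence: a real barrier instruction where the CPU needs one
       (typically dmb ish on AArch64, nothing extra on x86) and also a
       compiler barrier. */
    __atomic_thread_fence(__ATOMIC_RELEASE);
    __atomic_store_n(&ready, 1, __ATOMIC_RELAXED);
}

/* Returns 1 and stores the payload in *out once it has been published. */
int try_consume(int *out)
{
    if (!__atomic_load_n(&ready, __ATOMIC_RELAXED))
        return 0;
    /* Acquire fence pairs with the release fence in publish(). */
    __atomic_thread_fence(__ATOMIC_ACQUIRE);
    *out = payload;
    return 1;
}
```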

Community
Matthew Slattery
  • This is actually very interesting, not realizing that gcc treats `asm` blocks without outputs as volatile was a huge gap in my knowledge. – jleahy Jan 24 '13 at 09:59
  • So `volatile` = performance-killer no matter what context it's used in (variable or asm). File it with the `goto` keyword - use only when absolutely necessary. – etherice May 25 '13 at 21:51
  • "any memory" means any object in memory? – curiousguy Jun 21 '18 at 04:47
  • @etherice: How is it a performance killer for `asm`? For `asm`, all it means is "don't delete this instruction" and "don't move this instruction out of a loop, even if it appears to be idempotent" (it doesn't otherwise prevent instruction reordering around it; to prevent reordering, you'd have to explicitly declare artificial dependencies on the surrounding instructions). If you're hand-writing inline assembly, it probably shouldn't be deleted, and it's not causing unnecessary memory reads/writes/fences or restricting instruction reordering like `volatile` does for variables. – ShadowRanger Jul 23 '19 at 17:43
  • A `"memory"` clobber only applies to globally-reachable memory, or memory reachable via any pointer inputs to the `asm` statement. As far as which C objects have to be "in sync" in memory and which can still be in registers, it's like a non-inline function call. So local vars that have never had their address passed outside the function (e.g. loop counters) can typically still stay in registers thanks to [escape analysis](https://en.wikipedia.org/wiki/Escape_analysis). – Peter Cordes Jul 23 '19 at 18:04
  • This optimization is safe because it's already not safe/allowed to do something like `asm("incl -16(%%rbp)" ::: "memory")` to access the stack space where gcc happens to put a local (without using a `"+m"` operand to get the compiler to generate an addressing mode). Stack-frame layout isn't something you can make any assumptions about; different compiler options will change it. So anyway, a `"memory"` clobber does what this answer says, but with a performance penalty that's not *quite* as bad. – Peter Cordes Jul 23 '19 at 18:10
  • @PeterCordes so does volatile imply "memory"? I.e. it marks the asm as having some side effects, some of which might depend on memory, so wouldn't it prevent the same optimizations "memory" clobber does plus some more? – Dan M. Jul 21 '23 at 13:41
  • @DanM.: No, `asm volatile` doesn't imply a `"memory"` clobber. Use cases for `volatile` without `"memory"` include stuff like `asm volatile("rdtsc" : "=a"(low), "=d"(high))`. If your side-effects include reading or writing memory, then tell the compiler about it with memory output and/or input operands, or a `"memory"` clobber. GNU C inline asm syntax is designed for performance, not to be easy or simple. An `asm` statement without any output operands is implicitly `volatile`, though, which is why `asm("" ::: "memory")` works. Think of `volatile` as "not a pure function of the inputs". – Peter Cordes Jul 21 '23 at 18:48
  • See also [How can I indicate that the memory \*pointed\* to by an inline ASM argument may be used?](https://stackoverflow.com/q/56432259) re: dummy memory input or output operands instead of a `"memory"` clobber, if you know which object(s) can be read or written. – Peter Cordes Jul 21 '23 at 18:49
  • @PeterCordes can the compiler reorder instructions around asm volatile? Will it move stores/loads past volatile asm? – Dan M. Jul 22 '23 at 01:21
  • @DanM.: Yes, if the stores are non-`volatile` and there's no `"memory"` clobber (or if escape analysis can prove that the variables are locals that nothing else can have a reference to, including the asm statement). [How can I indicate that the memory \*pointed\* to by an inline ASM argument may be used?](https://stackoverflow.com/q/56432259) which I linked in my last comment has examples of that happening, allowing dead store elimination or CSE of loads. That's why I linked it. – Peter Cordes Jul 22 '23 at 02:48

asm ("") does nothing (or at least, it's not supposed to do anything).

asm volatile ("") also does nothing.

asm ("" ::: "memory") is a simple compiler fence.

asm volatile ("" ::: "memory") AFAIK is the same as the previous (GCC treats an asm with no output operands as implicitly volatile). The volatile keyword tells the compiler that it's not allowed to delete this assembly block or hoist it out of a loop; without it, an asm may be hoisted out of a loop if the compiler decides that the input values are the same in every invocation. I'm not really sure under what conditions the compiler will decide that it understands enough about the assembly to try to optimize its placement, but the volatile keyword suppresses that. That said, I would be very surprised if the compiler attempted to move an asm statement that had no declared inputs or outputs.

Incidentally, volatile also prevents the compiler from deleting the expression if it decides that the output values are unused. This can only happen if there are output values though, so it doesn't apply to asm ("" ::: "memory").

Lily Ballard
  • Matthew Slattery's answer points out that `asm volatile ("")` is not quite the same as doing nothing, as it can have drastic effects on compiler optimization. The same performance implications would apply to using `asm volatile ("" ::: "memory")` as a compiler fence. – etherice May 25 '13 at 21:33
  • The compiler does not understand assembly language! – curiousguy Oct 06 '15 at 00:26
  • @curiousguy no, but it does understand when an `asm` block has declared inputs/outputs, which tells the compiler which registers it depends on and which ones it will modify, and therefore the compiler can shuffle certain computations around if they don't affect the inputs/outputs. – Lily Ballard Oct 07 '15 at 00:58
  • At least for GCC, `asm volatile` does *not* inhibit general instruction reordering, it only prevents a reachable `asm` block from being deleted due to (apparent) lack of meaningful side-effects (and on more recent GCC, prevents it from being hoisted out of a loop even if the compiler determines the inputs are always the same). Otherwise, instruction reordering is only inhibited by declared inputs and outputs (and pseudo-outputs, like `"memory"`). Read more in [the docs](https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/Extended-Asm.html#Volatile). – ShadowRanger Jul 23 '19 at 17:50
  • Yes, `asm` statements are implicitly `volatile` if they have no output constraints. (https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html). So `asm("":::"memory")` is *exactly* identical to `asm volatile("":::"memory")`. A non-volatile asm statement could be deleted if the result is never used, or hoisted (or otherwise [CSE](https://en.wikipedia.org/wiki/Common_subexpression_elimination)d) if it's run with the same input(s) repeatedly. Thus you'd need explicit `volatile` to wrap something like `rdtsc` or `rdrand` because you *do* get a different output with the same (empty) set of inputs. – Peter Cordes Dec 27 '20 at 03:19
  • But as ShadowRanger says: `volatile` doesn't nail it down wrt. all other instructions, or even to private local var accesses (that escape analysis has proved couldn't be accessed through pointers from global vars). Think of it like a non-inline function call: mem has to be in sync, but only mem that could possibly be globally reachable, or passed as a pointer arg. For an asm statement with no inputs, only the globally reachable part applies. – Peter Cordes Dec 27 '20 at 03:25

Just for completeness on Lily Ballard's answer, Visual Studio 2010 offers _ReadBarrier(), _WriteBarrier() and _ReadWriteBarrier() to do the same (VS2010 doesn't allow inline assembly for 64-bit apps).

These don't generate any instructions but affect the behaviour of the compiler. A nice example is here.

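MemoryBarrier(), by contrast, does emit an instruction: on x64 it generates lock or DWORD PTR [rsp], 0, a locked read-modify-write that also orders memory accesses at the CPU level.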

ShadowRanger
James