@Lundin is correct that `if (flag) flag = 0;` isn't an atomic RMW, and wouldn't be even with `volatile` (it would still be just a separate atomic load and atomic store; an interrupt could happen between them). See their answer for more about that, and for their point that this seems like a fundamentally wrong approach for some goals. Also, avoiding `volatile` only makes sense if you're replacing it with `_Atomic`; that's what Herb Sutter's 2009 article you linked was saying. He wasn't saying you should use plain variables and force memory access via barriers; that is fraught with peril, as compilers can invent loads or invent stores to non-atomic variables, and there are other less-obvious pitfalls. If you're going to roll your own atomics with inline asm, you need `volatile`; GCC and clang support that usage of `volatile` since it's what the Linux kernel does, as does pre-C11 code.
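For the check-and-clear itself, here is a minimal C11 sketch of the `_Atomic` replacement, assuming the flag is set by an interrupt handler and consumed by the main loop. `set_flag` and `take_flag` are hypothetical names, not from the question. `atomic_exchange_explicit` reads and clears the flag in one atomic RMW, so a set can't be lost between the read and the clear. Whether that compiles to a lock-free instruction sequence or a library call depends on the target, so on AVR it's worth checking the generated asm.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical flag shared between an ISR and the main loop.
static _Atomic uint8_t flag;

// Called from the (hypothetical) interrupt handler.
// release: data written before setting the flag is visible to the consumer.
void set_flag(void) {
    atomic_store_explicit(&flag, 1, memory_order_release);
}

// Called from the main loop. One atomic read-modify-write:
// returns true if the flag was set, and leaves it cleared either way.
// Unlike `if (flag) flag = 0;`, a set_flag() can't slip in between
// the read and the clear and get lost.
bool take_flag(void) {
    return atomic_exchange_explicit(&flag, 0, memory_order_acquire) != 0;
}
```

The main loop would then do `if (take_flag()) { ... }` instead of testing and clearing in two separate steps.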
Optimizing away g
The barrier isn't what blocks GCC from optimizing away `g` entirely. GCC misses this valid optimization when there's a function as simple as `void func(){g++;}` with no other code in the file, except the declarations.
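A minimal translation unit for that case, assuming `g` is a file-scope `static` that nothing outside the file can see. Nothing ever reads `g`, so every store to it is dead and the variable could be removed entirely, yet GCC keeps the increment (exact output may vary with GCC version and target).

```c
#include <assert.h>
#include <stdint.h>

static uint8_t g;   // file-scope, never read by any code in this file

// Each call depends on the previous value of g (a chain of values),
// which appears to be what stops GCC from noticing that g is dead.
void func(void) { g++; }
```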
But `g` is optimized away even with `asm("" ::: "memory")` if the C code that uses it won't produce a chain of different values across calls. Storing a constant still lets GCC drop it, and storing a constant after an increment is enough to make the increment a dead store. Perhaps GCC's heuristic for optimizing it away only considers a chain of a couple of calls, and doesn't try to prove that nothing value-dependent happens?
```c
#include <stdint.h>
//uint8_t flag;
static uint8_t g;
void func(void) {
    __asm__ __volatile__ ("" ::: "memory"); // <1>
    int tmp = g;
    g = tmp;
    ++g; g = 1;
    // g = tmp+1;   // with this instead, and no later constant store to make it dead, GCC misses the optimization
    __asm__ __volatile__ ("" ::: "memory"); // <1>
}
```

GCC 12 `-O3` output for AVR, on Godbolt:

```asm
func:
        ret
```
The `"memory"` clobber forces the compiler to assume that `g`'s value could have changed, if it doesn't optimize `g` away. But it doesn't make all `static` variables in the compilation unit implicit inputs/outputs. (An `asm` statement with no output operands is implicitly `volatile` anyway.)
Telling the compiler that only `flag` was read+written should be equivalent as far as optimizing away `g` is concerned. Except that if `g` doesn't get optimized away, GCC can hoist the load of `g` out of the loop, only storing incremented values. (It misses the optimization of sinking the store out of the loop. That would be legal: the `"+m"(flag)` operand tells the compiler that `flag` was read + written so could have any value now, but without a `"memory"` clobber, the compiler can assume the `asm` statement didn't read or write any other state of the C abstract machine from registers or memory.)
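The hoisting and sinking can be written out by hand. This is a sketch assuming a loop shaped roughly like `loop_n`, a hypothetical reconstruction rather than the question's actual code; `loop_n_sunk` shows what the compiler is allowed to produce when the only operand is `"+m"(flag)` and there's no `"memory"` clobber.

```c
#include <assert.h>
#include <stdint.h>

uint8_t flag;
static uint8_t g;

// Roughly the shape under discussion: increment g, then an asm barrier
// that only claims to read and write flag.
void loop_n(int n) {
    for (int i = 0; i < n; i++) {
        ++g;
        __asm__ __volatile__ ("" : "+m"(flag));  // no "memory" clobber
    }
}

// What the compiler may legally turn loop_n into. The asm can't touch g
// (g isn't an operand and there's no "memory" clobber), so g can live
// in a register across every asm statement in the loop.
void loop_n_sunk(int n) {
    if (n <= 0)
        return;                  // the original loop wouldn't touch g at all
    uint8_t tmp = g;             // load hoisted out of the loop (GCC does this)
    for (int i = 0; i < n; i++) {
        ++tmp;
        __asm__ __volatile__ ("" : "+m"(flag));
    }
    g = tmp;                     // store sunk out of the loop (GCC misses this)
}
```

The early return matters: without it, the hand-written version would store to `g` even when the loop never runs, which the original code doesn't do.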
Your statement with an `"=m" (flag)` output-only operand is different: it tells the compiler that the old value of `flag` was irrelevant, not an input. So if it was unrolling the loop, any stores to `flag` before one of those `asm` statements would be dead stores.
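A small sketch of the difference, assuming a plain global `flag`; the function names are hypothetical. With `"+m"` the asm claims to read `flag`, so the store before it must really happen. With `"=m"` the compiler may assume the asm overwrites `flag` without reading it, so it's allowed to delete the store.

```c
#include <assert.h>
#include <stdint.h>

uint8_t flag;

// "+m": flag is an input and an output, so `flag = 0` must reach
// memory before the asm statement runs.
void clear_then_rw_barrier(void) {
    flag = 0;
    __asm__ __volatile__ ("" : "+m"(flag));
}

// "=m": flag is output-only, so its old value is irrelevant to the asm and
// `flag = 0` is a dead store the compiler may remove. This empty asm doesn't
// really write flag, so after optimization memory may still hold the old value.
void clear_then_wo_barrier(void) {
    flag = 0;
    __asm__ __volatile__ ("" : "=m"(flag));
}
```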
(The `asm` statement is `volatile`, so the compiler does have to run it as many times as it's reached in the abstract machine; it has to assume there might be side effects like I/O to something that isn't a C variable. So the previous `asm` statements can't be removed as dead because of that, but only because of `volatile`.)
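For contrast, a sketch of what `volatile` changes, using hypothetical helpers. An `asm` statement with an output operand and no `volatile` is treated like a pure function of its inputs: GCC may delete it if the output is unused, merge identical copies, or hoist it out of a loop. The `volatile` version has to run each time execution reaches it.

```c
#include <assert.h>

// Non-volatile: if the result is unused GCC may delete this asm, and it
// may merge two calls with the same input into one.
static inline unsigned opaque(unsigned x) {
    __asm__ ("" : "+r"(x));
    return x;
}

// Volatile: executed every time it's reached, even if the result is unused.
static inline unsigned opaque_volatile(unsigned x) {
    __asm__ __volatile__ ("" : "+r"(x));
    return x;
}

void demo(void) {
    opaque(1);            // may be optimized away entirely
    opaque_volatile(1);   // must stay, once per time it's reached
}
```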