3

The following code contains an interrupt-service-routine and a normal function func(), which uses the global flag and additionally the static global g. Without any memory-barrier this code is faulty, since flag is modified asynchronously.

Introducing the global memory-barrier <1> fixes that, but also inhibits the optimizations on g. My expectation was that all access to g would be optimized out, since g is not accessible outside this TU.

I know that a global memory barried has the same effect as calling a non-inline function f() <3>. But here is the same question: since g is not visible outside this TU, why should not optimize the access to g.

I tried to use a specific memory-barrier against flag but that did not help either.

(I avoided qualifying flag as volatile: this would help here, but it should only be used accessing HW-registers).

The question now is how to get the accesses to g optimized?

Compiler: avr-gcc

https://godbolt.org/z/ob6YoKx5a

#include <stdint.h>

uint8_t flag;  

void isr() __asm__("__vector_5") __attribute__ ((__signal__, __used__, __externally_visible__)); 
void isr() {
    flag = 1;
}

static uint8_t g;

void f();

void func(void) {
  for (uint8_t i=0; i<20; i++) {
//      f(); // <3>
//      __asm__ __volatile__ ("" : : : "memory"); // <1>
//          __asm__ __volatile__ ("" : "=m" (flag)); // <2>
    ++g;
    if (flag) {
      flag = 0;
    }
  }
}

//void f(){}
wimalopaan
  • 4,838
  • 1
  • 21
  • 39
  • 1
    "memory" tells the compiler your ASM code can edit anything in memory. – user253751 Feb 07 '23 at 09:51
  • And why does the specific memory-barrier not work? – wimalopaan Feb 07 '23 at 09:53
  • probably because you told the compiler your ASM code could edit anything in memory – user253751 Feb 07 '23 at 09:57
  • No, the specific memory barrier <2> only refers to `flag`. – wimalopaan Feb 07 '23 at 10:55
  • *I avoided qualifying `flag` as `volatile`: this would help here, but it should only be used accessing HW-registers* [There are other situations](https://port70.net/~nsz/c/c11/n1570.html#5.1.2.3p5): "When the processing of the abstract machine is interrupted by receipt of a signal, the values of objects that are neither lock-free atomic objects nor of type volatile sig_atomic_t are unspecified ..." In fact, that seems rather apropos here. – Andrew Henle Feb 07 '23 at 11:05
  • @wimalopaan where did you read that? – user253751 Feb 07 '23 at 11:07
  • https://www.drdobbs.com/parallel/volatile-vs-volatile/212701484?pgno=2 – wimalopaan Feb 07 '23 at 11:22
  • In <2>, you are clobbering **all** of memory due to `"memory"` – emacs drives me nuts Feb 07 '23 at 16:03
  • Sorry, that was wrong. I corrected the example. But that did not change anything!? – wimalopaan Feb 07 '23 at 16:10
  • 1
    If you want to do atomic operations on an AVR you'll need to use [``](http://atmel-studio-doc.s3-website-us-east-1.amazonaws.com/webhelp/GUID-90493D45-DB42-4BB8-9627-BAA0225A71E3-en-US-1/index.html?GUID-8E7AC268-08E8-4F83-B48E-D1E6A7A50BC4) as the standard C and C++ headers are not available as best I can tell. – Mgetz Feb 07 '23 at 16:27
  • "(volatile) _should only be used accessing HW-registers_" No! Use volatile whenever you want the compiler to treat a variable as having unknown value, even where code logic implies it must have a specific value; IOW, With **volatile you can use `ptrace` to set a variable to an arbitrary value**. (The "ptrace semantic" of volatile is the one and only definition that fits what all compilers are doing and that's actually useful.) Without volatile, you can break compiler optimizations and get UB when changing variables with ptrace; IOW, volatile prevents all optimizations involving object state. – curiousguy Feb 25 '23 at 21:26

3 Answers3

2

Lots of misconceptions.

  • There is nothing called "static global", that's like saying "dog cat", they are each other's opposites. You can have variables declared at local scope or at file scope. You can have variables with internal linkage or external linkage.

    A "global" - which isn't a formal term, is a variable declared at file scope with external linkage which may be referred to by other parts of the program using extern. This is almost always bad practice and bad design.

    static ensures that a variable no matter where it was declared has internal linkage. So it is by definition not "global". Variables declared static are only accessible inside the scope where they were declared. For details check out What does the static keyword do in C?

  • Memory barriers is not a concept that makes much sense outside multicore systems. The purpose of memory barriers are to prevent concurrent execution/pipelining, pre-fetching or instruction reordering in multicore systems. Also, memory barriers do not guarantee re-entrancy on the software level.

    This is some single core AVR, one of the simplest CPUs still manufactured, so memory barriers is not an applicable concept. Do not read articles about PC programming on 64 bit x86 and try to apply them to 8-bit legacy architectures from the 1990s. Wrong tool, wrong purpose, wrong system.

  • volatile does not necessarily act as a memory barrier even on systems where that concept is applicable. See for example https://stackoverflow.com/a/58697222/584518.

    volatile does not make code re-entrant/thread-safe/interrupt-safe on any system, including AVR.

    The purpose of volatile in this context is to protect against incorrect compiler optimizations when the compiler doesn't realize that an ISR is called by hardware and not by the program. We can't volatile qualify functions or code in C, only objects, hence variables shared with an ISR need to be volatile qualified. Details and examples here: https://electronics.stackexchange.com/questions/409545/using-volatile-in-embedded-c-development/409570#409570

As for what you should be doing instead, I believe my answer to your other question here covers that.

Lundin
  • 195,001
  • 40
  • 254
  • 396
  • Your "memory barriers don't make sense outside multicore systems" is wrong when memory is shared by a CPU and a device. Note that the question mentions "interrupt service routine" which may imply it's some kind of a device driver for a device that generates IRQs. – Brendan Feb 19 '23 at 02:00
1

__asm__ __volatile__ ("" :: "m" (flag):"memory"); // <2>

You are clobbering all of memory due to "memory".

If you want to express that only flag is changed (and that the change has volatile effect(s)), then:

__asm__ __volatile__ ("" : "+m" (flag));

This tells GCC that flag is changed, not just an input to the asm like in <2>.

emacs drives me nuts
  • 2,785
  • 13
  • 23
  • That has not the desired effect. – wimalopaan Feb 07 '23 at 16:11
  • And what is your desire? I don't know that. Maybe you can be more specific. – emacs drives me nuts Feb 07 '23 at 16:13
  • Without the memory clobber for `flag` the variable `g` is incremented by 20 in on step and the whole loop vanishes. In my expectation this should also happen if only flag is clobbered as it would be, if `flag` is `volatile` qualified. – wimalopaan Feb 07 '23 at 16:19
  • Maybe this kind of optimization just is not implemented (if we use types other than `char` so that there are no aliasing issues, it's still not optimized). llvm doesn't optimize it either. For a complete answer about why this optimitatoin is not performed, maybe better go to a GCC mailing list like `gcc-help@gcc.gnu.org`. Maybe it was just overlooked, has some issues, or nobody had resources to implement it. – emacs drives me nuts Feb 07 '23 at 16:32
  • @wimalopaan: https://godbolt.org/z/c4f74hY99 shows that with `__asm__ __volatile__ ("" : "+m" (flag));`, GCC loads `g` once, but for some reason stores it every iteration inside the loop. only hoisting the load, not sinking the store. Same with your broken `"=m"(flag)` which tells the compiler the old value of `flag` doesn't matter, that it's an output-only operand, so it's only luck that it compiles "correctly". The `asm("":::"memory")` compiler barrier forces it to reload `g` every iteration as well. – Peter Cordes Feb 07 '23 at 16:34
  • But don't use barriers instead of `volatile` accesses; compilers can invent loads/stores and other surprising things: https://lwn.net/Articles/793253/ Anyway, I only see the loop being optimized away to a single `g+=20` if I remove any `asm volatile` statement from the loop, as expected: if an `asm` statement has to run (even if empty), the compiler has to loop or fully unroll it. But with any `m` operand, it can't fold the loop iterations into a no-op on `flag`, so unrolling would actually take code which it chooses not to do. – Peter Cordes Feb 07 '23 at 16:35
  • @PeterCordes: so, this might be an avr-gcc issue? – wimalopaan Feb 07 '23 at 16:46
  • I read https://www.drdobbs.com/parallel/volatile-vs-volatile/212701484?pgno=2. And therefore I tried to avoid `volatile` for normal objects. And additionally the sig_atomic_t type is not available in avr-gcc. So, there seems to be no way to get a conforming solution. – wimalopaan Feb 07 '23 at 16:49
  • @PeterCordes: unrolling is done with less than 16 iterations. – wimalopaan Feb 07 '23 at 16:57
  • @wimalopaan: What do you mean "avr-gcc issue"? On Godbolt, it's working as expected with GCC for AVR. I could imagine it being different with your `"=m"` (output-only) instead of the correct `"+m"` (tell the compiler the var is read and written), but what compiler version and options gave you what asm? – Peter Cordes Feb 07 '23 at 17:03
  • @wimalopaan: If you want to roll your own "atomic" loads and stores instead of using `_Atomic uint8_t`, `volatile` *is* the right tool for the job in GNU C. That's how the Linux kernel does it. It's not portable, that's why C11 introduced `_Atomic`, like Herb Sutter says you should use instead in the Dr. Dobbs article you linked. `asm()` compiler barriers are even less portable, and do a less good job for the purpose you're using them for. (But yeah, declaring the *variable* `volatile` is weird; `volatile` is a property of an access, not a variable, as Linux Torvalds has pointed out.) – Peter Cordes Feb 07 '23 at 17:06
  • @PeterCordes: I mean, that the compiler does not sink the store to g out of the loop. – wimalopaan Feb 07 '23 at 17:09
  • @wimalopaan: Oh, yeah like I said, GCC would be allowed to sink the store or even optimize away `g` given the semantics of the inline `asm` in this answer, and I wouldn't be surprised if other compilers do. Missed-optimization bugs are not rare. It *could* let `g` be optimized away, though, but even better to remove the `asm` stuff and just use `volatile` on `flag` if you want that effect. That does let GCC sink the store at least. – Peter Cordes Feb 07 '23 at 17:15
  • @PeterCordes: is it correct that GCC could optimize out the operations on `g` completely (since it has not external linkage)? – wimalopaan Feb 07 '23 at 17:18
  • @wimalopaan: Yes, like I said, with `volatile` or the `asm` suggested in this answer, it's *allowed* to. It may be allowed to optimize it away even with a `"memory"` clobber, as long as `g` isn't an operand to the `asm` statement. But in practice GCC and clang don't optimize away a `static uint8_t g;` that's incremented even with nothing else going on in the compilation unit, and no memory barriers or loops: https://godbolt.org/z/7h5G5Kfsv IDK what's up with that; I think they do optimize away unused static variables sometimes. Maybe not much code does that much with unneeded vars? – Peter Cordes Feb 07 '23 at 18:01
1

@Lundin is correct that if (flag) flag = 0; isn't an atomic RMW, and wouldn't be even with volatile (still just separate atomic-load and atomic-store; an interrupt could happen between them.) See their answer for more about that, and that this seems like fundamentally wrong approach for some goals. Also, avoiding volatile only makes sense if you're replacing it with _Atomic; that's what Herb Sutter's 2009 article you linked was saying. Not that you should use plain variables and force memory access via barriers; that is fraught with peril, as compilers can invent loads or invent stores to non-atomic variables, and other less-obvious pitfalls. If you're going to roll your own atomics with inline asm, you need volatile; GCC and clang support that usage of volatile since it's what the Linux kernel does, as well as pre-C11 code.


Optimizing away g

The barrier isn't what blocks GCC from optimizing away g entirely. GCC misses this valid optimization when there's a function as simple as void func(){g++;} with no other code in the file, except the declarations.

But g is be optimized away even with asm("" ::: "memory") if the C code that uses it won't produce a chain of different values across calls. Storing a constant is fine, and storing a constant after an increment is enough to make the increment a dead store. Perhaps GCC's heuristic for optimizing it away only considers a chain of a couple calls, and doesn't try to prove that nothing value-dependent happens?

#include <stdint.h>

//uint8_t flag;
static uint8_t g;

void func(void) {
  __asm__ __volatile__ ("" ::: "memory"); // <1>
    
    int tmp = g;
    g = tmp;
    ++g; g = 1;
    // g = tmp+1;  // without a later constant store to make it dead will make GCC miss the optimization

  __asm__ __volatile__ ("" ::: "memory");  // <1>

}

GCC12 -O3 output for AVR, on Godbolt:

func:
        ret

The "memory" clobber forces the compiler to assume that g's value could have changed, if it doesn't optimize it away. But it doesn't make all static variables in the compilation implicit inputs/outputs. The asm statement is implicitly volatile because it has no output operands.

Telling the compiler that only flag was read+written should be equivalent. Except if g doesn't get optimized away, GCC can hoist the load of g out of the loop, only storing incremented values. (It misses the optimization of sinking the store out of the loop. That would be legal; the "+m"(flag) operand tells the compiler that flag was read + written so could have any value now, but without a "memory" clobber, the compiler can assume the asm statement didn't read or write any other state of the C abstract machine from registers or memory.

Your statement with an "=m" (flag) output-only operand is different: it tells the compiler that the old value of flag was irrelevant, not an input. So if it was unrolling the loop, any stores to flag before one of those asm statements would be a dead store.

(The asm statement is volatile so it does have to run it as many times as its reached in the abstract machine; it has to assume there might be side-effects like I/O to something that isn't a C variable. So the previous asm statements can't be removed as dead because of that, but only because of volatile.)

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • Here is a slightly modified example: https://godbolt.org/z/bdnf67E8M I do not understand, why the GCC optimizes out all accesses to `g`, which is clearly false to my eyes, because `g` is also accessed by the `ISR()`. – wimalopaan Feb 08 '23 at 07:40
  • @wimalopaan: ISR() doesn't access it, `// g = 1;` is commented out. Uncommenting it, it's still a write-only access, so no behaviour of any function actually depends on the value read from `g` at any time. And it's not `volatile`, so the compiler can assume it's not modified asynchronously. (ISO C says that would be data-race UB even *with* `volatile`, but GNU C defines the behaviour.) – Peter Cordes Feb 08 '23 at 08:12
  • 1
    Ok, I think I got it: first of all the code is optimized in the abstract machine relying on a sequential flow and here `g` is optimzed-out (in `func()`). So, no global memory-barrier can suppress this (other the memory-barrier is between `++g;` and `--g;`) – wimalopaan Feb 08 '23 at 08:21
  • Where does GCC docs say that data-races are well-defined? – wimalopaan Feb 08 '23 at 09:42
  • @wimalopaan: Only data races on `volatile` objects. The way GCC (and clang and ICC) implement `volatile` makes it usable for rolling your own atomics. The Linux kernel relies on simultaneous write+read or write+write `volatile` accesses to the same object to work as expected on the architectures it supports, because they have coherent cache. That is in fact the case for GCC (and clang and ICC, also many compilers that don't support GNU extensions.) Other projects used to rely on that before C++11, and some legacy code still does use volatile. Despite it being data-race UB in ISO C and C++. – Peter Cordes Feb 08 '23 at 16:49
  • @wimalopaan: I don't know where in the GCC docs it says anything explicit about it, but there is strong de-facto for `volatile` as `mo_relaxed` `_Atomic` for pure-load / pure-store of types narrow enough to naturally be atomic. For example [Which types on a 64-bit computer are naturally atomic in gnu C and gnu C++? -- meaning they have atomic reads, and atomic writes](//stackoverflow.com/q/71866535) shows GCC for AArch64 compiling `x = 0xdeadbeefdeadbeef` to `stp` or 2 `str` of the 32-bit half (the `stp` version is atomic on later ARMv8). But with volatile `x`, one `str`, atomic on ARMv8.0 – Peter Cordes Feb 08 '23 at 16:52
  • "fraught with peril," isn't a valid link – curiousguy Feb 19 '23 at 00:16
  • @curiousguy: Thanks, apparently I copy/pasted the title, not the URL. Fixed now. – Peter Cordes Feb 19 '23 at 01:24