
We have inherited a project targeting the Renesas RX231 microcontroller, which I have been looking at.

This uC has only one instruction that locks the bus for atomicity (XCHG).

Because the processor is the only component that accesses RAM (no DMA or DTC is used), variables in user code that are shared with interrupts are protected by disabling interrupts (via the processor status word register) for the duration of the access, i.e.

disable_interrupts(); /* set_psw(get_psw() & ~(1 << 16)); */
/* access or modify shared variables */
enable_interrupts();  /* set_psw(get_psw() | (1 << 16)); */

However, there are also "flags" being shared without guarding, which are set in interrupts and polled in user code in the following way:

volatile unsigned char event_request_message = 0;
unsigned char condition_sending_message = 0;

#pragma interrupt
void on_request_message()
{
     ...
     event_request_message = 1; // mov.l   #0x3df5, r14
                                // mov.b   #1, [r14]
     ... 
}

void user_code()
{
     for(;;)
     {
         ...
         /* might be evaluated multiple times before transmit message is completed */
         if(event_request_message && !condition_sending_message) // mov.l   #0x3df5, r14
                                                                 // movu.b  [r14], r14
                                                                 // cmp     #0, r14
                                                                 // beq.b   0xfff8e17b <user_code+185>
                                                                 // mov.l   #0x5990, r14
                                                                 // movu.b  [r14], r14
                                                                 // cmp     #0, r14
                                                                 // bne.b   0xfff8e16f <user_code+173>
         {
              event_request_message = 0;     // mov.l   #0x3df5, r14  
                                             // mov.b   #0, [r14]                  
              condition_sending_message = 1; // mov.l   #0x5990, r14               
                                             // mov.b   #1, [r14]
              /* transmit message */ 
              ... 
         }
         ...
     }
}

My understanding of why these "flags" can go unguarded (without disabling interrupts in user code) is:

  • Reading, setting, or clearing a "flag" always takes two instructions: one to load the memory address into a register and one to read/set/clear the byte
  • The memory addresses are constant, so the address-load instruction can be disregarded
  • Each read/set/clear operation is then a single instruction, and therefore the access/manipulation is atomic

The question is: is my understanding correct? Is accessing and manipulating such "flag" variables safe in this case?
Or could there be any possible bugs/errors?

  • Assume the compiler used and compiler options are always the same.
  • Assume that the operations described are the only way such "flags" are accessed/manipulated: set to 0 or 1, or read (all shown in the assembly above); no addition, multiplication, etc.

What if we needed to upgrade the compiler or change compiler options?
Could such simple operations result in more than "one instruction"?

The justification for using such "flags" without guarding is to limit the amount of time interrupts are disabled.

Looking at the code logic, the expected behaviour is that you can request a message one or more times but you only get one answer.

PS. I tried to use the following additional tags: "cc-rx", "rxv2-instruction-set", "rx231".

Krzysztof Bracha
    Do you mean `xchg` is the only atomic RMW? Or do you really mean that a simple load or simple store might not be atomic? i.e. could be only partially executed when an interrupt handler runs. – Peter Cordes Mar 21 '18 at 13:06
  • Yes, your understanding is correct on those 3 points you list. Interrupts happen before or after an instruction, not in the middle. And a byte store is normally atomic anyway, even on a multi-core CPU. `event_request_message++` would give you an atomic load and a separate atomic store, but *not* an atomic increment. See [Can num++ be atomic for 'int num'?](https://stackoverflow.com/questions/39393850/can-num-be-atomic-for-int-num) for more details (including some answers that mention the uniprocessor case where being a single instruction gives you atomicity) – Peter Cordes Mar 21 '18 at 13:33
  • I created 2 tags for this. I probably should have only done `renesas-rx`; we don't need separate tags for RXv1 vs. [RXv2](https://www.renesas.com/en-hq/about/press-center/news/2013/news20131112.html). Is `[renesas-rx]` a decent name? – Peter Cordes Mar 21 '18 at 13:41
  • @PeterCordes Regarding `xchg` from the hardware manual: _requests for bus access from masters other than the CPU are not accepted until data transfer for the XCHG instruction is completed._ From my understanding that would cover even simple loads/stores, but the only controllers that could compete with the CPU over memory bus mastership in this case would be DMA or DTC. Perhaps `xchg` is not relevant to the question, but I wanted to point out that I am aware of the issue. – Krzysztof Bracha Mar 21 '18 at 13:58
  • One-way transfers like a load or a store can be ([and are on most architectures](https://stackoverflow.com/questions/36624881/why-is-integer-assignment-on-a-naturally-aligned-)) atomic without locking the bus, even with respect to DMA observers. So the problem here is that you said "only one instruction that *guarantees atomicity* (XCHG)", which would imply that `mov` load / store are not atomic. Assuming they are, you should have said "*only one read-modify-write instruction that guarantees atomicity (XCHG).*" or "*only one instruction that locks the bus for atomicity (XCHG).*" – Peter Cordes Mar 21 '18 at 14:01
  • @PeterCordes Thank you for your help. The tag sounds good! – Krzysztof Bracha Mar 21 '18 at 14:01
  • Are you sure RX are always single-core? I don't know anything about this family, but it would seem that Renesas safety MCU falls into this family and those are usually dual core? (And run in lock-step on safety applications) And they also have various fast, monstrous networking processors in that family too. – Lundin Mar 26 '18 at 09:07
  • Regarding the question, if you disassemble and notice that the caller uses 2 instructions to read, then the code is not safe, simple as that. You need to either use some manner of atomic access or to disable the interrupt during read. If you disable the interrupts and keep the shared variable `volatile`, it should be fine. (assuming single core) – Lundin Mar 26 '18 at 09:11

1 Answer


Depending on your goals, i.e. whether you're writing for a specific platform only or want to ensure portability, you need to keep several additional things in mind:

  1. With optimizations enabled, many compilers will happily reorder accesses to volatile variables with accesses to non-volatile variables, as long as the final result of the operation is indistinguishable from a single-threaded scenario. This means that code like this:

    int a = 0;
    volatile int b = 0;
    
    void interrupt_a(void) 
    {
        a = b + 1;
        b = 0;       // set b to zero when done
    }
    

    can be rearranged by the compiler into:

    load acc from [b]
    store 0 into [b]  // set b to zero *before* updating a, to mess with you a bit
    add 1 to acc
    store acc into [a]
    

    The way to prevent an optimizing compiler from reordering would be to make both variables volatile. (Or if available, to use C11 _Atomic with memory_order_release stores and memory_order_acquire loads to order it with respect to operations on non-atomic variables.)

    If you're on a multi-core uC, the hardware can also reorder memory operations at runtime, so making variables volatile doesn't solve the problem; the actual solution is to emit a fence for both the compiler and the CPU, if you care about observers on other cores (or about MMIO, even on a single-core uC). Hardware fence instructions aren't needed for single-core, single-threaded code, because even an out-of-order execution CPU sees its own operations happen in program order.

    Again, if the compiler you got with the toolchain for your particular embedded system doesn't know anything about fences, then it's quite likely it will refrain from doing stuff like this. So you need to examine the documentation and check the compiled assembly.

    For example, the ARM docs state that the processor is "allowed" to reorder instructions and the programmer should take care to add memory barriers, but right after that it also states (under "implementation detail") that Cortex M processors do not reorder instructions. However, they still insist that proper barriers should be inserted, since it will simplify porting to a newer version of the processor.

  2. Depending on your pipeline length, it might take a couple of instructions after you make the request until the interrupt is fully enabled or disabled. Again, you need to check the docs for this particular uC/compiler, but sometimes there needs to be a fence of some sort after writing to the register. For example, on ARM Cortex you need to issue both DSB and ISB instructions after disabling the interrupt, to make sure that the interrupt doesn't fire during the next several instructions:

    // you would have to do this on an ARM Cortex uC
    DisableIRQ(device_IRQn); // Disable certain interrupt by writing to NVIC_CLRENA
    DSB(); // data memory barrier
    ISB(); // instruction synchronization barrier
    
    // <-- this is where the interrupt is "really disabled"
    

    Of course, your library might itself contain all required fence instructions when you call `disable_interrupts()`, or they might not be needed at all for this architecture.

  3. The increment operation (x++) should not be considered atomic, even if it might "accidentally" turn out to be atomic on a certain single-core CPU. As the disassembly suggests, it would not be atomic on your particular uC, and the only way to guarantee atomicity is to disable interrupts around the operation.

So, ultimately, you should make sure you read the documentation for this platform and understand what the compiler can and cannot do. Something that works today might not work tomorrow if the compiler decides to reorder instructions after you've added a seemingly minor change, especially since a race condition might not be frequent enough to detect it immediately.

Peter Cordes
vgru
  • The OP is on a single-threaded / single-core architecture. Only compile-time reordering is a concern, because a core always observes its own operations to happen in program order. So you need a compiler barrier, but not a fence instruction (if RXv1 even has memory fence instructions). Interrupts are serializing on most architectures, so a core doesn't have to worry about runtime reordering with respect to an interrupt handler on the same core (just like a thread wrt. a signal handler). – Peter Cordes Mar 21 '18 at 16:07
  • Ideally the OP's compiler would support C11 `_Atomic char`, with `memory_order_acquire` / `memory_order_release` to order non-atomic variables with respect to the flag loads/stores. – Peter Cordes Mar 21 '18 at 16:10
  • @Groo thank you for your help and very useful information! – Krzysztof Bracha Mar 26 '18 at 09:25
  • @KrzysztofBracha: no problem, although from the docs I've glanced at for the mentioned controller, your specific compiler doesn't seem to do any reordering, so most of my answer consists of general suggestions for portability since there are no intrinsics you can use for barriers anyway. Just keep in mind that none of these expressions are atomic, apart from single reads and writes to addresses not wider than `int` (e.g. an interrupt may enter in the middle of `if(event_request_message && !condition_sending_message)`). – vgru Mar 26 '18 at 10:42
  • @Groo Still I appreciate the effort put into your answer. In the case of evaluation, it is fine to interrupt it at any point (`condition_sending_message` is a user code flag). In general, definitely the only approach to take here is to review each "shared variable" case separately if they consist of different usage/logic. This question and answers also opened a few new topics to me and the whole issue is much clearer now. Thanks again! – Krzysztof Bracha Mar 26 '18 at 11:55
  • @Groo: compiler barriers against compile-time reordering don't turn into instructions. e.g. in GNU C, `asm volatile("" ::: "memory")` forces the contents of all globals to match the C abstract machine at that point (because of the `"memory"` clobber), and to assume they might have been modified, so the compiler can't reorder loads or stores in either direction across it. But the asm instructions inserted is just the empty string `""`, so it's *only* a compile-time barrier. Similarly, when compiling for x86, `std::atomic_thread_fence(memory_order_release)` takes no instructions. – Peter Cordes Mar 26 '18 at 16:21
  • @PeterCordes: yes, perhaps my wording was unclear, I was referring to compiler barriers, I believe we already established that the microcontroller wouldn't reorder instructions at runtime in the previous comments. – vgru Mar 26 '18 at 19:24
  • You said "there are no instructions you can use for barriers anyway". I guess you meant "intrinsics" or builtins or "C syntax", because "instructions" (in my mind) is specifically talking about asm-level stuff; i.e. the compiler output, not input (unless you use inline-asm). Anyway, I wondered if this was a terminology issue, since you did say the compiler happens not to reorder. (That sounds terrible, and would stop `a=1; b=2; a=3;` from optimizing into 2 stores, unless you just mean it doesn't reorder anything around / across `volatile`, so a `volatile` access is a memory barrier.) – Peter Cordes Mar 26 '18 at 19:33
  • @PeterCordes: yes, intrinsics. I just glanced over the manual so I didn't want to make bold statements about how it operates. I've noticed it doesn't support C11, mentions doing minor optimizations like loop unrolling and inlining, and states that accesses to non-volatile variables can be optimized, without particular emphasis on reordering. The only way to ensure that two accesses aren't reordered would therefore be to make them both volatile, as mentioned in the answer. – vgru Mar 26 '18 at 20:37