I think that in this case both volatile and atomic will most likely work in practice on the 32 bit ARM. At least in an older version of STM32 tools I saw that in fact the C atomics were implemented using volatile for small types.
Volatile will work because the compiler may not optimize away any access to the variable that appears in the code.
However, the generated code must differ for types that cannot be loaded in a single instruction. If you use a volatile int64_t
, the compiler will happily load it in two separate instructions. If the ISR runs between loading the two halves of the variable, you will load half the old value and half the new value.
Unfortunately using atomic<int64_t>
may also fail with interrupt service routines if the implementation is not lock free. For Cortex-M, 64-bit accesses are not necessarily lockfree, so atomic should not be relied on without checking the implementation. Depending on the implementation, the system might deadlock if the locking mechanism is not reentrant and the interrupt happens while the lock is held. Since C++17, this can be queried by checking atomic<T>::is_always_lock_free
. A specific answer for a specific atomic variable (this may depend on alignment) may be obtained by checking flagA.is_lock_free()
since C++11.
So longer data must be protected by a separate mechanism (for example by turning off interrupts around the access and making the variable atomic or volatile.
So the correct way is to use std::atomic
, as long as the access is lock free. If you are concerned about performance, it may pay off to select the appropriate memory order and stick to values that can be loaded in a single instruction.
Not using either would be wrong, the compiler will check the flag only once.
These functions all wait for a flag, but they get translated differently:
#include <atomic>
#include <cstdint>
using FlagT = std::int32_t;
volatile FlagT flag = 0;
void waitV()
{
while (!flag) {}
}
std::atomic<FlagT> flagA;
void waitA()
{
while(!flagA) {}
}
void waitRelaxed()
{
while(!flagA.load(std::memory_order_relaxed)) {}
}
FlagT wrongFlag;
void waitWrong()
{
while(!wrongFlag) {}
}
Using volatile you get a loop that reexamines the flag as you wanted:
waitV():
ldr r2, .L5
.L2:
ldr r3, [r2]
cmp r3, #0
beq .L2
bx lr
.L5:
.word .LANCHOR0
Atomic with the default sequentially consistent access produces synchronized access:
waitA():
push {r4, lr}
.L8:
bl __sync_synchronize
ldr r3, .L11
ldr r4, [r3, #4]
bl __sync_synchronize
cmp r4, #0
beq .L8
pop {r4}
pop {r0}
bx r0
.L11:
.word .LANCHOR0
If you do not care about the memory order you get a working loop just as with volatile:
waitRelaxed():
ldr r2, .L17
.L14:
ldr r3, [r2, #4]
cmp r3, #0
beq .L14
bx lr
.L17:
.word .LANCHOR0
Using neither volatile nor atomic will bite you with optimization enabled, as the flag is only checked once:
waitWrong():
ldr r3, .L24
ldr r3, [r3, #8]
cmp r3, #0
bne .L23
.L22: // infinite loop!
b .L22
.L23:
bx lr
.L24:
.word .LANCHOR0
flag:
flagA:
wrongFlag: