Can an x86_64 and/or armv7-m mov instruction be interrupted mid-operation?

Question

I am wondering whether I would need to use an atomic type or volatile (or nothing special) for an interrupt counter:

#include <stdint.h>

uint32_t uptime = 0;

// interrupt each 1 ms
void ISR(void)
{
    // this is the only location which writes to uptime
    ++uptime;
}

void some_func(void)
{
    uint32_t now = uptime;
}

I would think that volatile should be enough to guarantee error-free operation and consistency (a value that increments until it overflows).

But it has occurred to me that a mov instruction might be interrupted mid-operation while moving/setting individual bits. Is that possible on x86_64 and/or ARMv7-M?

For example, the mov instruction would begin to execute, set 16 bits, then be pre-empted; the ISR would run, increasing uptime by one (and maybe changing all bits), and then the mov instruction would continue. I cannot find any material that assures me of the order of operations.

Would the same apply on ARMv7-M?

Would using sig_atomic_t be the correct solution to always have an error-free and consistent result or would it be "overkill"?

For example, the ARMv7-M architecture specifies:

In ARMv7-M, the single-copy atomic processor accesses are:
• All byte accesses.
• All halfword accesses to halfword-aligned locations.
• All word accesses to word-aligned locations.

Would an assert that &uptime % 8 == 0 be sufficient to guarantee this?

Depends on alignment: you can get from one to a few bus-widths of bits/bytes changed in a single transaction. But if you do even so much as an unaligned halfword, that can result in two separate transactions. You have to look up the architecture, the TRM, and possibly the AMBA/AXI spec for that core. For LDM/STM they talk about it in the architectural reference manual due to the potential size of the transaction. — old_timer, Jul 16 '18 at 12:04
Also note that some cores can break up single, aligned instructions into separate transactions. I know it is true for one; I don't know about others (it won't write with a length of more than 128 bits, so a many-word STM is converted into a 4-word store, then smaller ones as needed). That doesn't mean that those separate transactions can be interrupted, BTW, but it would be worth looking into just to be safe. — old_timer, Jul 16 '18 at 12:07
Don't assume that one design of one core from one vendor has any influence on how another design from another vendor works. Likewise, don't expect a high-level language written decades ago to have solutions for this. — old_timer, Jul 16 '18 at 12:08
Try the sig_atomic_t. If it produces the same code, without wrapping the access in interrupt disable/enable, then it clearly isn't worth it and won't help. The compiler can only control the instructions it generates; it cannot take every single instruction/transaction it generates and provide protection for each of them in some other code (an interrupt handler) in some other file... it would have to do it where you used the sig_atomic_t. — old_timer, Jul 16 '18 at 12:10
Not the single clock ones. Long ones like division can be interrupted on the Cortex cores — 0___________, Jul 16 '18 at 12:12
Based on your edit: so long as this is aligned, that would work. A modulo is overkill; check that (address & 3) == 0, since a word in ARM is 32 bits. The compiler is not going to place that variable, as defined/used, at an unaligned 32-bit memory location anyway; this is not x86. — old_timer, Jul 16 '18 at 12:13
sure, and there are multi-clock multiply options as well when building the core, I would assume the same for those. — old_timer, Jul 16 '18 at 12:13
*(or nothing special)* You need at least `volatile` on any architecture. — Andrew Henle, Jul 16 '18 at 12:14
When you say ARMv7t I assume you mean ARMv7-m which is a DIFFERENT architecture from ARMv7, you should read the ARMv7-m documentation as well as the cortex-m3 technical reference manual. — old_timer, Jul 16 '18 at 12:14
volatile won't protect you from an interrupt if the code modifies the variable in two places: if some_func does a read-modify-write of the variable, an interrupt can happen after the read but before the write, causing the code to not work. If you take a one-directional mailbox approach, which you have here (one writer, one reader), that reduces the risk. Because the variable is global, volatile won't make a difference here; the compiler has to save it to memory. — old_timer, Jul 16 '18 at 12:17
@old_timer Without **at least** `volatile`, any code that, for example, reads `up_time` in a long-lived loop can have those reads replaced with a single read prior to the loop. So "nothing special" won't work. Whether or not additional protections are needed is implementation-dependent, but I'd opine they should be used anyway. — Andrew Henle, Jul 16 '18 at 12:20
@AndrewHenle I can't modify my comment now... waited too long. Yes, volatile is required for loops, or other cases where multiple reads may happen: if the reader wants to poll the variable rather than read it once per function call, the reads can be optimized out, even with the separate now = uptime variable. — old_timer, Jul 16 '18 at 12:24
Likewise, the writer can be optimized if more than one write happens to the variable, so if the foreground is talking to the ISR through a global, you want volatile there as well. It's a good idea to sprinkle volatiles around for things like this. Assume only that, just by being global, the variable will be written at least once by the end of the function, and likewise read once per function; volatile covers the accesses within the function as well (but don't assume read-modify-writes are interrupt safe). — old_timer, Jul 16 '18 at 12:27
Reasoning about what instructions on the processor do is irrelevant unless you have guarantees from your compiler about which instructions it will use to implement code. Write source code whose semantics are defined to provide the properties you need—if you need an object to be acted on atomically, use an atomic type. — Eric Postpischil, Jul 16 '18 at 13:14
@Downvoter, could I receive some constructive criticism/feedback on what can be improved to make this question a better fit for Stack Overflow? — Gizmo, Jul 16 '18 at 16:00
Whenever data is accessed that is wider than the CPU's word size, the access is split up into multiple instructions which may be interrupted in between. `volatile` does not guarantee atomicity, it basically just disables some optimizations by the compiler. — JimmyB, Jul 17 '18 at 14:01
"would a assert with &uptime % 8 == 0 be sufficient to guarantee this?" - Probably yes. But with, for example, gcc you could also use [`uint32_t __attribute__ ((aligned (8))) uptime;`](https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#Common-Variable-Attributes) to tell the compiler to properly align the variable. — JimmyB, Jul 17 '18 at 14:09

KamilCuk · Answer 1 · 2018-07-16T12:35:10.983

Use volatile. Your compiler does not know about interrupts. It may assume that the ISR() function is never called (is there a call to ISR anywhere in your code?). That means uptime will never increment, so uptime will always be zero, so uint32_t now = uptime; may be safely optimized to uint32_t now = 0;. Use volatile uint32_t uptime; that way the optimizer will not optimize uptime away.
Word size. A uint32_t variable has 4 bytes. So on a 32-bit processor it takes one instruction to fetch its value, but on an 8-bit processor it takes at least 4 instructions (in general). So on a 32-bit processor you don't need to disable interrupts before loading the value of uptime, because the interrupt routine will start executing either before or after the current instruction is executed. The processor can't branch to the interrupt routine mid-instruction; that's not possible. On an 8-bit processor you need to disable interrupts before reading uptime, like:

DisableInterrupts();
uint32_t now = uptime;
EnableInterrupts();
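A minimal sketch of that pattern wrapped in a helper, assuming the two interrupt-control functions from the snippet above. They are hypothetical names, and here they are stubs that only record state, so this shows the structure of the critical section rather than real interrupt masking; read_uptime is also an illustrative name.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the platform's interrupt control. On a real
 * target these would actually mask interrupts (for example with the
 * "cpsid i" / "cpsie i" instructions on Cortex-M); here they only record
 * the state so the sketch compiles and runs on any host. */
static volatile int irq_enabled = 1;
static void DisableInterrupts(void) { irq_enabled = 0; }
static void EnableInterrupts(void)  { irq_enabled = 1; }

volatile uint32_t uptime = 0;   /* incremented by the tick ISR */

/* Read a counter that may need several load instructions (e.g. on an
 * 8-bit CPU) as one consistent snapshot. */
uint32_t read_uptime(void)
{
    DisableInterrupts();
    uint32_t now = uptime;      /* the ISR cannot run between the partial loads */
    EnableInterrupts();
    return now;
}
```

One design caveat: this unconditionally re-enables interrupts, so if read_uptime() can be called while interrupts are already disabled, a real version would save the previous interrupt state and restore it instead.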
C11 atomic types. I have never seen real embedded code that uses them (still waiting); I see volatile everywhere. How they behave depends on your compiler, because the compiler implements the atomic types and the atomic_* functions. Are you 100% sure that when reading from an _Atomic variable your compiler will disable the ISR() interrupt? Inspect the assembly generated from the atomic_* calls and you will know for sure. This was a good read. I would expect C11 atomic types to work for concurrency between multiple threads, which can switch execution context at any time. Using them between interrupt and normal context may lock up your CPU, because once you are in an IRQ you only get back to normal execution after servicing that IRQ: e.g. some_func takes a lock to read uptime, then the IRQ fires and spins in a loop waiting for the lock to be released, which results in an endless loop.
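One way to do part of that check at compile time, assuming a C11 compiler with <stdatomic.h> and assuming uint32_t is unsigned int (true on common x86-64 and ARM ABIs, but not guaranteed): ATOMIC_INT_LOCK_FREE is 2 only when atomic int is always lock-free, so the build fails instead of silently relying on a lock that an ISR could spin on. get_uptime is a hypothetical name; inspecting the generated assembly is still worthwhile.

```c
#include <stdatomic.h>
#include <stdint.h>

/* 2 = always lock-free, 1 = sometimes, 0 = never. Assumes uint32_t is
 * unsigned int, so ATOMIC_INT_LOCK_FREE describes _Atomic uint32_t. */
_Static_assert(ATOMIC_INT_LOCK_FREE == 2,
               "atomic uint32_t may need a lock; disable interrupts instead");

_Atomic uint32_t uptime = 0;

/* Foreground read; with relaxed ordering this is typically a plain load. */
uint32_t get_uptime(void)
{
    return atomic_load_explicit(&uptime, memory_order_relaxed);
}
```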
See for example the HAL_GetTick() implementation, from here, with the __weak macro removed and the __IO macro replaced by volatile (those macros are defined in a CMSIS file):

static volatile uint32_t uwTick;

void HAL_IncTick(void)
{
  uwTick++;
}

uint32_t HAL_GetTick(void)
{
  return uwTick;
}

Typically HAL_IncTick() is called from systick interrupt each 1ms.
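Because the question expects the counter to increase until it overflows, a common way to consume such a tick is to compare elapsed time with unsigned subtraction, which stays correct across the wrap. A sketch reusing the HAL_IncTick/HAL_GetTick code above; tick_elapsed is a hypothetical helper, not part of the HAL.

```c
#include <stdint.h>

static volatile uint32_t uwTick;

void HAL_IncTick(void) { uwTick++; }          /* called from the 1 ms tick ISR */
uint32_t HAL_GetTick(void) { return uwTick; }

/* Returns nonzero once at least timeout_ms ticks have passed since start.
 * Unsigned subtraction is modulo 2^32, so the result is correct even if
 * the counter wrapped between start and now. */
int tick_elapsed(uint32_t start, uint32_t timeout_ms)
{
    return (uint32_t)(HAL_GetTick() - start) >= timeout_ms;
}
```

Usage: record `uint32_t start = HAL_GetTick();` and poll `tick_elapsed(start, 100)`; comparing absolute tick values directly (e.g. `HAL_GetTick() >= start + 100`) would instead break at the wrap.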

CMSIS and other vendor-supplied sources have some pretty bad code in them. I would not use them as a reference, nor use them at all without inspection and bug fixes. — old_timer, Jul 16 '18 at 12:29
I would also be careful about making generalizations about the size of registers and the size of the load/store instructions, as well as the bus size. As we know very well from the two targets mentioned, 32-bit registers do not ensure a single transaction. As pointed out with "8-bit", the compiler is free to choose the transaction size; if it knows at compile time that an access is unaligned, a nice compiler with a nice option might generate two instructions instead of one for that 32-bit transaction. — old_timer, Jul 16 '18 at 14:21
@old_timer: Even if a single instruction takes multiple bus transactions, it still can't be interrupted in the middle. Remember, we're worried about atomicity with respect to interrupts on the same core, *not* other cores running concurrently. An instruction either fully happens before an interrupt, or not at all. — Peter Cordes, Jul 16 '18 at 20:00
Arm clearly states that it has to be aligned. On the bus one write can pass another, but a write to the same target/address in theory lands in the same command FIFO and is processed in order; likewise, the transactions are ideally queued in the core in order before they go out on the address bus. So yes, one would expect them to be in order, but Arm says otherwise. — old_timer, Jul 16 '18 at 20:05
And store multiples can be interrupted, aligned or not; if/when they are broken into separate transactions, that just allows a wider window to hit. This store, of course, wouldn't be using a store multiple... — old_timer, Jul 16 '18 at 20:06
And perhaps that is why the unaligned access is also interruptible: the transactions can each take a very long time, so they allow the current transaction to complete and pass on the outstanding one(s). It shouldn't be too hard to test with a particular core and system. — old_timer, Jul 16 '18 at 20:10

score 1 · Accepted Answer · answered Jul 16 '18 at 15:10

You have to read the documentation for each separate core and/or chip. x86 is a completely separate thing from ARM, and within both families each instance may vary from any other instance; you should expect each to be a completely new design. It might not be, but from time to time it is.

Things to watch out for as noted in the comments.

typedef unsigned int uint32_t;

uint32_t uptime = 0;

void ISR ( void )
{
    ++uptime;
}
void some_func ( void )
{
    uint32_t now = uptime;
}

On my machine with the tool I am using today:

Disassembly of section .text:

00000000 <ISR>:
   0:   e59f200c    ldr r2, [pc, #12]   ; 14 <ISR+0x14>
   4:   e5923000    ldr r3, [r2]
   8:   e2833001    add r3, r3, #1
   c:   e5823000    str r3, [r2]
  10:   e12fff1e    bx  lr
  14:   00000000    andeq   r0, r0, r0

00000018 <some_func>:
  18:   e12fff1e    bx  lr

Disassembly of section .bss:

00000000 <uptime>:
   0:   00000000    andeq   r0, r0, r0

This could vary, but if you find a tool on one machine one day that builds a problem, you can assume it is a problem. So far we are actually okay: because the value read in some_func is never used, the read is optimized out.

typedef unsigned int uint32_t;

uint32_t uptime = 0;

void ISR ( void )
{
    ++uptime;
}
uint32_t some_func ( void )
{
    uint32_t now = uptime;
    return(now);
}

Fixed (some_func now returns the value, so the read is kept):

00000000 <ISR>:
   0:   e59f200c    ldr r2, [pc, #12]   ; 14 <ISR+0x14>
   4:   e5923000    ldr r3, [r2]
   8:   e2833001    add r3, r3, #1
   c:   e5823000    str r3, [r2]
  10:   e12fff1e    bx  lr
  14:   00000000    andeq   r0, r0, r0

00000018 <some_func>:
  18:   e59f3004    ldr r3, [pc, #4]    ; 24 <some_func+0xc>
  1c:   e5930000    ldr r0, [r3]
  20:   e12fff1e    bx  lr
  24:   00000000    andeq   r0, r0, r0

Because cores like MIPS and ARM tend to raise data aborts by default for unaligned accesses, we might assume the tool will not generate an unaligned address for such a clean definition. Packed structs are another story: there you told the compiler to generate an unaligned access, and it will... If you want to feel safe, remember that a "word" in ARM is 32 bits, so you can assert that the address of the variable AND 3 is zero.
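A sketch of both checks in C11, assuming natural 4-byte alignment is what you want to verify (for a 32-bit word, a multiple of 4 is enough; the question's % 8 is stricter than needed). is_word_aligned is a hypothetical helper name.

```c
#include <stdalign.h>
#include <stdint.h>

volatile uint32_t uptime = 0;

/* Compile-time check: the ABI gives uint32_t natural (4-byte) alignment,
 * as the normal x86 and ARM ABIs do. */
_Static_assert(alignof(uint32_t) == 4, "uint32_t is not naturally aligned here");

/* Run-time check: a 32-bit word is aligned when the low two address bits
 * are clear, i.e. (address & 3) == 0. Masking with 4 would test the wrong bit. */
static int is_word_aligned(const volatile void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}
```

Usage: `assert(is_word_aligned(&uptime));` at startup. This only catches misplacement; it does not make a packed-struct member safe to access atomically.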

On x86 one would also assume that a clean definition like that results in an aligned variable, but x86 doesn't fault on unaligned accesses by default, so compilers are a bit more free... I am focusing on ARM, as I think that is your question.

Now if I do this:

typedef unsigned int uint32_t;

uint32_t uptime = 0;

void ISR ( void )
{
    if(uptime)
    {
        uptime=uptime+1;
    }
    else
    {
        uptime=uptime+5;
    }
}
uint32_t some_func ( void )
{
    uint32_t now = uptime;
    return(now);
}

00000000 <ISR>:
   0:   e59f2014    ldr r2, [pc, #20]   ; 1c <ISR+0x1c>
   4:   e5923000    ldr r3, [r2]
   8:   e3530000    cmp r3, #0
   c:   03a03005    moveq   r3, #5
  10:   12833001    addne   r3, r3, #1
  14:   e5823000    str r3, [r2]
  18:   e12fff1e    bx  lr
  1c:   00000000    andeq   r0, r0, r0

And with volatile added:

00000000 <ISR>:
   0:   e59f3018    ldr r3, [pc, #24]   ; 20 <ISR+0x20>
   4:   e5932000    ldr r2, [r3]
   8:   e3520000    cmp r2, #0
   c:   e5932000    ldr r2, [r3]
  10:   12822001    addne   r2, r2, #1
  14:   02822005    addeq   r2, r2, #5
  18:   e5832000    str r2, [r3]
  1c:   e12fff1e    bx  lr
  20:   00000000    andeq   r0, r0, r0

The two reads in the source now result in two loads. There is a problem here if the read-modify-write can be interrupted, but we assume that, since this is an ISR, it can't. If you read a 7, added 1, and then wrote an 8, and were interrupted after the read by something that also modifies uptime, that modification is short-lived: it happens (say a 5 is written), and then this ISR writes an 8 on top of it.

If a read-modify-write were in the interruptible code, the ISR could get in there and it probably wouldn't work the way you wanted. That is two readers and two writers; you want one context responsible for writing a shared resource and the others read-only. Otherwise you need a lot more work that is not built into the language.
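A host-side simulation of that lost update, using the 7/8/5 values from the paragraph above. The interrupt is modelled as a plain function call placed at the worst moment, so this only illustrates the ordering problem; other_writer_isr and foreground_increment are hypothetical names.

```c
#include <stdint.h>

static volatile uint32_t uptime = 7;

/* Stand-in for an ISR that also writes uptime (a second writer). */
static void other_writer_isr(void) { uptime = 5; }

/* Foreground read-modify-write, with the "interrupt" simulated by a direct
 * call between the read and the write. */
uint32_t foreground_increment(void)
{
    uint32_t tmp = uptime;      /* reads 7 */
    other_writer_isr();         /* interrupt stores 5 */
    uptime = tmp + 1;           /* stores 8: the ISR's 5 is silently lost */
    return uptime;
}
```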

Note on an arm machine:

typedef int __sig_atomic_t;
...
typedef __sig_atomic_t sig_atomic_t;

so

typedef unsigned int uint32_t;
typedef int sig_atomic_t;
volatile sig_atomic_t uptime = 0;
void ISR ( void )
{
    if(uptime)
    {
        uptime=uptime+1;
    }
    else
    {
        uptime=uptime+5;
    }
}
uint32_t some_func ( void )
{
    uint32_t now = uptime;
    return(now);
}

This isn't going to change the result, at least not on that system with that define. You need to examine other C libraries and/or sandbox headers to see what they define. And if you are not careful (it happens often) the wrong headers are used, e.g. the x86_64 headers get used to build ARM programs with the cross compiler; I have seen gcc and llvm make host-vs-target mistakes.

Going back to a concern, though, which based on your comments you appear to already understand:

typedef unsigned int uint32_t;
uint32_t uptime = 0;
void ISR ( void )
{
    if(uptime)
    {
        uptime=uptime+1;
    }
    else
    {
        uptime=uptime+5;
    }
}
void some_func ( void )
{
    while(uptime&1) continue;
}

This was pointed out in the comments even though you have one writer and one reader

00000020 <some_func>:
  20:   e59f3018    ldr r3, [pc, #24]   ; 40 <some_func+0x20>
  24:   e5933000    ldr r3, [r3]
  28:   e2033001    and r3, r3, #1
  2c:   e3530000    cmp r3, #0
  30:   012fff1e    bxeq    lr
  34:   e3530000    cmp r3, #0
  38:   012fff1e    bxeq    lr
  3c:   eafffffa    b   2c <some_func+0xc>
  40:   00000000    andeq   r0, r0, r0

It never goes back to read the variable from memory, and unless someone corrupts the register in an event handler, this can be an infinite loop.

Make uptime volatile:

00000024 <some_func>:
  24:   e59f200c    ldr r2, [pc, #12]   ; 38 <some_func+0x14>
  28:   e5923000    ldr r3, [r2]
  2c:   e3130001    tst r3, #1
  30:   012fff1e    bxeq    lr
  34:   eafffffb    b   28 <some_func+0x4>
  38:   00000000    andeq   r0, r0, r0

Now the reader does a read every time.

Same issue here: not in a loop, no volatile.

00000020 <some_func>:
  20:   e59f302c    ldr r3, [pc, #44]   ; 54 <some_func+0x34>
  24:   e5930000    ldr r0, [r3]
  28:   e3500005    cmp r0, #5
  2c:   0a000004    beq 44 <some_func+0x24>
  30:   e3500004    cmp r0, #4
  34:   0a000004    beq 4c <some_func+0x2c>
  38:   e3500001    cmp r0, #1
  3c:   03a00006    moveq   r0, #6
  40:   e12fff1e    bx  lr
  44:   e3a00003    mov r0, #3
  48:   e12fff1e    bx  lr
  4c:   e3a00007    mov r0, #7
  50:   e12fff1e    bx  lr
  54:   00000000    andeq   r0, r0, r0

uptime can have changed between tests; volatile fixes this.

So volatile is not the universal solution. Having each variable used for one-way communication is ideal; if you need to communicate the other way, use a separate variable: one writer and one or more readers per variable.
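A sketch of that two-variable arrangement, assuming an ISR that posts events and a foreground loop that consumes them. Each variable has exactly one writer, so neither side ever does a read-modify-write on something the other side writes. The names events_posted, events_handled, events_pending and handle_one_event are hypothetical.

```c
#include <stdint.h>

static volatile uint32_t events_posted;   /* written only by the ISR */
static volatile uint32_t events_handled;  /* written only by the foreground */

void ISR(void)
{
    events_posted = events_posted + 1;    /* ISR is the sole writer */
}

/* Foreground: events not yet processed. Unsigned subtraction is modulo
 * 2^32, so this stays correct when events_posted wraps. */
uint32_t events_pending(void)
{
    return events_posted - events_handled;
}

void handle_one_event(void)
{
    if (events_pending() != 0)
        events_handled = events_handled + 1;  /* foreground is the sole writer */
}
```

The ISR only reads events_handled implicitly (never), and the foreground only reads events_posted, so an interrupt landing anywhere in handle_one_event() cannot lose an update.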

You have done the right thing and consulted the documentation for your chip/core.

So if the variable is aligned (in this case to a 32-bit word) AND the compiler chooses the right instruction, the interrupt won't interrupt the transaction. If it is an LDM/STM, though, you should read the documentation (push and pop are also LDM/STM pseudo-instructions): in some cores/architectures those can be interrupted and restarted, and as a result the ARM documentation warns about those situations.

Short answer: add volatile, make sure there is only one writer per variable, and keep the variable aligned. Also read the docs each time you change chips/cores, and periodically disassemble to check that the compiler is doing what you asked it to do. It doesn't matter whether it is the same core type (another Cortex-M3) from the same vendor or a different vendor, or some completely different core/chip (AVR, MSP430, PIC, x86, MIPS, etc.): start from zero, get the docs, read them, and check the compiler output.

score 1 · Answer 3 · answered Jul 16 '18 at 21:23

TL;DR: Use volatile if an aligned uint32_t is naturally atomic (it is on x86 and ARM; see "Why is integer assignment on a naturally aligned variable atomic on x86?"). Your code will technically have C11 undefined behaviour, but real implementations will do what you want with volatile.

Or use C11 stdatomic.h with memory_order_relaxed if you want to tell the compiler exactly what you mean. It will compile to the same asm as volatile on x86 and ARM if you use it correctly.

(But if you actually need it to run efficiently on single-core CPUs where load/store of an aligned uint32_t isn't atomic "for free", e.g. with only 8-bit registers, you might rather disable interrupts instead of having stdatomic fall back to using a lock to serialize reads and writes of your counter.)

Whole instructions are always atomic with respect to interrupts on the same core, on all CPU architectures. Partially-completed instructions are either completed or discarded (without committing their stores) before servicing an interrupt.

For a single core, CPUs always preserve the illusion of running instructions one at a time, in program order. This includes interrupts only happening on the boundaries between instructions. See @supercat's single-core answer on Can num++ be atomic for 'int num'?. If the machine has 32-bit registers, you can safely assume that a volatile uint32_t will be loaded or stored with a single instruction. As @old_timer points out, beware of unaligned packed-struct members on ARM, but unless you manually do that with __attribute__((packed)) or something, the normal ABIs on x86 and ARM ensure natural alignment.

Multiple bus transactions from a single instruction for unaligned operands or narrow busses only matters for concurrent read+write, either from another core or a non-CPU hardware device. (e.g. if you're storing to device memory).

Some long-running x86 instructions like rep movs or vpgatherdd have well-defined ways to partially complete on exceptions or interrupts: they update registers so that re-running the instruction does the right thing. But other than that, an instruction has either run or it hasn't, even a "complex" instruction like a memory-destination add that does a read/modify/write. IDK if anyone's ever proposed a CPU that could suspend/resume multi-step instructions across interrupts instead of cancelling them, but x86 and ARM are definitely not like that. There are lots of weird ideas in computer-architecture research papers. But it seems unlikely that it would be worth keeping all the necessary microarchitectural state to resume in the middle of a partially-executed instruction instead of just re-decoding it after returning from an interrupt.

This is why AVX2 / AVX512 gathers always need a gather mask even when you want to gather all the elements, and why they destroy the mask (so you have to reset it to all-ones again before the next gather).

In your case, you only need the store (and load outside the ISR) to be atomic. You don't need the whole ++uptime to be atomic. You can express this with C11 stdatomic like this:

#include <stdint.h>
#include <stdatomic.h>

_Atomic uint32_t uptime = 0;

// interrupt each 1 ms
void ISR()
{
    // this is the only location which writes to uptime
    uint32_t tmp = atomic_load_explicit(&uptime, memory_order_relaxed);
    // the load doesn't even need to be atomic, but relaxed atomic is as cheap as volatile on machines with wide-enough loads
    atomic_store_explicit(&uptime, tmp+1, memory_order_relaxed);

    // some x86 compilers may fail to optimize to  add dword [uptime],1
    // but uptime+=1 would compile to  LOCK ADD (an atomic increment), which you don't want.
}

// MODIFIED: return the load result
uint32_t some_func()
{
    // this does need to be an atomic load
    // you typically get that by default with volatile, too
    uint32_t now = atomic_load_explicit(&uptime, memory_order_relaxed);
    return now;
}

volatile uint32_t compiles to the exact same asm on x86 and ARM. I put the code on the Godbolt compiler explorer. This is what clang6.0 -O3 does for x86-64. (With -mtune=bdver2, it uses inc instead of add, but it knows that memory-destination inc is one of the few cases where inc is still worse than add on Intel :)

ISR:                                    # @ISR
    add     dword ptr [rip + uptime], 1
    ret
some_func:                              # @some_func
    mov     eax, dword ptr [rip + uptime]
    ret
inc_volatile:           //  void func(){ volatile_var++; }
    add     dword ptr [rip + volatile_var], 1
    ret

gcc uses separate load/store instructions for both volatile and _Atomic, unfortunately.

    # gcc8.1 -O3 
    mov     eax, DWORD PTR uptime[rip]
    add     eax, 1
    mov     DWORD PTR uptime[rip], eax

At least that means there's no downside to using _Atomic or volatile _Atomic on either gcc or clang.

Plain uint32_t without either qualifier is not a real option, at least not for the read side. You probably don't want the compiler to hoist get_time() out of a loop and use the same time for every iteration. In cases where you do want that, you could copy it to a local. That could result in extra work for no benefit if the compiler doesn't keep it in a register, though (e.g. across function calls it's easiest for the compiler to just reload from static storage). On ARM, though, copying to a local may actually help because then it can reference it relative to the stack pointer instead of needing to keep a static address in another register, or regenerate the address. (x86 can load from static addresses with a single large instruction, thanks to its variable-length instruction set.)

If you want any stronger memory-ordering, you can use atomic_signal_fence(memory_order_release); or whatever (signal_fence not thread_fence) to tell the compiler you only care about ordering wrt. code running asynchronously on the same CPU ("in the same thread" like a signal handler), so it will only have to block compile-time reordering, not emit any memory-barrier instructions like ARM dmb.

e.g. in the ISR:

  uint32_t tmp = atomic_load_explicit(&idx,  memory_order_relaxed);
  tmp++;
  shared_buf[tmp] = 2;   // non-atomic
                         // Then do a release-store of the index
  atomic_signal_fence(memory_order_release);
  atomic_store_explicit(&idx, tmp, memory_order_relaxed);

Then it's safe for a reader to load idx, run atomic_signal_fence(memory_order_acquire);, and read from shared_buf[tmp] even if shared_buf is not _Atomic. (Assuming you took care of wraparound issues and so on.)
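Putting the writer fragment and the reader description together, a sketch assuming a single core (signal fences only restrain compiler reordering and emit no barrier instructions) and a hypothetical 16-entry buffer indexed modulo its size. BUF_LEN and read_latest are illustrative names, and the sketch still ignores the case where the ISR wraps around and overwrites the slot the reader is using.

```c
#include <stdatomic.h>
#include <stdint.h>

#define BUF_LEN 16u                      /* hypothetical buffer size */

static int shared_buf[BUF_LEN];          /* plain, non-atomic data */
static _Atomic uint32_t idx = 0;         /* published index, written only by the ISR */

/* Writer (ISR): fill the slot, then publish the index. */
void ISR(void)
{
    uint32_t tmp = atomic_load_explicit(&idx, memory_order_relaxed);
    tmp++;
    shared_buf[tmp % BUF_LEN] = 2;                   /* non-atomic store */
    atomic_signal_fence(memory_order_release);       /* compiler-only barrier */
    atomic_store_explicit(&idx, tmp, memory_order_relaxed);
}

/* Reader (foreground): load the index, then read the slot it names. */
int read_latest(void)
{
    uint32_t tmp = atomic_load_explicit(&idx, memory_order_relaxed);
    atomic_signal_fence(memory_order_acquire);       /* keep the buffer read after the index load */
    return shared_buf[tmp % BUF_LEN];
}
```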

score 0 · Answer 4 · answered Jul 16 '18 at 13:23

volatile tells the compiler where the value must be accessed. Without this qualifier, the compiler may keep the value in a CPU register, falling back to memory only when the registers are busy with other operations; with volatile, every read and write goes to memory. This is the main rule.

Now let's look at the architecture. All native CPU instructions on native types are atomic. But some operations can be split into two steps, e.g. when a value is copied from memory to memory, and a CPU interrupt can occur between them. Don't worry, that is normal: until the value has been stored into the destination variable, you can treat it as an operation that has not been committed yet.

The problem is when you use words longer than the CPU implements, for example a 32-bit value on a 16-bit or 8-bit processor. In that situation reading and writing the value is split into several steps, so it can happen that part of the value has been stored and part has not, and you get a wrong, damaged value.
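A host-side simulation of that failure, assuming a hypothetical 16-bit CPU that loads a 32-bit counter as two halves and takes the interrupt between them. The "interrupt" is a direct function call, so this only illustrates the arithmetic of a torn read; torn_read is an illustrative name.

```c
#include <stdint.h>

static volatile uint32_t uptime = 0x0000FFFFu;

static void ISR(void) { ++uptime; }   /* 0x0000FFFF -> 0x00010000 */

/* Low half first, then the interrupt, then the high half. */
uint32_t torn_read(void)
{
    uint32_t lo = uptime & 0xFFFFu;   /* 0xFFFF, from the old value */
    ISR();                            /* the interrupt lands between the halves */
    uint32_t hi = uptime >> 16;       /* 0x0001, from the new value */
    return (hi << 16) | lo;           /* 0x0001FFFF: neither old nor new */
}
```

The result, 0x0001FFFF, is almost 65536 ticks away from both the old value (0x0000FFFF) and the new one (0x00010000).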

In this scenario disabling interrupts is not always a good approach, because it can take a long time. Of course you can use locking, but that has the same cost. Instead, you can make a structure whose first field is the data and whose second field is a counter that fits the architecture's word size. When reading the value, first read the counter, then read the value, and finally read the counter a second time. If the two counter readings differ, repeat the process. Of course this doesn't guarantee everything will be correct, but typically it saves a lot of CPU cycles. For example, if you use an additional 16-bit counter for verification, it has 65536 values; for a wrong value to slip through, your main process would have to be frozen between the two counter reads for a very long time, in this example 65536 missed interrupts.

Of course, if you use a 32-bit value on a 32-bit architecture it is not a problem: you don't need to specially protect that operation, whatever the architecture (as long as it performs native-width operations atomically :)).

Example code:

#include <stdint.h>

static volatile struct
{
  uint32_t value;     // the important value
  unsigned watchdog;  // guards value; width is platform dependent, unsigned so it wraps safely
} SecuredCounter;

void ISR(void)
{
  // this is the only location which writes to the counter
  ++SecuredCounter.value;
  ++SecuredCounter.watchdog;
}

uint32_t Read_uptime(void)
{
    for (;;) {
        unsigned secure1 = SecuredCounter.watchdog;  // read first
        uint32_t value   = SecuredCounter.value;     // read value
        unsigned secure2 = SecuredCounter.watchdog;  // read second, should match the first
        if (secure1 == secure2) return value;        // no ISR ran in between, value is consistent
    }
}

void some_func(void)
{
    uint32_t now = Read_uptime();
    (void)now;
}

A different approach is to keep two identical counters and increment both in the same function (the ISR). In the read function, copy both values to local variables and check that they are identical. If they are, the value is valid and you return one of them. If they differ, repeat the read. Don't worry: if the values differ, your read was interrupted, and it is very unlikely to happen again on the retry. Even if it does, the loop will not stall.
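A minimal sketch of the two-counter approach, assuming a single core where the ISR runs to completion without being interrupted by the reader. counter_a, counter_b and read_counter are hypothetical names.

```c
#include <stdint.h>

/* Two copies of the same counter, both written only by the ISR. */
static volatile uint32_t counter_a;
static volatile uint32_t counter_b;

void ISR(void)
{
    ++counter_a;
    ++counter_b;
}

/* Foreground read: retry until both copies agree. If the ISR fires
 * between the two loads, a holds the old value and b the new one, so
 * they differ and the loop simply reads again. */
uint32_t read_counter(void)
{
    for (;;) {
        uint32_t a = counter_a;
        uint32_t b = counter_b;
        if (a == b)
            return a;
    }
}
```

Compared with the watchdog-counter version, this doubles the storage for the protected value but needs only two loads per attempt instead of three.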
