Here are the data types on STM32 microcontrollers: http://www.keil.com/support/man/docs/armcc/armcc_chr1359125009502.htm.

These microcontrollers use 32-bit ARM core processors.

Which data types have automatic atomic read and atomic write access?

I'm pretty sure all 32-bit data types do (since the processor is 32-bits), and all 64-bit data types do NOT (since it would take at least 2 processor operations to read or write a 64-bit word), but what about bool (1 byte), and uint16_t/int16_t (2 bytes)?

Context: I'm sharing variables between multiple threads (single core, but multiple threads, or "tasks" as they are called, in FreeRTOS) on the STM32 and need to know if I need to enforce atomic access by turning off interrupts, using mutexes, etc.

UPDATE:

Referring to this sample code:

volatile bool shared_bool;
volatile uint8_t shared_u8;
volatile uint16_t shared_u16;
volatile uint32_t shared_u32;
volatile uint64_t shared_u64;
volatile float shared_f; // 32-bits
volatile double shared_d; // 64-bits

// Task (thread) 1
while (true)
{
    // Write to the values in this thread.
    //
    // What I write to each variable will vary. Since other threads are reading
    // these values, I need to ensure my *writes* are atomic, or else I must
    // use a mutex to prevent another thread from reading a variable in the
    // middle of this thread's writing.
    shared_bool = true;
    shared_u8 = 129;
    shared_u16 = 10108;
    shared_u32 = 130890;
    shared_u64 = 1301089010108ULL; // wider than 32 bits
    shared_f = 1083.108f;
    shared_d = 382.10830;
}

// Task (thread) 2
while (true)
{
    // Read from the values in this thread.
    //
    // What thread 1 writes into these values can change at any time, so I need
    // to ensure my *reads* are atomic, or else I'll need to use a mutex to
    // prevent the other thread from writing to a variable in the midst of
    // reading it in this thread.
    if (shared_bool == whatever)
    {
        // do something
    }
    if (shared_u8 == whatever)
    {
        // do something
    }
    if (shared_u16 == whatever)
    {
        // do something
    }
    if (shared_u32 == whatever)
    {
        // do something
    }
    if (shared_u64 == whatever)
    {
        // do something
    }
    if (shared_f == whatever)
    {
        // do something
    }
    if (shared_d == whatever)
    {
        // do something
    }
}

In the code above, which variables can I do this for without using a mutex? My suspicion is as follows:

  1. volatile bool: safe--no mutex required
  2. volatile uint8_t: safe--no mutex required
  3. volatile uint16_t: safe--no mutex required
  4. volatile uint32_t: safe--no mutex required
  5. volatile uint64_t: UNSAFE--YOU MUST USE A Critical section or MUTEX!
  6. volatile float: safe--no mutex required
  7. volatile double: UNSAFE--YOU MUST USE A Critical section or MUTEX!

Example critical section with FreeRTOS:
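
A minimal sketch of guarding the 64-bit variable from the sample code with a FreeRTOS critical section. `taskENTER_CRITICAL()` and `taskEXIT_CRITICAL()` are the FreeRTOS macros from `task.h`; the `USE_FREERTOS` switch, the no-op fallbacks, and the helper names are hypothetical, added only so the snippet also compiles off-target.

```c
#include <assert.h>
#include <stdint.h>

#ifdef USE_FREERTOS            /* hypothetical build switch for the target */
#include "FreeRTOS.h"
#include "task.h"
#else
/* ASSUMPTION: host-only stand-ins so the sketch compiles without FreeRTOS.
   They provide NO protection; on target, build with the real macros. */
#define taskENTER_CRITICAL() ((void)0)
#define taskEXIT_CRITICAL()  ((void)0)
#endif

static volatile uint64_t shared_u64;

/* Writer side: the two 32-bit stores cannot be split by a task switch. */
void write_shared_u64(uint64_t value)
{
    taskENTER_CRITICAL();
    shared_u64 = value;
    taskEXIT_CRITICAL();
}

/* Reader side: copy out inside the critical section, then use the copy. */
uint64_t read_shared_u64(void)
{
    uint64_t copy;
    taskENTER_CRITICAL();
    copy = shared_u64;
    taskEXIT_CRITICAL();
    return copy;
}

/* Usage check: what was written is what is read back. */
void critical_section_demo(void)
{
    write_shared_u64(0x1122334455667788ULL);
    assert(read_shared_u64() == 0x1122334455667788ULL);
}
```

Keeping the critical section down to the single copy, and doing any comparison on the local copy afterwards, keeps the time with interrupts masked short.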

Related, but not answering my question:

  1. Atomic operations in ARM
  2. ARM: Is writing/reading from int atomic?
  3. (My own question and answer on atomicity in 8-bit AVR [and Arduino] microcontrollers): https://stackoverflow.com/a/39693278/4561887
  4. https://stm32f4-discovery.net/2015/06/how-to-properly-enabledisable-interrupts-in-arm-cortex-m/
Gabriel Staples
  • This would be what the ARM instruction set manual for your particular chip is for? – user268396 Oct 12 '18 at 18:01
  • Possible duplicate of [ARM: Is writing/reading from int atomic?](https://stackoverflow.com/questions/9399026/arm-is-writing-reading-from-int-atomic) – zneak Oct 12 '18 at 18:28
  • You have to look at the assembly code. – Fiddling Bits Oct 12 '18 at 18:35
  • Are you trying to defend against two cores operating on the same data, or being interrupted in the middle of a write to yield to the other thread on the same core? – zneak Oct 12 '18 at 18:39
  • The latter: "being interrupted in the middle of a write to yield to the other thread on the same core". – Gabriel Staples Oct 12 '18 at 18:44
  • Consider it to be code running on an [STM32F767ZI](https://www.st.com/en/microcontrollers/stm32f767zi.html) with FreeRTOS to handle multi-threading. – Gabriel Staples Oct 12 '18 at 18:47

3 Answers

For the final, definitive answer to this question, jump straight down to the section below titled "Final answer to my question".

UPDATE 30 Oct. 2018: I was accidentally referencing the (slightly) wrong documents (but which said the exact same thing), so I've fixed them in my answer here. See "Notes about the 30 Oct. 2018 changes" at bottom of this answer for details.

I definitely don't understand every word here, but the ARMv7-M Architecture Reference Manual (NOT the Technical Reference Manual [TRM], since it doesn't discuss atomicity) validates my assumptions:

[Screenshot of the relevant manual page; its key sentences are quoted further below.]

So...I think my 7 assumptions at the bottom of my question are all correct. [30 Oct. 2018: Yes, that is correct. See below for details.]


UPDATE 29 Oct. 2018:

One more little tidbit:

Richard Barry, FreeRTOS founder, expert, and core developer, states in tasks.c...

/* A critical section is not required because the variables are of type BaseType_t. */

...when reading an "unsigned long" (4-byte) volatile variable on STM32. This means that he, at least, is 100% sure 4-byte reads and writes are atomic on STM32. He doesn't mention smaller-byte reads, but for 4-byte reads he is conclusively sure. I have to assume that 4-byte variables being the native processor width, and also, word-aligned, is critical to this being true.

From tasks.c, lines 2173-2178 in FreeRTOS v9.0.0, for instance:

UBaseType_t uxTaskGetNumberOfTasks( void )
{
    /* A critical section is not required because the variables are of type
    BaseType_t. */
    return uxCurrentNumberOfTasks;
}

He uses this exact phrase of...

/* A critical section is not required because the variables are of type BaseType_t. */

...in two different locations in this file.

Final answer to my question: all types <= 4 bytes (every row in the list of 9 below that is not marked NOT GUARANTEED ATOMIC) are atomic.

Furthermore, upon closer inspection of the Architecture Reference Manual on p141, as shown in my screenshot above, the key sentences I'd like to point out are:

In ARMv7-M, the single-copy atomic processor accesses are:
• all byte accesses.
• all halfword accesses to halfword-aligned locations.
• all word accesses to word-aligned locations.

And, per this link, the following is true for "basic data types implemented in ARM C and C++" (ie: on STM32):

  1. bool/_Bool is "byte-aligned" (1-byte-aligned)
  2. int8_t/uint8_t is "byte-aligned" (1-byte-aligned)
  3. int16_t/uint16_t is "halfword-aligned" (2-byte-aligned)
  4. int32_t/uint32_t is "word-aligned" (4-byte-aligned)
  5. int64_t/uint64_t is "doubleword-aligned" (8-byte-aligned) <-- NOT GUARANTEED ATOMIC
  6. float is "word-aligned" (4-byte-aligned)
  7. double is "doubleword-aligned" (8-byte-aligned) <-- NOT GUARANTEED ATOMIC
  8. long double is "doubleword-aligned" (8-byte-aligned) <-- NOT GUARANTEED ATOMIC
  9. all pointers are "word-aligned" (4-byte-aligned)

This means that I now have and understand the evidence I need to conclusively state that every row just above that is not marked NOT GUARANTEED ATOMIC has automatic atomic read and write access (but NOT increment/decrement, of course, which is multiple operations). This is the final answer to my question. The one exception I can think of is packed structs, in which these otherwise naturally aligned data types may not be naturally aligned.
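
The alignment rows and the packed-struct caveat can be checked at compile time. This is a minimal sketch, assuming a C11 compiler and the GCC/Clang `__attribute__((packed))` extension; the struct names are hypothetical. The assertions hold on the ARM EABI and on typical desktop ABIs, but alignment is ultimately ABI-defined, so treat them as a check for your own toolchain rather than a guarantee.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignof */
#include <stddef.h>    /* offsetof */
#include <stdint.h>

/* Natural alignment: each of these types is aligned to its own size. */
static_assert(alignof(uint8_t)  == 1, "byte-aligned");
static_assert(alignof(uint16_t) == 2, "halfword-aligned");
static_assert(alignof(uint32_t) == 4, "word-aligned");
static_assert(alignof(float)    == 4, "word-aligned");

struct natural_layout {
    uint8_t  flag;
    uint32_t counter;  /* padding is inserted so this lands on a 4-byte boundary */
};

struct __attribute__((packed)) packed_layout {
    uint8_t  flag;
    uint32_t counter;  /* no padding: offset 1, so NOT word-aligned */
};

static_assert(offsetof(struct natural_layout, counter) % 4 == 0,
              "member is word-aligned in the normal struct");
static_assert(offsetof(struct packed_layout, counter) == 1,
              "member is misaligned in the packed struct");
```

A word access to `packed_layout.counter` falls outside the "word accesses to word-aligned locations" case quoted above, so it should not be assumed atomic.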

Also note that, despite how the name might read, "single-copy atomicity" in the Architecture Reference Manual does not mean "single-core-CPU atomicity". It is a property of an individual access: an observer sees either the whole old value or the whole new value, never a mix of the two. "Multi-copy atomicity" is a separate, stronger property about the order in which multiple observers in a "multiprocessing system" (a multi-core-CPU architecture) see writes. Wikipedia states "multiprocessing is the use of two or more central processing units (CPUs) within a single computer system" (https://en.wikipedia.org/wiki/Multiprocessing).

My architecture in question, STM32F767ZI (with ARM Cortex-M7 core), is a single-core architecture, so "single-copy atomicity", as I've quoted above from the Architecture Reference Manual, is the guarantee that matters here.

Further Reading:

Notes about the 30 Oct. 2018 changes:

To create atomic access guards (usually by turning off interrupts when reads and writes are not atomic) see:

  1. [my Q&A] What are the various ways to disable and re-enable interrupts in STM32 microcontrollers in order to implement atomic access guards?
  2. My doAtomicRead() func here which can do atomic reads withOUT turning off interrupts
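
One common way to read a wide variable without turning off interrupts is to read it twice and retry until both copies agree. The sketch below illustrates that general idea only; it is not claimed to match the linked `doAtomicRead()` function, and the helper name is hypothetical. It assumes the writer is an ISR (or otherwise always runs to completion once it interrupts the reader) and that writes are infrequent compared with the time one read takes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: re-read *src until two consecutive reads match.
   ASSUMPTION: the writer interrupts the reader and finishes its whole write
   before the reader resumes, so a torn read shows up as a mismatch. This does
   NOT help if the reader can preempt the writer in the middle of a write. */
uint64_t read_u64_until_stable(const volatile uint64_t *src)
{
    uint64_t first;
    uint64_t second;
    do {
        first  = *src;  /* may be torn if the writer fires between the halves */
        second = *src;  /* a second look exposes the change */
    } while (first != second);
    return second;
}

/* Usage check with no concurrent writer: the loop exits after one pass. */
void stable_read_demo(void)
{
    volatile uint64_t value = 0x1122334455667788ULL;
    assert(read_u64_until_stable(&value) == 0x1122334455667788ULL);
}
```
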
Gabriel Staples
  • In a single-core environment, execution can't be interrupted in the middle of an instruction, so any C construct that builds down to one instruction is atomic. 32-bit ARM doesn't have single instructions that can manipulate more than 32 bits of memory at once, so that sets an obvious upper bound on what can be atomic: notably, 64-bit manipulations can't. – zneak Oct 12 '18 at 19:37
  • However, there are still C operations that will compile to more than one instruction even if they manipulate 32 bits or less (like `a += 1` with `int a`), and you need to be careful with these. A less obvious example is if you use a structure with unaligned fields: your compiler will need to generate at least two loads and two stores to handle reading/writing them. It would also be possible that copying a struct that fits in 32 bits could use more than one instruction at some optimization levels. For numeric variables, neither is usually a concern, though. – zneak Oct 12 '18 at 19:39
  • @zneak wrong. Some of the instructions can be interrupted for example division. – 0___________ Oct 13 '18 at 14:28
  • @P__J__, teach me something and show me an architecture that does that. – zneak Oct 13 '18 at 14:36
  • @P__J__, what exactly happens when a division is interrupted? Do you get corrupted state, or is state rolled back such that being interrupted in the middle is completely indistinguishable from being interrupted just before? – zneak Oct 13 '18 at 17:55
  • TRM - Technical Reference Manual – 0___________ Oct 13 '18 at 17:58
  • the stm32 is most definitely not an ARMv7-AR...you are looking at the wrong manual. – old_timer Oct 30 '18 at 17:43
  • @old_timer, that's probably the most useful feedback I've received on this question thus far. :) Thank you for pointing that out. I'm going to see if I can find the right TRM now. It looks like you downvoted my question. Please explain why. – Gabriel Staples Oct 30 '18 at 18:04
  • That's the most unhelpful thing I've ever heard. Sounds like it's coming from an old-timer. There are something like 6000+ pgs of documentation for this chip. This is *exactly* what Stack Overflow is for. I'm not afraid to read a manual, but it's views like yours that make Stack Overflow an elitist place instead of a place where valuable and hard-to-find knowledge can be passed on. When I am an old-timer someday, and someone puts effort into this like I have, I will give them a link to a manual, provide a helpful response, and upvote their thoughtful question. – Gabriel Staples Oct 30 '18 at 18:33
  • @old_timer, I've updated my answer with the proper links to the correct Technical Reference Manual (which, it turns out, I don't need in this case), and the correct Architecture Reference Manual. I hope you reconsider your votes on this question and answer, and in the future, vote based on *correctness*, not on ability to decipher dozens of manuals, knowledge of which manuals exist, and knowledge of where to read in the 6000~8000 pgs of cryptic manuals. Prior to asking this question I was neither aware of ARM TRMs nor Architecture Manuals, & I had already downloaded 6000 pgs of STM32 manuals. – Gabriel Staples Oct 30 '18 at 20:01
  • Links are bad in both (stackoverflow) questions and answers as they change over time relative to the question or answer. – old_timer Oct 30 '18 at 20:55
  • as you can see from a simple mouser search or st or other the stm32 family covers from the cortex-m0,m0+,m3,m4,m7 and soon m23 and on and on...The m0 and m0+ are armv6-m based which you know from the documentation for the part you are using if you have one of them and the cortex-m3/m4/m7 are armv7-m based as you know from the documentation from st on the part you are using (should never start without the documentation for the part). This is advice if I simply gave you a list of instructions thats like giving you a fish without teaching how to catch one. – old_timer Oct 30 '18 at 20:59
  • so chip docs usually two minimum with various names based on the vendor, datasheet usually has at least the electrical and pinout, sometimes has the programmer info as well. sometimes others are called reference manuals or users guides. arm based parts like the huge stm32 family will tell you what core is used, you go to arms website or sometimes at st, and get the trm, in that it tells you which architecture and you get that document, bare minimum set of documents before day one of programming one of these boards. – old_timer Oct 30 '18 at 21:01
  • 99.9999% of bare metal programming is reading documentation, if you want to do this work then 6000 pages is nothing you just learn to search through it and narrow in on what you are after, sometimes that is faster than just reading it..(usually for well written manuals) – old_timer Oct 30 '18 at 21:02
  • Lastly assume that no processor or no modern processor has atomic access. Then if you find one then good for you. Also this is purchased IP, so most of the logic you are asking about is not arms it is ST or other purchased IP they used to interface to the arm's busses. The arm bus documentation is on arms website look for axi/amba/ahb, the trm should hopefully say which flavor in a vague way but the busses are mostly the same in concept, send out an address wait for that to be acked then either data comes back eventually or you then write on the write bus – old_timer Oct 30 '18 at 21:06
  • the memories, flash, peripherals are all chip vendor not arm, that doesnt mean arm cant put something into their logic to isolate transactions, and some have a feature for this usually for bit modification in a port for example. But this is not available in all cores and the chip vendor can choose to not enable this feature. At least the feature I am talking about which is not the one you found. – old_timer Oct 30 '18 at 21:07
  • There is no reason to expect there to be a global answer to your question that covers such a broad range of different products that span what a decade? The specific chip should have been part of the original question. – old_timer Oct 30 '18 at 21:10
  • lastly the chip vendors get the source code to the core, so in addition to the features that are documented as options for that core, the chip vendors may or may not make modifications. At the end of the day it comes down to what do you think you need atomic functions for and maybe you dont. (note ldrex/strex are not swp replacements, be very careful reading on how to use them, in this day and age atomic operations are bad design (performance and other negative affects.), you solve the problem other ways). – old_timer Oct 30 '18 at 21:15
  • sadly it is rare that the chip vendors tell you specifically which version of the core they used as you want the right rev of documentation as well as newer revs to compare with. With most of these cortex-ms there are cpuid registers that along with the arm documentation can tell you which core is really there, now what features the vendor compiled in are not necessarily detectable. – old_timer Oct 30 '18 at 21:16
  • at the end of the day though your question sounds like a freertos question not an arm/processor question. – old_timer Oct 30 '18 at 21:20
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/182968/discussion-between-gabriel-staples-and-old-timer). – Gabriel Staples Nov 01 '18 at 23:31
  • Despite the confusing name, "single-copy atomicity" really does mean it's atomic across all cores: when you store, a load on any other core will return either the old or the new value, nothing else. I don't know how they came up with that term; I haven't seen it elsewhere. It's in contrast to "multi-copy atomicity" which would be more or less a global total order on all stores, and which they apparently discuss in the manual only for the purpose of saying "we don't do that". – Nate Eldredge Mar 28 '22 at 03:41
  • Thanks for your thorough explanation! Is it the case too with bitfields? Imagine I wanna set one or several bits from my bitfield, is the operation guaranteed to be atomic if the bitfield is aligned and of correct dimension? – Getter Apr 04 '23 at 10:01
  • @Getter, I don't know. I've never really used bitfields. I just use regular types in structs. If I want to toggle bits I just use macros which do bitshifting and stuff, like `bitRead()`, `bitSet()`, `bitClear()`, and `bitWrite()`, [shown in my answer here](https://stackoverflow.com/a/54798443/4561887). – Gabriel Staples Apr 04 '23 at 16:55
  • @Getter...and those operations are not atomic. They must be protected with atomic access guards, just like increment (`++`) and decrement (`--`) operations, unless you use the [C `_Atomic` types](https://en.cppreference.com/w/c/thread) or C++ [`std::atomic<>` types](https://en.cppreference.com/w/cpp/atomic/atomic), which make increment and decrement atomic. In C++, `std::atomic<>` types also have atomic `|=` and `&=` operations (see [here](https://en.cppreference.com/w/cpp/atomic/atomic)), but I'm not sure about that in C. – Gabriel Staples Apr 04 '23 at 17:21
  • (Update: for C it may depend on the compiler): "Implementations are recommended to ensure that the representation of `_Atomic(T)` in C is same as that of `std::atomic` in C++ for every possible type `T`. The mechanisms used to ensure atomicity and memory ordering should be compatible." (see: https://en.cppreference.com/w/cpp/atomic/atomic) – Gabriel Staples Apr 04 '23 at 17:27
  • Thanks a lot GabrielStaples! – Getter Apr 05 '23 at 06:27

It depends on what you mean by atomic.

If the operation is not a simple load or store, like

a += 1;

then it is not atomic for any type.
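
To make that concrete: `a += 1` on a variable in memory is a load, an add, and a store, and a task switch can land between any two of them. This is a minimal sketch, assuming a C11 toolchain that provides `<stdatomic.h>`; the variable and function names are hypothetical. On ARMv7-M, compilers typically implement `atomic_fetch_add` with an LDREX/STREX retry loop, though that is the compiler's choice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static volatile uint32_t plain_counter;  /* loads and stores are atomic, += is not */
static _Atomic uint32_t  safe_counter;   /* C11 atomic: the whole increment is indivisible */

void increment_plain(void)
{
    /* plain_counter += 1, spelled out as the three steps it compiles to. */
    uint32_t tmp = plain_counter;  /* 1. read                                  */
    tmp = tmp + 1u;                /* 2. modify (in a register)                */
    plain_counter = tmp;           /* 3. write: can clobber another task's update */
}

/* Returns the value the counter held just before the increment. */
uint32_t increment_safe(void)
{
    return atomic_fetch_add(&safe_counter, 1u);
}

/* Usage check, single-threaded: the previous value never decreases. */
void increment_demo(void)
{
    uint32_t before = increment_safe();
    assert(increment_safe() == before + 1u);
}
```

Without C11 atomics, the same protection comes from wrapping the three steps in a critical section.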

If it is a simple store or load operation, 32-bit, 16-bit and 8-bit data types are atomic. If the value in the register has to be normalized, 8- and 16-bit stores and loads may not be atomic.

If your hardware supports bit-banding and you use it, the bit operations (set and reset) in the memory areas that support bit-banding are atomic.
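
For reference, bit-banding maps every bit of a 1 MB region onto its own word in an alias region, so a single word store sets or clears one bit. This is a minimal sketch of the address arithmetic for the SRAM region, using the base addresses from the Cortex-M3/M4 memory map; the helper name is hypothetical. Bit-banding is an optional feature on those cores and, as far as I know, is not implemented on the Cortex-M7, so check your part's documentation first.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BITBAND_BASE  0x20000000u  /* start of the 1 MB SRAM bit-band region */
#define SRAM_BITBAND_ALIAS 0x22000000u  /* start of its 32 MB alias region        */

/* Hypothetical helper: alias word address for bit `bit` of the byte at
   `byte_addr`. Each byte takes 32 alias bytes (8 bits x 4 bytes per word). */
uint32_t sram_bitband_alias(uint32_t byte_addr, uint32_t bit)
{
    assert(byte_addr >= SRAM_BITBAND_BASE);
    assert(byte_addr < SRAM_BITBAND_BASE + 0x100000u);
    assert(bit < 8u);

    uint32_t byte_offset = byte_addr - SRAM_BITBAND_BASE;
    return SRAM_BITBAND_ALIAS + (byte_offset * 32u) + (bit * 4u);
}
```

On a target that has the feature, writing 1 or 0 through `(volatile uint32_t *)sram_bitband_alias(addr, bit)` sets or clears that one bit without a read-modify-write in software.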

Note: if your code does not allow unaligned operations, 8- and 16-bit operations may not be atomic.

0___________
  • Thanks for your answer. Please see my updated question and see if you can verify my suspicions more explicitly. – Gabriel Staples Oct 12 '18 at 18:24
  • Incidentally, `a += 1` is two operations and is not atomic. – zneak Oct 12 '18 at 18:28
  • Agreed. I learned this the hard way a few years back on an 8-bit AVR processor by incrementing (not an atomic operation) an otherwise atomic-read-write-capable 8-bit variable. – Gabriel Staples Oct 12 '18 at 18:29
  • @zneak no, only if the operation is RMW, otherwise it is atomic. It may be not coherent (cache) but atomic. – 0___________ Oct 12 '18 at 18:30
  • @zneak Incidentally a+=1 is at least three operations not two. – 0___________ Oct 12 '18 at 18:34
  • If that operation was atomic, multiple cores attempting it at the same time would succeed. That's not the case, if you have two cores doing this in a loop you are certain to lose some increments. – zneak Oct 12 '18 at 18:36
  • In addition to that, it's impossible to have unaligned 8-bit accesses on ARM. – zneak Oct 12 '18 at 18:36
  • As @zneak says, `a += 1` is *not* atomic. Here's my previous experience with that one: https://stackoverflow.com/questions/36381932/c-decrementing-an-element-of-a-single-byte-volatile-array-is-not-atomic-why – Gabriel Staples Oct 12 '18 at 18:40
  • Incrementing/decrementing is *never* atomic: https://stackoverflow.com/a/36381968/4561887 – Gabriel Staples Oct 12 '18 at 18:43
  • @zneak they are atomic. The other core has to wait for the access. The problem is coherence as the cores work on the cached data. This is another problem and another measures have to be taken. But it is outside the scope of this question – 0___________ Oct 12 '18 at 19:00
  • @P__J__, feel free to expand your answer to be "outside the scope of the question," as you see it. The more knowledge you can provide, the better. – Gabriel Staples Oct 12 '18 at 22:07

Atomic "arithmetic" can be done in CPU core registers!

It can be on any type from one to four bytes, depending on the architecture and instruction set.

BUT modifying any variable located in memory takes at least 3 steps: RMW = Read memory to register, Modify register, and Write register to memory.

Therefore atomic modification is possible only if you control the use of the CPU registers, which means using pure assembler rather than a C or C++ compiler.

When you use a C/C++ compiler, it places global and global static variables in memory, so ordinary C/C++ variables get no atomic actions from the language; only the C11 _Atomic and C++11 std::atomic<> types provide them.

Note: you can use, for example, the FPU registers for atomic modification (if you really need to), but you must hide from the compiler and the RTOS that the architecture has an FPU.

denis krasutski