LDREX/STREX with Cortex M3 and M4

Question

I was reading up on LDREX and STREX in order to implement mutexes. From the ARM reference manual:

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100166_0001_00_en/ric1417175928887.html

It appears that the LDREX/STREX address granularity is the whole memory space, so you can only use LDREX/STREX on one 32-bit word at most.

Is this correct, or am I missing something? If so, it makes LDREX/STREX rather limited. You could implement a bit-mapped mutex and perhaps get 32 mutexes.

Does anyone use LDREX/STREX on an M3 or M4, and if so, how?

You can only have one `ldrex/strex` chain going on (per processor core) at the same time **regardless** of the reservation granularity. The point is that you cannot have *other* memory accesses, even to wildly different addresses, between the `ldrex` and `strex`. — EOF, Jun 26 '18 at 16:06
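
As the comment above points out, the reservation granule limits how many LDREX/STREX sequences can be in flight at once per core, not how many locks you can have. A minimal sketch, assuming portable C11 `<stdatomic.h>` instead of the CMSIS intrinsics: each mutex is its own 32-bit word, and on ARMv7-M compilers such as GCC typically turn the compare-exchange into a short LDREX/compare/STREX sequence that starts and finishes inside `try_lock`. The names `lock_word_t`, `try_lock`, `unlock`, and `NUM_LOCKS` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock type: one 32-bit word per mutex (0 = free, 1 = held). */
typedef atomic_uint_least32_t lock_word_t;

/* Make one attempt to take the lock; returns true on success.
   On ARMv7-M the exclusive sequence emitted for this typically begins and
   ends inside this call, so it never overlaps one on another lock word. */
static bool try_lock(lock_word_t *lock)
{
    uint_least32_t expected = 0;
    return atomic_compare_exchange_strong_explicit(
        lock, &expected, 1,
        memory_order_acquire,   /* success: critical section stays after this */
        memory_order_relaxed);  /* failure: nothing was acquired */
}

/* Release the lock; release ordering publishes writes made while held. */
static void unlock(lock_word_t *lock)
{
    /* Unlocking a lock that is not held indicates a caller bug. */
    assert(atomic_load_explicit(lock, memory_order_relaxed) == 1);
    atomic_store_explicit(lock, 0, memory_order_release);
}

/* Far more than 32 independent mutexes, each at its own address. */
#define NUM_LOCKS 100
static lock_word_t locks[NUM_LOCKS];
```

Taking `locks[0]` has no effect on `locks[99]`; only a second `try_lock` on the same word fails until it is unlocked.
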
The purpose of LDREX/STREX is to share resources in a multiprocessor solution against shared resources. They have been incorrectly used long enough to have them in a single core microcontroller. And you are at the mercy of the chip vendor as to whether they bothered to support them correctly anyway, and using them at all, or even supporting them in the instruction set, makes no sense whatsoever. So, after all of that, if your device (M4) has a cache implemented by ARM, then you have some chance of knowing whether and how the instruction actually works; other than that, you are at the mercy of the chip vendor... — old_timer, Jun 26 '18 at 17:20
...as to whether they supported it correctly, with what granularity, etc. As for how they work, look at the ARM docs: the TRM, the ARM (Architecture Reference Manual), and the AMBA/AHB documentation. The AXI/AMBA/AHB bus will assert the exclusive-access bits, and depending on those settings for that transaction, the target responds with OKAY or EXOKAY, but because of the misuse of the ldrex/strex instructions in full sized arms it would not surprise me if vendors simply respond with EXOKAY for all exclusive accesses to prevent software folks misusing the instructions for complaining, and would not be surprised to see... — old_timer, Jun 26 '18 at 17:23
The instructions are meant to be used as pairs in a multi-core environment; for a single core, AXI exclusive-access support is not required of vendors (so returning OKAY, which makes the instruction fail, is acceptable per ARM's documentation). One processor core does an ldrex at an address and then an strex to that same location. If there has been no other access to that location between these transactions, the store happens; if another core has come in and accessed that address, it fails and you try again. — old_timer, Jun 26 '18 at 17:28
If either core has its cache on, then the transaction can be trapped in that core's local cache, and it gets success because the other core(s) cannot access it through their L1, so it won't fail. That is not a very useful way to use it, but that is ARM logic and that is how it works (the L1 is ARM logic; L2 and beyond is up to the chip vendor). Now extend all of this to the Thumb-2 extensions supported by both full-sized and MCU-sized cores, and ask how any of this makes sense. — old_timer, Jun 26 '18 at 17:31
You need to find out what unique bus identifier, if any, you can create within a single-core MCU for competing threads to gain exclusive access to a memory location, and then whether the chip vendor has correctly implemented the exclusive-access check. — old_timer, Jun 26 '18 at 17:32
Right, and then there is the question of whether they implemented these instructions simply because they were part of a shared instruction set, while on the MCUs they don't really work. Similarly, a number of cores support the WFI instruction, but some of them simply treat it like a NOP. I think that may be what the quote in the ARM docs is telling you... You need to read up on the AHB buses used in the MCUs in particular (vs. the AXI used in the big boys). — old_timer, Jun 26 '18 at 17:34
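
The load, modify, store-exclusive, retry-on-failure pattern described in these comments can be written portably with C11 `atomic_compare_exchange_weak`, whose permitted spurious failure exists precisely to match a failed STREX on load-linked/store-conditional machines. A minimal sketch of a hypothetical shared counter, `counter_add`; note this is a compare-exchange loop, not a literal hand-written LDREX/STREX pair, although on ARMv7-M GCC typically emits one LDREX/STREX attempt per iteration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Atomically add `delta` to *counter and return the new value.
   Each loop iteration is one load / modify / store-exclusive attempt.
   If the value changed, or (on ARM) the store-exclusive failed because
   something intervened, the exchange reports failure and we retry. */
static uint32_t counter_add(_Atomic uint32_t *counter, uint32_t delta)
{
    assert(counter != NULL);
    uint32_t old = atomic_load(counter);
    uint32_t desired;
    do {
        desired = old + delta;   /* the "modify" step, on a private copy */
        /* On failure, atomic_compare_exchange_weak reloads `old`. */
    } while (!atomic_compare_exchange_weak(counter, &old, desired));
    return desired;
}
```

Because the retry is unbounded, this loop relies on the store eventually succeeding, which is exactly what an implementation that never grants exclusivity would break.
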
@old_timer: have you got any references to support these claims? I.e. *"because of the misuse of the ldrex/strex instructions in full sized arms it would not surprise me if vendors simply respond with EXOKAY for all exclusive accesses to prevent software folks misusing the instructions for complaining"* -- why on earth would vendors prefer to sell noncompliant ARM multiprocessors, because somebody might "misuse" an instruction? — Lou, Jul 03 '18 at 20:27
Yep, straight out of the ARM documentation, and then from seeing what Linux and other sources do instead of reading the ARM documentation. And it is compliant for a uniprocessor to simply return EXOKAY (or just return OKAY, but folks consider that a chip bug when it isn't, causing harm to that product line). — old_timer, Jul 04 '18 at 00:35
ARM itself, as documented, doesn't even support all of these instructions all of the time. They make them work so that software doesn't break, in the sense that you don't get stuck in an infinite strex/ldrex loop. Gotta read the docs. — old_timer, Jul 04 '18 at 00:37
Linux contributors are the worst about not reading. Every new release requires a lot of cleanup of new and old improperly applied errata and bad assumptions (like assuming ldrex/strex on a uniprocessor is expected to work). I get that ARM is hard because there are so many options, variations, and nuances, multiplied by the vendor's side of it each time they use a core. So rather than make assumptions, go read up, then test to confirm, or better yet, don't use the feature if there is some doubt. — old_timer, Jul 04 '18 at 00:41
@old_timer I haven't read anything that supports your claim that: "The purpose of LDREX/STREX is to share resources in a multiprocessor solution against shared resources. They have been incorrectly used long enough to have them in a single core microcontroller." Everything I've read suggests that the feature is intended for shared resources in both single and multiprocessor environments. Feel free to provide a reference; otherwise, I suggest other readers parse that as your opinion and not necessarily fact. — biscuits, Jan 07 '19 at 20:48
Please also see this from ARM: "These instructions are also useful on a single master system to implement mutexes, semaphores, etc. without needing to disable interrupts. In the same way, they are also useful for multi-threaded systems." http://infocenter.arm.com/help/topic/com.arm.doc.faqs/ka4175.html — biscuits, Jan 07 '19 at 20:50
@biscuits It depends on the version of the core/bus spec. When it was created, it was clearly documented that uniprocessor implementations were not required to support exclusive access, which basically means they were meant for multiprocessor implementations. At that time SWP was being phased out. Unfortunately, due to incorrect use in places like Linux (which very often gets it wrong, so you constantly have to repair and port to move from one chip to another), this may have changed in more recent specs, so you have to know specifically which core, and as a result which bus spec, and... — old_timer, Jan 07 '19 at 22:56
...how it was implemented by the chip vendor for that core/bus/implementation. — old_timer, Jan 07 '19 at 22:56
It continues to state that slaves do not have to support exclusive access and can return OKAY, which would break code that does not account for how ldrex/strex work and when and where they can be used. — old_timer, Jan 07 '19 at 22:58

score 9 · Answer 1 · answered Jun 28 '18 at 16:27

So I contacted ARM and got some more information. For example, you might expect the STREX in the following sequence to fail:

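    LDREX   r2, [r0]        @ r0 holds address1: the monitor is tagged
    LDREX   r3, [r1]        @ r1 holds address2: the monitor is tagged again
    STREX   r4, r2, [r0]    @ store r2 to address1; r4 = 0 if the store happened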

However, the STREX to address1 would succeed even though the most recent LDREX was not to address1. This is correct behavior, because the LDREX/STREX address resolution is the entire memory space.
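
To make this concrete, here is a host-side C model of a local monitor that, as ARM described, does not distinguish addresses. It is a simplified sketch rather than the real hardware logic: the names `monitor_t`, `ldrex`, `strex`, `clrex`, and `answer_sequence` are hypothetical, and implementation-defined details of the architecture are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a local exclusive monitor whose granule is the
   whole address space: it records only Open vs. Exclusive, never the
   address that was tagged. */
typedef struct {
    bool exclusive;
} monitor_t;

/* LDREX: read the word and put the monitor in the Exclusive state. */
static uint32_t ldrex(monitor_t *m, const uint32_t *addr)
{
    m->exclusive = true;
    return *addr;
}

/* STREX: store only if the monitor is Exclusive. Returns 0 on success and
   1 on failure, like the STREX status register; the monitor ends up Open. */
static uint32_t strex(monitor_t *m, uint32_t *addr, uint32_t value)
{
    if (!m->exclusive)
        return 1;
    *addr = value;
    m->exclusive = false;
    return 0;
}

/* CLREX: force the monitor back to Open. */
static void clrex(monitor_t *m)
{
    m->exclusive = false;
}

/* The sequence from the answer: LDREX address1, LDREX address2,
   STREX address1. Returns the STREX status (0 = the store happened). */
static uint32_t answer_sequence(uint32_t *address1, uint32_t *address2)
{
    assert(address1 != address2);     /* the point is two different words */
    monitor_t m = { false };
    uint32_t v1 = ldrex(&m, address1);
    (void)ldrex(&m, address2);        /* re-tags, but no address is kept */
    return strex(&m, address1, v1 + 1);
}
```

Under this model the final STREX succeeds and address1 is updated, while a CLREX between an LDREX and its STREX makes the STREX fail.
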

So I was worried about the following case with two tasks: the first task is interrupted after its LDREX to address1, then the second task is interrupted after its LDREX to address2, and then the first task gets the processor back and executes its STREX to address1. That would cause a problem. However, it appears that the processor clears the local exclusive monitor (the equivalent of a CLREX) on every exception/interrupt entry and exit. Since a task switch can only happen through an exception, the first task's STREX would fail. In other words, if any interrupt occurs between LDREX and STREX, the STREX will fail. So you want to keep the code between LDREX and STREX as short as possible to reduce the chance of an interrupt. Additionally, if the STREX fails, you most likely want to retry the LDREX/STREX sequence once or twice more before giving up.
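
A host-side C sketch of the two-task scenario and the bounded-retry advice, assuming a simplified monitor model in which exception entry and exit clear the monitor as described above. Everything here is hypothetical: `pending_interrupts` is a test hook that injects an interrupt between LDREX and STREX, and on a real Cortex-M you would use the CMSIS `__LDREXW`, `__STREXW`, and `__CLREX` intrinsics instead of these model functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical model of the local exclusive monitor:
   Exclusive or Open, with no record of which address was tagged. */
static bool monitor_exclusive = false;

static uint32_t ldrex(const uint32_t *addr)
{
    monitor_exclusive = true;
    return *addr;
}

/* Returns 0 if the store happened, 1 if it failed (as STREX does). */
static uint32_t strex(uint32_t *addr, uint32_t value)
{
    if (!monitor_exclusive)
        return 1;
    *addr = value;
    monitor_exclusive = false;
    return 0;
}

static void clrex(void)
{
    monitor_exclusive = false;
}

/* Models the processor clearing the monitor on exception entry and exit. */
static void exception_entry_or_exit(void)
{
    clrex();
}

/* Hypothetical test hook: how many upcoming attempts get an interrupt
   between their LDREX and STREX. */
static int pending_interrupts = 0;

/* Try to take a lock word (0 = free, 1 = held), retrying a bounded number
   of times when the STREX fails, as suggested in the answer. */
static bool try_lock(uint32_t *lock, int max_attempts)
{
    assert(max_attempts > 0);
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (ldrex(lock) != 0) {
            clrex();                      /* held by someone else: drop the tag */
            return false;
        }
        if (pending_interrupts > 0) {     /* simulated interrupt in the window */
            pending_interrupts--;
            exception_entry_or_exit();    /* entry */
            exception_entry_or_exit();    /* exit back to this task */
        }
        if (strex(lock, 1) == 0)
            return true;                  /* real code would typically add a DMB here */
    }
    return false;                         /* STREX kept failing: give up */
}
```

With one injected interrupt, the first attempt's STREX fails and the second succeeds; if every attempt is interrupted, `try_lock` gives up and leaves the lock word untouched.
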

Again, this is for a single-core M3/M4/M7.

Note that the only place I found a reference to the exclusive monitor being cleared (the CLREX equivalent) on exception entry and exit was the ARMv7-M Architecture Reference Manual, section A3.4.4, "Context switch support". That document describes how LDREX/STREX actually work much better than anything I found online.

+1, thank you for clarifying that CLREX (or equivalent) is automatically called on exception entry *and* exit. This matches what I was seeing in practice on an M4, but until I found your answer, I couldn't find any other documentation confirming that behavior. — John Lindgren, Jul 30 '18 at 20:04
