When is CLREX actually needed on ARM Cortex M7?

Question

I found a couple of places online which state that CLREX "must" be called whenever an interrupt routine is entered, which I don't understand. The docs for CLREX state (added the numbering for easier reference):

(1) Clears the local record of the executing processor that an address has had a request for an exclusive access.

(2) Use the CLREX instruction to return a closely-coupled exclusive access monitor to its open-access state. This removes the requirement for a dummy store to memory.

(3) It is implementation-defined whether CLREX also clears the global record of the executing processor that an address has had a request for an exclusive access.

I don't understand pretty much anything here.

I had the impression that writing something along the lines of the example in the docs was enough to guarantee atomicity:

    MOV r1, #0x1                ; load the ‘lock taken’ value
try:                                                       <---\
    LDREX r0, [LockAddr]        ; load the lock value          |
    CMP r0, #0                  ; is the lock free?            |
    STREXEQ r0, r1, [LockAddr]  ; try and claim the lock       |
    CMPEQ r0, #0                ; did this succeed?            |
    BNE try                     ; no - try again   ------------/
    ....                        ; yes - we have the lock

1. Why should the "local record" need to be cleared? I thought that LDREX/STREX were enough to guarantee atomic access to an address from several interrupts. For example, GCC for ARM compiles all C11 atomic functions using LDREX/STREX, and I don't see CLREX being called anywhere.
2. What "requirement for a dummy store" is the second paragraph referring to?
3. What is the difference between the global record and the local record? Is the global record needed for multi-core scenarios?
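
For comparison with the assembly above, here is a minimal C11 sketch of the same try-lock idea. It assumes a hypothetical `lock_word` and helper names `try_lock`/`unlock` that do not come from the ARM docs. How a compiler lowers the compare-exchange (for example to an LDREX/STREX loop on ARMv7-M, as observed with GCC above) depends on the toolchain and target.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: 0 = free, 1 = taken (mirrors the assembly example). */
static atomic_int lock_word = 0;

/* Try once to claim the lock; returns true if this caller now holds it. */
static bool try_lock(atomic_int *lock)
{
    int expected = 0;
    /* Strong compare-exchange: fails only if *lock was not 0. */
    return atomic_compare_exchange_strong_explicit(
        lock, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* Release the lock; releasing a lock that is not held is treated as a bug. */
static void unlock(atomic_int *lock)
{
    assert(atomic_load_explicit(lock, memory_order_relaxed) == 1);
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

With this sketch, a first `try_lock(&lock_word)` returns true, a second call returns false until `unlock(&lock_word)` has run, and no explicit CLREX-like step appears anywhere in the source.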

3) yes the documentation states that the global record is for multiple PE (cores). — old_timer, Jul 04 '18 at 03:35
I suspect the CLREX or a dummy store are for situations where the interrupt/exception occurred between the LDREX and the STREX, and perhaps this is a task switch timer interrupt so an LDREX from one pair is now connected to an STREX from another. Implementing these pairs in infinite loops increases the odds of hitting this, but the odds are still pretty low. — old_timer, Jul 04 '18 at 03:38
I suspect it is for cleanliness when an LDREX/STREX pair are broken by an interrupt. Both CLREX and STREX will clear the local record, but whether they clear the global one is implementation-defined. One tries to do a store; the other doesn't. — old_timer, Jul 04 '18 at 03:46
"Use LDREX and STREX to implement interprocess communication in multiple-processor and shared-memory systems." Do you have a multi-core cortex-m7 or one that shares its memory with another master? — old_timer, Jul 04 '18 at 03:49
@old_timer: no, it's a single-core real-time application, but with strict limits on interrupt latencies. ARMv7 makes certain guarantees as long as you don't use `SWP` (and some other restrictions). *I suspect it is for cleanliness when an LDREX/STREX pair are broken by an interrupt.* - but I don't understand the reason again, the `LDREX`/`STREX` are *explicitly* created to solve the issue of different interrupts breaking in between. — Lou, Jul 04 '18 at 07:34
First off, LDREX/STREX are for multi master systems, the swp replacement is not correct, often taken out of context from arm documentation (need to read all the docs). Unimaster SWP may be your only choice. Anyway the answer you accepted was the comment I gave, mixing an ldrex with some other strex (on a uniprocessor/master system, etc, etc). — old_timer, Jul 04 '18 at 20:12
You are correct: where LDREX/STREX are useful (multi-core/master systems), that situation is not an issue, because you cannot mix and match pairs from different masters and have them pass. I also do not yet see a situation where either CLREX or a dummy STREX is required, as those situations don't require STREX/LDREX at all. I have yet to find the dummy store requirement outside the text you found. — old_timer, Jul 04 '18 at 20:14
The cortex-m7 is an armv7-m, not an armv7, BTW. With armv7, multi-core systems are quite common; for cortex-m7... I would like to see one, and I think the chip vendor has to cobble that together. I don't have access to know what is required (include stuff, modification of the source, compile options) to get a width of more than zero for the master bits on the exclusive interface. What chip are you using? (This is all very chip/vendor specific anyway.) — old_timer, Jul 04 '18 at 20:19
Only exclusive instructions to shared memory result in exclusive accesses on the AHBP. Exclusive accesses to non-shared memory are marked as non-exclusive accesses on the bus. — old_timer, Jul 04 '18 at 20:25
Software must avoid performing exclusive accesses to shared regions of memory if no global exclusive monitor is implemented that covers the region in question. — old_timer, Jul 04 '18 at 20:25
From these docs it seems like the global monitor is handled by the chip vendor, not ARM; that is certainly true for the big boys (ARMv6, ARMv7). — old_timer, Jul 04 '18 at 20:26
My cortex-m7 board hangs on the STREX; I have not dug in to see what kind of fault. So far I can't do an LDREX/STREX pair (against SRAM). — old_timer, Jul 05 '18 at 15:13
@old_timer: *the swp replacement is not correct, often taken out of context from arm documentation* -- you really need to add some references to that -- all the official arm docs state that `swp` is deprecated and cannot ensure the stated interrupt latency, so I don't see what could be "taken out of context". The accepted answer states that `CLREX` is required for multithreaded scenarios where threads can be preempted and explains how this can happen even on a single-core system, while your comment said it's for "cleanliness". — Lou, Jul 06 '18 at 08:57
For example: ["The model for using Load-Exclusives and Store-Exclusives for synchronization is **the same** for single-core and multi-core systems."](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0008a/ch01s02s01.html) ["future architectures are not guaranteed to support these instructions (SWP and SWPB)."](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0008a/ch01s02s01.html). ["processor must complete both the load and the store part (...), increasing interrupt latency"](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0008a/ch01s02s01.html). — Lou, Jul 06 '18 at 09:24

score 11 · Accepted Answer · answered Jul 04 '18 at 08:59

Taking (and paraphrasing) your three questions separately:

1. Why clear the access record?

When strict nesting of code is enforced, such as when you're working with interrupts, then CLREX is not usually required. However, there are cases where it's important. Imagine you're writing a context switch for a preemptive operating system kernel, which can asynchronously suspend a running task and resume another. Now consider the following pathological situation, involving two tasks of equal priority (A and B) manipulating the same shared resource using LDREX and STREX:

Task A      Task B
  ...
 LDREX
-------------------- context switch
             LDREX
             STREX   (succeeds)
              ...
             LDREX
-------------------- context switch
 STREX               (succeeds, and should not)
  ...

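Task A's final STREX succeeds because Task B's second LDREX left the local monitor in the exclusive state, even though the shared resource has changed since Task A read it. Therefore the context switch must issue a CLREX to avoid this.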
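
The interleaving above can be replayed on a host machine with a toy model of the local monitor. This is only a sketch: `monitor_t`, `sim_ldrex`, `sim_strex`, `sim_clrex` and `replay` are hypothetical names, and the model assumes a monitor that tracks no address and is not cleared automatically on a context switch, which is the situation described here rather than the behaviour of any particular core.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy local exclusive monitor (hypothetical): one flag, no address tracking,
   and no automatic clearing on exceptions. */
typedef struct {
    bool exclusive;   /* set by LDREX, cleared by STREX or CLREX */
} monitor_t;

static int sim_ldrex(monitor_t *m, const int *addr)
{
    m->exclusive = true;
    return *addr;
}

/* Returns 0 on success and 1 on failure, like the STREX status result. */
static int sim_strex(monitor_t *m, int *addr, int value)
{
    bool ok = m->exclusive;
    m->exclusive = false;          /* cleared whether or not the store happens */
    if (ok) {
        *addr = value;
    }
    return ok ? 0 : 1;
}

static void sim_clrex(monitor_t *m)
{
    m->exclusive = false;
}

/* Replays the diagram: both tasks try to increment *shared.
   Returns the status of Task A's final STREX. */
static int replay(bool clrex_on_switch, int *shared)
{
    monitor_t mon = { false };
    *shared = 0;

    int a = sim_ldrex(&mon, shared);                 /* Task A reads 0 */
    if (clrex_on_switch) sim_clrex(&mon);            /* context switch A -> B */

    int b = sim_ldrex(&mon, shared);                 /* Task B reads 0 */
    int b_status = sim_strex(&mon, shared, b + 1);   /* succeeds: shared == 1 */
    assert(b_status == 0);
    (void)sim_ldrex(&mon, shared);                   /* Task B starts a new sequence */
    if (clrex_on_switch) sim_clrex(&mon);            /* context switch B -> A */

    return sim_strex(&mon, shared, a + 1);           /* Task A stores stale 0 + 1 */
}
```

Under these assumptions, `replay(false, &shared)` returns 0: Task A's stale store is accepted, so `shared` ends at 1 although two increments ran. `replay(true, &shared)` returns 1, and Task A would have to retry its LDREX/STREX sequence.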

2. What 'requirement for a dummy store' is avoided?

If there wasn't a CLREX instruction then it would be necessary to use a STREX to relinquish the exclusive-access flag, which involves a memory transaction and is therefore slower than it needs to be if all you want to do is clear the flag.

3. Is the 'global record' for multi-core scenarios?

Yes. If you're using a single-core machine, there's only one record because there's only one CPU.

There is no need to use `CLREX` for the reason you describe for (1) on Cortex-M processors, other than compatibility. Please see my answer for more details. — Graeme, Jan 21 '19 at 17:42
Also, for (2), I can see how `CLREX` saves a few cycles over `STREX`. But don't see how that can be useful if the exclusive access flag is cleared automatically on a context switch. If it returned the exclusive access state like `STREX` I can see cases it could be used, but it doesn't. — Graeme, Jan 21 '19 at 17:49

Graeme · Answer 2 · 2023-08-18T09:02:50.163

Actually, CLREX isn't needed for exceptions/interrupts on the M7; it appears to be included only for compatibility reasons. From the documentation (Version c):

CLREX enables compatibility with other ARM Cortex processors that have to force the failure of the store exclusive if the exception occurs between a load exclusive instruction and the matching store exclusive instruction in a synchronization operation. In Cortex-M processors, the local exclusive access monitor clears automatically on an exception boundary, so exception handlers using CLREX are optional.

So, since Cortex-M processors clear the local exclusive access flag on exception/interrupt entry/exit, this negates most (all?) of the use cases for CLREX.

With regard to your third question, as others have mentioned you are correct in thinking that the global record is used in multi-core scenarios. There may still be use cases for CLREX on multi-core processors depending on the implementation defined effects on local/global flags.

I can see why there is confusion around this, as the initial version of the M7 documentation doesn't include these sentences (not to mention the various other versions of more generic documentation on the ARM website). Even now, I cannot even link to the latest revision. The page displays 'Version a' by default and you have to manually change the version via a drop down box (hopefully this will change in future).

Update

In response to comments, an additional documentation link for this. This is the part of the manual that describes the usage of these instructions outside of the specific instruction documentation (and also has been there since the first revision):

The processor removes its exclusive access tag if:

It executes a CLREX instruction.

It executes a STREX instruction, regardless of whether the write succeeds.

An exception occurs. This means the processor can resolve semaphore conflicts between different threads.

In a multiprocessor implementation:

Executing a CLREX instruction removes only the local exclusive access tag for the processor.

Executing a STREX instruction, or an exception, removes the local exclusive access tags for the processor.

Executing a STREX instruction to a Shareable memory region can also remove the global exclusive access tags for the processor in the system.
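
The single-processor rules quoted above can be sketched as a small host-side model. It is an illustration, not a description of the real hardware: `cm_monitor_t`, `cm_ldrex`, `cm_strex`, `cm_exception` and `increment_with_interrupts` are hypothetical names, and the model covers only the rules that STREX and any exception remove the tag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the exclusive access tag (hypothetical names throughout). */
typedef struct {
    bool tag;
} cm_monitor_t;

static int cm_ldrex(cm_monitor_t *m, const int *addr)
{
    m->tag = true;
    return *addr;
}

/* Returns 0 on success, 1 on failure; the tag is removed either way. */
static int cm_strex(cm_monitor_t *m, int *addr, int value)
{
    bool ok = m->tag;
    m->tag = false;
    if (ok) {
        *addr = value;
    }
    return ok ? 0 : 1;
}

/* Models an exception boundary: the tag is removed without any CLREX. */
static void cm_exception(cm_monitor_t *m)
{
    m->tag = false;
}

/* Increments *counter with an LDREX/STREX-style retry loop. An exception is
   injected between load and store on the first `interruptions` attempts.
   Returns the number of attempts that were needed. */
static int increment_with_interrupts(cm_monitor_t *m, int *counter,
                                     int interruptions)
{
    assert(interruptions >= 0);
    int attempts = 0;
    int status;
    do {
        int value = cm_ldrex(m, counter);
        if (attempts < interruptions) {
            cm_exception(m);             /* ISR runs here, tag is lost */
        }
        status = cm_strex(m, counter, value + 1);
        attempts++;
    } while (status != 0);
    return attempts;
}
```

In this model an interrupted sequence never stores a stale value; it simply fails and retries, so each injected exception costs one extra loop iteration and the handler itself needs no CLREX.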

That is genuinely very interesting. The fact that this information is only available in one hard-to-find version of the documentation doesn't give me sufficient confidence to remove the CLREX from my context switches though! I wouldn't be surprised to find that it didn't apply to early silicon revisions, for example. — cooperised, Jan 22 '19 at 08:45
Personally I'm also surprised by this because automatic clearing of the exclusive access flag on an exception boundary doesn't seem like a very useful feature. It potentially saves a CLREX in a context switch, at the cost of needlessly causing STREX failures whenever exclusive access blocks are interrupted by ISRs that don't even touch the exclusive access mechanism. Not sure of the logic of that. — cooperised, Jan 22 '19 at 08:47
@cooperised. I added another link. This is what I was originally aware of, but I found the other one and decided to use that instead since it mentioned the compatibility. I'm not sure of the full reasons for the change, but I guess one argument for this is when it comes to the use of libraries or an RTOS that implement exceptions that you have little control over. Another factor is probably just that it is less prone to user errors. — Graeme, Jan 22 '19 at 09:57
Not sure that argument is valid - if the exception handlers you don't have control over use LDREX/STREX then the flag gets cleared anyway; if they don't, there's no need to clear the flag because there's no race. Literally the only place a CLREX is required (if the flag were not cleared at the exception boundary) is in the context switch. Ah well... — cooperised, Jan 22 '19 at 13:12
@cooperised. Yeah, agreed, possible `CLREX` usage is more limited than I thought. I added the paragraph following my last quote. On the M7, it looks like only `STREX` can clear the global exclusive access tag. If `CLREX` was removing the global tag when the exception boundary isn't, then there would be a use case on multiprocessor chips. The way it is, it seems like the other processor can be left hanging by a context switch. I don't get that. — Graeme, Jan 22 '19 at 14:38
