
I am working with a multithreaded bare-metal C/Assembler application on a Cortex-A9.

I have some shared variables, i.e. addresses that are used from more than one thread. To perform an atomic exchange of a variable's value I use LDREX and STREX. Now my question is whether I need LDREX and STREX on every access to one of these variables, even if interrupts are disabled.

Assume the following example:

  • Thread 1 uses LDREX and STREX to exchange the value of address a.
  • Thread 2 disables interrupts, uses normal LDR and STR to exchange the value of address a, does something else that should not be interrupted and then enables interrupts again.

What happens if Thread 1 gets interrupted right after the LDREX by Thread 2? Does the STREX in Thread 1 still recognize that there was an access to address a, or do I have to use LDREX and STREX in Thread 2, too?

unwind
user3035952
  • If your *mainline* uses `LDREX/STREX`, then you must do the same in the interrupts. The `LDREX` reserves the memory location. In order for `STREX` to signal a retry, everyone using the memory must use `LDREX`; you cannot mix and match the access. `MRC p15, 0, , c0, c0, 1` returns the ERG, which is the size that `LDREX/STREX` reserves. Read about [exclusive monitors](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0008a/CJAGCFAF.html). – artless noise May 28 '14 at 14:38
  • See: [LDREX/STREX and cache](http://stackoverflow.com/questions/11383125/do-the-arm-instructions-ldrex-strex-have-to-operate-on-cache-aligned-data), [ERG question](http://stackoverflow.com/questions/10812442/arm-ll-sc-exclusive-access-by-register-width-or-cache-line-width), [Linux atomic_inc question](http://stackoverflow.com/questions/23734276/atomic-add-in-arm-atomic). Some concepts of `LDREX/STREX` are a little foreign. As per the *Linux atomic_inc* question, I think you are thinking *atomic* versus *lock-free*; see my down-voted wiki answer there. – artless noise May 28 '14 at 14:50
  • ldrex and strex are there to ensure that, in a multicore processor, your core's code had exclusive access to a memory location; basically, nobody else interfered with that location between your executing the two separate transactions. Interrupt modes, etc., have nothing to do with it: either you need to use the pair of instructions or you don't. – old_timer May 28 '14 at 17:29
  • We have had this argument countless times: ldrex/strex are optional for uniprocessor systems, and there are systems in the field where they are not supported. – old_timer May 28 '14 at 17:43
  • Yes, it is optional. If the single core has it, it will work. For the OP, the [cortex-a9](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0388f/CBBDIIFI.html) seems to include support. It is probably wise to read the vendor documentation and test for support; sometimes the implementers remove things, even when ARM documents them as supported. – artless noise May 28 '14 at 17:54
  • If you remain within the ARM logic (the caches), it will work. If/when the access touches the vendor logic, then it may or may not work, and it may follow a different set of rules or a different interpretation of the rules. Generally the caches are on and the vendor logic never sees the exclusive accesses, so it is all in ARM's domain. – old_timer May 28 '14 at 18:05

2 Answers


LDREX/STREX are something that has to be implemented by the chip vendor, hopefully to ARM's specification. You can and should get the ARM documentation on the topic; in this case, in addition to the ARM ARMs and TRMs, you should get the AMBA AXI documentation.

So if you have

ldrex thread 1
interrupt
ldrex thread 2
strex thread 2
return from interrupt
strex thread 1

Between the thread 2 ldrex and strex there has been no modification of that memory location, so that strex should work. But between the thread 1 ldrex and the thread 1 strex there has been a modification to that location: the thread 2 strex. So in theory the thread 1 strex should fail, and you have to try your thread 1 ldrex/strex pair again until it works. That is exactly by design: you keep trying the ldrex/strex pair in a loop until it succeeds.
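The retry-until-success pattern can be sketched in portable C11 rather than assembler. This is only a sketch: `exchange_with_retry` is a made-up name, and it assumes a toolchain with `<stdatomic.h>`. On ARMv7 targets such as the Cortex-A9, GCC and Clang typically lower these atomics to an ldrex/strex loop, but check the generated code for your own compiler and flags.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: atomically replace *addr with new_value and
 * return the value that was stored there before. */
static uint32_t exchange_with_retry(_Atomic uint32_t *addr, uint32_t new_value)
{
    uint32_t observed = atomic_load(addr);

    /* The weak compare-exchange is allowed to fail even when the value
     * matches, much like a strex that loses its reservation. On failure
     * it refreshes `observed` with the current value, and we try again. */
    while (!atomic_compare_exchange_weak(addr, &observed, new_value)) {
        /* retry until the store goes through */
    }
    return observed;
}
```

A caller would use it as `old = exchange_with_retry(&a, new_value);`, where `a` is declared `_Atomic uint32_t`.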

But this is all implementation defined, so you have to look at the specific chip vendor, model and revision and do your own experiments. The bug in Linux, for example, is that the ldrex/strex sequence is an infinite loop: apply it to a system or situation where ldrex/strex is not supported and you get an OKAY instead of an EXOKAY, so the strex fails every time and you are stuck in that loop forever (ever wonder how I know all of this? I had to debug this problem at the logic level).
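One way to guard against that kind of lock-up while you experiment is to bound the number of attempts and report failure instead of spinning forever. The sketch below is an illustration only; `exchange_bounded` and `MAX_EXCHANGE_ATTEMPTS` are made-up names, and it assumes the weak compare-exchange compiles to a single ldrex/strex attempt per call, which you should confirm in the disassembly.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit; pick a value that suits the system under test. */
#define MAX_EXCHANGE_ATTEMPTS 1000u

/* Tries to swap new_value into *addr. Returns true and writes the previous
 * value to *old_value on success; returns false if every attempt failed,
 * which hints that exclusive accesses do not work for this memory region. */
static bool exchange_bounded(_Atomic uint32_t *addr, uint32_t new_value,
                             uint32_t *old_value)
{
    uint32_t observed = atomic_load(addr);

    for (uint32_t attempt = 0; attempt < MAX_EXCHANGE_ATTEMPTS; ++attempt) {
        /* On failure, `observed` is refreshed with the current value. */
        if (atomic_compare_exchange_weak(addr, &observed, new_value)) {
            *old_value = observed;
            return true;
        }
    }
    return false;
}
```

A `false` return during bring-up is a cue to check how the region is mapped (cached or not, shareable or not) before trusting ldrex/strex there.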

First off, ARM documents that exclusive access support is not required for uniprocessor systems, so the ldrex/strex pair CAN fail to work IF you touch vendor-specific logic on single-core systems. Uniprocessor or not, if your ldrex/strex remains within the ARM logic (L1 and optional L2 caches), then the pair is governed by ARM and not the chip vendor, so you fall under one set of rules; if the pair touches system memory outside the ARM core, then you fall under the vendor's rules.

The big problem is that ARM's documentation is unusually incomplete on the topic. Depending on which manual you read, and where in it, it talks for example about whether some OTHER master has modified that location. In your case it is the same master, so the location has been modified, but since it was by you the second strex should succeed. The same document then says that another exclusive read resets the monitor to a different address; well, what if it is another exclusive read of the same address?

Basically, yours is a question of what happens with two exclusive writes to the same address without an exclusive read in between: does, or should, the second one succeed? A very good question... I can't see that there is a definitive answer, either across all the ARM cores or in the whole world of ARM-based chips.

The bottom line with ldrex/strex is that it is not completely ARM core specific but also chip (vendor) specific. You need to do experiments to ensure you can use that instruction pair on that system (uniprocessor or not). You need to know what the ARM core does (the caches) and what happens when that exclusive access goes out past the core to the vendor logic. Repeat for every core and vendor you care to port this code to.

old_timer
  • Unusually, there are actually some directly incorrect statements in dwelch's answer above. I have read the same docs, and am not surprised they caused confusion. Cached memory is managed using the local monitor only, and does not rely on any global monitor being implemented. In a multi-core system, not implementing a global monitor for RAM is effectively in violation of the architecture specification (but has happened). The bits referred to as "implementation defined" in the ARM Architecture Reference Manual (with regards to monitors) would not affect the functionality of correct code. – unixsmurf May 28 '14 at 21:09
  • You need to read more docs, then, if you can't find what I was referencing. The ARM/TRM/AMBA-AXI docs contradict themselves and are incomplete with respect to the poster's question, and specifically incomplete with respect to bugs found in Linux and other assumptions made about these instructions. Software folks think they are a direct replacement for swp because ARM implies that, but that is not quite how they work. And at the end of the day it is implementation defined, since it is an individual at some chip company who has to implement this. – old_timer May 29 '14 at 19:59
  • get a room full of engineers you get a room full of different solutions based on interpretations of the documentation. – old_timer May 29 '14 at 20:00
  • If you simply cache the area then you are at least within ARM's realm which still will vary from architecture to architecture but it should be more consistent than vendor to vendor. – old_timer May 29 '14 at 20:01

Apologies for just throwing in an "it's wrong" statement to dwelch, but I did not have time to write a proper answer yesterday. dwelch's answer to your question is correct - but pieces of it are at the very least possible to misinterpret.

The short answer is that, yes, you need to either disable interrupts for both threads or use ldrex/strex for both threads.

But to set one thing straight: support for ldrex/strex is mandatory in all ARM processors of v6 or later (with the exception of v6M microcontrollers). Support for SWP however, is optional for certain ARMv7 processors.

The behaviour of ldrex/strex is dependent on whether your MMU is enabled and what memory type and attributes the accessed region is configured with. Certain possible configurations will require additional support to be added to either the interconnect or RAM controllers in order for ldrex/strex to be able to operate correctly.

The entire concept is based around the idea of local and global exclusive monitors. If operating on memory regions marked as non-shareable (in a uniprocessor configuration), the processor needs only be concerned with the local monitor.

In multi-core configurations, coherent regions are managed using what is architecturally considered to be a global monitor, but still resides within the multi-core processor and does not rely on externally implemented logic.

Now, dwelch is correct in that there are way too many "implementation defined" options surrounding this. The sequence you describe is NOT architecturally guaranteed to work. The architecture does not require that an str transitions the local (or global) monitor from exclusive to open state (although in certain implementations, it might).
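To make the failure mode concrete, here is a toy software model of a local monitor. It is emphatically NOT a description of real hardware: every name in it (`toy_monitor`, `toy_ldrex`, `toy_strex`, `toy_str`, `replay_question_sequence`) is invented for illustration, and the `str_clears_monitor` flag merely stands in for the implementation choice discussed above. It replays the sequence from the question under both choices.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: one monitor, one address, no caches, no second core. */
typedef struct {
    bool exclusive;          /* true between a modelled ldrex and strex   */
    bool str_clears_monitor; /* the implementation choice being modelled  */
} toy_monitor;

static uint32_t toy_ldrex(toy_monitor *m, const uint32_t *addr)
{
    m->exclusive = true;     /* open -> exclusive */
    return *addr;
}

/* Returns 0 on success and 1 on failure, like the strex status result. */
static int toy_strex(toy_monitor *m, uint32_t *addr, uint32_t value)
{
    if (!m->exclusive) {
        return 1;            /* reservation lost: the caller must retry */
    }
    *addr = value;
    m->exclusive = false;    /* exclusive -> open */
    return 0;
}

static void toy_str(toy_monitor *m, uint32_t *addr, uint32_t value)
{
    *addr = value;
    if (m->str_clears_monitor) {
        m->exclusive = false;
    }
}

/* Replays: thread 1 ldrex, thread 2 plain str (interrupts off), thread 1
 * strex. Returns the strex status and reports the final value of `a`. */
static int replay_question_sequence(bool str_clears_monitor,
                                    uint32_t *final_value)
{
    toy_monitor m = { false, str_clears_monitor };
    uint32_t a = 10;

    uint32_t seen = toy_ldrex(&m, &a);        /* thread 1 reads 10       */
    toy_str(&m, &a, 20);                      /* thread 2 stores 20      */
    int status = toy_strex(&m, &a, seen + 1); /* thread 1 tries to store */

    *final_value = a;
    return status;
}
```

In this model, when the plain store leaves the monitor alone, thread 1's strex reports success and silently overwrites thread 2's value with one computed from stale data. When the store clears the monitor, the strex fails and thread 1 would retry. Since the architecture permits either behaviour, code that depends on the second one is not portable.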

Hence, the architecturally safe options are:

  1. Use ldrex/strex in both contexts.
  2. Disable interrupts in both contexts.
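A rough C sketch of the two options follows. All names are hypothetical. Option 1 assumes C11 atomics, which GCC and Clang typically lower to ldrex/strex on ARMv7. Option 2 assumes a single core, since masking interrupts does nothing about accesses from another core; its `irq_disable`/`irq_enable` hooks are host-side stand-ins that only count calls, and on a real target they would mask and unmask IRQs (for example with `cpsid i` and `cpsie i` in a privileged mode).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Option 1: every context goes through the same atomic exchange. */
static _Atomic uint32_t shared_a;

static uint32_t thread1_exchange(uint32_t value)
{
    return atomic_exchange(&shared_a, value);
}

static uint32_t thread2_exchange(uint32_t value)
{
    return atomic_exchange(&shared_a, value);
}

/* Option 2: every context masks interrupts around plain accesses.
 * The hooks below are placeholders so the sketch runs on a host. */
static volatile uint32_t shared_b;
static unsigned irq_disable_depth;

static void irq_disable(void) { ++irq_disable_depth; } /* target: mask IRQs   */
static void irq_enable(void)  { --irq_disable_depth; } /* target: unmask IRQs */

static uint32_t exchange_irqs_off(uint32_t value)
{
    irq_disable();
    uint32_t old = shared_b;   /* plain load  */
    shared_b = value;          /* plain store */
    irq_enable();
    return old;
}
```

The point of keeping `shared_a` and `shared_b` separate is that a given variable is protected by exactly one scheme; the problem in the question arises from applying both schemes to the same address.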
unixsmurf