I think you are overcomplicating it. Look at the AMBA/AXI spec (and where did you find a multi-core cortex-m4?). ldrex/strex are for sharing a resource across processors in a multi-processor chip. They have been incorrectly used for other things for some time now. ARM unfortunately did an unusually bad job of documenting all of this.
The exclusive part of the ldrex is that the processorid and the address (range) are saved in a table. When a strex happens, the processorid for that address (range) is checked: if it matches, the response is EXOKAY and the store is done; if not, the response is OKAY and the store is not done. Strex does not clear anything. Interestingly, they have this clrex instruction, which I assume sets the processorid to some value that won't hit or, depending on how they build their tables, frees up a table entry.
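To make that description concrete, here is a minimal sketch in C of the table as I described it. It is a toy model of my wording above, not ARM's or any vendor's logic: the names (model_ldrex, model_strex, model_clrex), the two processorids, the word-indexed memory and the one-entry-per-processorid table are all made up for illustration, and it leaves out anything the paragraph does not mention.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_IDS   2     /* hypothetical: two processorids */
#define MEM_WORDS 16    /* hypothetical toy memory, indexed by word */

enum resp { OKAY = 0, EXOKAY = 1 };   /* response names used in the text */

struct monitor_entry { int valid; uint32_t addr; };

struct monitor_entry table[NUM_IDS];  /* one table slot per processorid */
uint32_t mem[MEM_WORDS];

/* ldrex: remember (processorid, address), then do the load. */
uint32_t model_ldrex(int id, uint32_t addr)
{
    table[id].valid = 1;
    table[id].addr  = addr;
    return mem[addr];
}

/* strex: store only if this processorid has a record for this address. */
enum resp model_strex(int id, uint32_t addr, uint32_t data)
{
    if (table[id].valid && table[id].addr == addr) {
        mem[addr] = data;
        return EXOKAY;  /* per the description, the entry is left in place */
    }
    return OKAY;        /* not exclusive: the store is dropped */
}

/* clrex: one of the two guesses in the text, free the table entry. */
void model_clrex(int id)
{
    table[id].valid = 0;
}

static const char *resp_name(enum resp r) { return r == EXOKAY ? "EXOKAY" : "OKAY"; }

/* Walk through the cases and print each response. */
void model_demo(void)
{
    mem[3] = 0x600;
    puts(resp_name(model_strex(0, 3, 0x11)));  /* no ldrex yet */
    model_ldrex(0, 3);
    puts(resp_name(model_strex(1, 3, 0x22)));  /* different processorid */
    puts(resp_name(model_strex(0, 3, 0x33)));  /* matching id and address */
    puts(resp_name(model_strex(0, 3, 0x44)));  /* entry still there in this model */
    model_clrex(0);
    puts(resp_name(model_strex(0, 3, 0x55)));  /* entry freed by clrex */
}
```

Call model_demo() from a main() to see the responses. Note that in this model a second strex from the same processorid still gets EXOKAY, because nothing but clrex removes the entry; that is only what the description says, real hardware may well differ.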
I may try this after writing this, but you can just as easily do ldrex, then strex, then strex. I am fairly certain I have done it on full-sized ARMs; I will try ldrex, strex, strex, clrex, strex on a cortex-m4 and see what happens.
In a uniprocessor system, ldrex/strex are expected to work in ARM's logic, but the chip vendor is not required to support them and may simply return OKAY (instead of EXOKAY). The L1 certainly, and probably the L2, are ARM logic; beyond that you get into chip vendor logic (do cortex-ms have an L2?). Normally you do not have to worry about hitting the chip vendor logic: you can run a long time, if not indefinitely, without knowing any of this, as you will remain in one of the caches. Disabling both caches in Linux, for example, is a royal PITA; they may make it seem like a compile-time option, but dig in and see the reality. And with only one processor, how do you get a different processorid?
In multi-processor chips, the chip vendor is supposed to support it correctly beyond the caches, if you can even get there with an exclusive access. The way ldrex/strex are normally used, you are most likely within your L1 cache and never get exposed to what the chip vendor has provided. It can happen if you get interrupted in between, and then you are likely saved by the L2. In this case having more than one processorid in the chip makes sense, as there is more than one processor.
This is nice, from the cortex-m4 TRM:
The Cortex-M4 processor implements a local exclusive monitor. The
local monitor within the processor has been constructed so that it
does not hold any physical address, but instead treats any access as
matching the address of the previous LDREX. This means that the
implemented exclusives reservation granule is the entire memory
address range.
The m7 trm says the same thing.
Without multiple cores, how could/would one generate a different ID?
The docs use the term processorid to indicate which processor is being used. How many processors are in a cortex-m? Perhaps it is documented elsewhere under a different string/name, but at this time I don't know how the processorid in a cortex-m is generated and, it being a uniprocessor, whether there is more than one. I don't have access to a core to know for sure.
So even though the logic does not support per-address exclusive access, they did not say they don't check the processorid; they simply check every strex access to memory marked as shared against the processorid of the last ldrex, independent of its address.
EDIT
PUT32(0x01000600,0x600);
PUT32(0x01000700,0x700);
PUT32(0x01000800,0x800);
CLREX();
hexstring(STREX(0x20000600,0x12345678));
hexstring(STREX(0x20000700,0x12345678));
hexstring(STREX(0x20000800,0x12345678));
hexstring(LDREX(0x20000600));
hexstring(STREX(0x20000600,0x6666));
hexstring(STREX(0x20000700,0x12345678));
hexstring(STREX(0x20000800,0x12345678));
hexstring(LDREX(0x20000600));
hexstring(STREX(0x20000700,0x7777));
hexstring(STREX(0x20000800,0x12345678));
hexstring(GET32(0x20000600));
hexstring(GET32(0x20000700));
hexstring(GET32(0x20000800));
CLREX();
hexstring(0xAABBCCDD);
hexstring(LDREX(0x20000600));
CLREX();
hexstring(STREX(0x20000600,0x2222));
hexstring(GET32(0x20000600));
producing
00000001
00000001
00000001
00000600 <-- ldrex
00000000 <-- strex pass
00000001 <-- strex fail
00000001
00006666
00000000
00000001
00006666
00007777
00000800
AABBCCDD
00006666
00000001
00006666
So it looks like what they did here is that the next strex after an ldrex passes, independent of address. So, using your terms, the strex "clears the lock".
And note that putting a clrex between the ldrex and strex does make the strex fail.
The address is not what matters: even when hitting the same address it is one ldrex to one strex.
hexstring(LDREX(0x20000900));
hexstring(STREX(0x20000900,0x2222));
hexstring(STREX(0x20000900,0x2222));
3EEDCC1B
00000000
00000001
Turning the data cache on didn't change the results.
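The results so far are what you would get from a monitor that is nothing more than a single flag. Here is a small host-side sketch of that in C; it is only a model that fits the numbers I saw, not a claim about how the core is built. The sim_* names and the toy RAM are made up, and the initial values are written straight to the 0x2000xxxx addresses, which assumes the PUT32 addresses in the test land in the same RAM.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define RAM_BASE  0x20000000u
#define RAM_WORDS 0x400u          /* toy RAM, enough for the addresses used here */

uint32_t ram[RAM_WORDS];
int monitor_open;                 /* one flag, no address kept: set by ldrex */

static uint32_t *word(uint32_t addr) { return &ram[(addr - RAM_BASE) / 4u]; }

void     sim_put32(uint32_t addr, uint32_t data) { *word(addr) = data; }
uint32_t sim_get32(uint32_t addr)                { return *word(addr); }

uint32_t sim_ldrex(uint32_t addr) { monitor_open = 1; return *word(addr); }
void     sim_clrex(void)          { monitor_open = 0; }

/* Returns 0 on success, 1 on failure, like the strex status register. */
uint32_t sim_strex(uint32_t addr, uint32_t data)
{
    if (!monitor_open) return 1;  /* no pending ldrex: fail, nothing stored */
    monitor_open = 0;             /* the strex itself uses up the ldrex */
    *word(addr) = data;           /* address is never compared */
    return 0;
}

static void hex(uint32_t v) { printf("%08" PRIX32 "\n", v); }

/* Replays the first test sequence against the model. */
void sim_replay(void)
{
    sim_put32(0x20000600, 0x600);
    sim_put32(0x20000700, 0x700);
    sim_put32(0x20000800, 0x800);
    sim_clrex();
    hex(sim_strex(0x20000600, 0x12345678));
    hex(sim_strex(0x20000700, 0x12345678));
    hex(sim_strex(0x20000800, 0x12345678));
    hex(sim_ldrex(0x20000600));
    hex(sim_strex(0x20000600, 0x6666));
    hex(sim_strex(0x20000700, 0x12345678));
    hex(sim_strex(0x20000800, 0x12345678));
    hex(sim_ldrex(0x20000600));
    hex(sim_strex(0x20000700, 0x7777));
    hex(sim_strex(0x20000800, 0x12345678));
    hex(sim_get32(0x20000600));
    hex(sim_get32(0x20000700));
    hex(sim_get32(0x20000800));
    sim_clrex();
    hex(0xAABBCCDD);
    hex(sim_ldrex(0x20000600));
    sim_clrex();
    hex(sim_strex(0x20000600, 0x2222));
    hex(sim_get32(0x20000600));
}
```

Calling sim_replay() from a main() prints the same 17 values as the first listing above, which is all this is meant to show: one flag, set by ldrex, cleared by clrex or by the next strex, is enough to explain the output.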
Test functions:
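.thumb_func
.globl LDREX
LDREX:
    ldrex r0,[r0]       @ exclusive load from the address in r0, value returned in r0
    bx lr
.thumb_func
.globl CLREX
CLREX:
    clrex               @ clear the local exclusive monitor
    bx lr
.thumb_func
.globl STREX
STREX:
    strex r0,r1,[r0]    @ try to store r1 at the address in r0; r0 = 0 on success, 1 on failure
    bx lr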
Unlike the big brother ARMs:
CLREX();
hexstring(STREX(0x20000600,0x12345678));
hexstring(LDREX(0x20000600));
hexstring(STREX(0x20000600,0x6666));
hexstring(LDREX(0x20000600));
PUT32(0x20000600,0x11);
hexstring(STREX(0x20000600,0x6666));
00000001
00000600
00000000
00006666
00000000
The strex survives the non-exclusive access in between. At least based on the document you posted, a non-exclusive store should spoil the prior ldrex (on an armv7-a).
Note the above is on a cortex-m4 r0p1 CPUID 0x410FC241
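The difference comes down to one rule: does a plain store clear the pending ldrex or not. A minimal C sketch of the two possibilities follows, with made-up names (excl_model, em_*) and a single word standing in for 0x20000600. Setting 0 matches what this cortex-m4 did; setting 1 is my reading of the document you posted and is not something I tested here.

```c
#include <stdint.h>
#include <stdio.h>

struct excl_model {
    int plain_store_clears;   /* 0: as seen on this cortex-m4, 1: plain store spoils the ldrex */
    int open;                 /* set by ldrex, cleared by clrex or a strex */
    uint32_t word;            /* single memory word standing in for 0x20000600 */
};

uint32_t em_ldrex(struct excl_model *m) { m->open = 1; return m->word; }
void     em_clrex(struct excl_model *m) { m->open = 0; }

/* Non-exclusive store: always writes, optionally clears the pending ldrex. */
void em_put32(struct excl_model *m, uint32_t data)
{
    m->word = data;
    if (m->plain_store_clears) m->open = 0;
}

/* Returns 0 on success, 1 on failure, like the strex status register. */
uint32_t em_strex(struct excl_model *m, uint32_t data)
{
    if (!m->open) return 1;
    m->open = 0;
    m->word = data;
    return 0;
}

/* ldrex, plain store, strex: returns the status of the final strex. */
uint32_t em_ldrex_store_strex(int plain_store_clears)
{
    struct excl_model m = { plain_store_clears, 0, 0x600 };
    em_ldrex(&m);
    em_put32(&m, 0x11);
    return em_strex(&m, 0x6666);
}

void em_demo(void)
{
    printf("%u\n", (unsigned)em_ldrex_store_strex(0));  /* strex passes */
    printf("%u\n", (unsigned)em_ldrex_store_strex(1));  /* strex fails */
}
```

With the rule switched off the final strex returns 0, as in the output above; with it switched on the same sequence returns 1.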