I am going through the assembly generated by GCC for an ARM Cortex M4, and noticed that atomic_compare_exchange_weak gets two DMB instructions inserted around the condition (compiled with GCC 4.9 using -std=gnu11 -O2):

// if (atomic_compare_exchange_weak(&address, &x, y))
dmb      sy
ldrex    r0, [r3]
cmp      r0, r2
itt      eq
strexeq  lr, r1, [r3]
cmpeq.w  lr, #0
dmb      sy
bne.n    ...
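
For reference, the surrounding C looks roughly like this (declarations simplified and names hypothetical; only the commented-out call above is taken from the actual code):

#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t address;            /* shared variable being updated */

void update(uint32_t y)
{
    uint32_t x = atomic_load(&address);     /* expected value, obtained earlier */
    if (atomic_compare_exchange_weak(&address, &x, y)) {
        /* exchange succeeded */
    }
}

atomic_compare_exchange_weak uses memory_order_seq_cst, which is the ordering the two DMBs implement.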

The programming guide to barrier instructions for the ARM Cortex M4 states:

Omitting the DMB or DSB instruction in the examples in Figure 41 and Figure 42 would not cause any error because the Cortex-M processors:

  • do not re-order memory transfers
  • do not permit two write transfers to be overlapped.

Given this, is there any reason why these instructions couldn't be removed when targeting Cortex M?

  • Did you target exactly that processor with the appropriate `-march` option? – Jens Gustedt Jun 11 '18 at 14:54
  • @JensGustedt yes, everything is set up and works correctly; the project is a year old, and we are just changing the way some parts work, so this is the first time I've checked the assembly for that part. – vgru Jun 11 '18 at 17:18

1 Answer


I'm not aware of whether the Cortex M4 can be used in a multi-CPU/multi-core configuration, but in general:

  1. Memory barriers are never necessary (can always be omitted) in single-core systems.
  2. Memory barriers are always necessary (can never be omitted) in multi-core systems where threads/processes operating on the same memory may be running on different cores.

Whether or not the hardware re-orders memory writes is irrelevant here.
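
To illustrate point 2, here is a minimal sketch (hypothetical names) of the kind of cross-core pattern the barriers exist for: without them, the consuming core could observe the flag before the data it guards.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t payload;        /* plain data, written by one core */
static atomic_bool ready;       /* flag observed by another core */

/* runs on core A */
void publish(uint32_t value)
{
    payload = value;
    /* release: the payload store must become visible before the flag */
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* runs on core B */
uint32_t consume(void)
{
    /* acquire: don't read payload until the flag has been observed */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;
    return payload;
}

On a single core this ordering holds trivially; across cores it is the barriers emitted for the atomics that make it hold.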

Of course I would expect the DMB instruction to be essentially free on chips that don't support SMP, so I'm not sure why you'd want to try to hack it out.
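
That said, if the default seq_cst ordering really is stronger than you need, the portable way to get rid of the barriers is to request a weaker ordering explicitly rather than patching the generated code. A sketch, assuming relaxed semantics are actually sufficient for your use case (and it's worth checking whether your GCC version really elides the DMBs for relaxed ordering):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static _Atomic uint32_t address;    /* hypothetical, mirroring the question */

bool try_update_relaxed(uint32_t expected, uint32_t desired)
{
    /* relaxed on both the success and failure paths: atomicity only, no ordering */
    return atomic_compare_exchange_weak_explicit(&address, &expected, desired,
                                                 memory_order_relaxed,
                                                 memory_order_relaxed);
}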

Please note that, since the question references the code the compiler produces for atomic intrinsics, I'm assuming the context is synchronization of atomics to make them match the high-level specification, not other uses such as I/O barriers for MMIO. The "never" above should not be read as applying to that (unrelated) use, though I suspect that, for the reasons you already cited, it doesn't apply to the Cortex M4 either.

R.. GitHub STOP HELPING ICE
  • I think memory barriers are needed in single core systems in some cases. For example, if we have a peripheral with a memory-mapped register and we want to ensure the write to that register has completed (been acknowledged on the bus) before performing other operations. – Eugene Sh. Jun 11 '18 at 14:58
  • @EugeneSh.: OK, I should probably clarify "never". The context I'm assuming is for synchronization of atomics to make them match the high-level specification, not other uses like IO barriers for MMIO. – R.. GitHub STOP HELPING ICE Jun 11 '18 at 15:04
  • @R.. Thanks, this is what I would answer myself if someone asked me, so I just wanted to double check. It's not that it makes a huge difference, each DMB takes 1 cycle on M4 from my understanding. – vgru Jun 11 '18 at 17:23
  • One cycle does not sound worth fighting with the compiler/rolling your own atomics, or like it would even be measurable in the program as a whole (as opposed to microbenchmarks). – R.. GitHub STOP HELPING ICE Jun 11 '18 at 20:40
  • @EugeneSh. That's covered by accessing all such registers through `volatile`. The C standard says "Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine." So contrary to popular belief, the compiler is not allowed to re-order access to volatile objects. Doing so would render the system non-compliant with the C language. So the compiler has to insert memory barriers to guarantee this, if the hardware can't guarantee it. While a `volatile` access does not necessarily give a memory barrier, it does give a guarantee of sequencing. – Lundin Jun 12 '18 at 06:28
  • @R.. As an aside, the LPC4357 features a Cortex-M4 and a Cortex-M0 sharing the same memory space. – Colin Jun 12 '18 at 08:30
  • @Lundin: to be slightly more precise, the compiler is not allowed to re-order accesses to `volatile` objects with respect to accesses to other `volatile` objects. It can easily change `a=b; a++; b=1;` into `a=b; b=1; a++;` if only `b` is `volatile`, in which case it's worth weighing whether it makes sense to make them both `volatile`, or insert a compiler barrier. – vgru Jun 12 '18 at 09:39
  • @Groo No. "In the abstract machine, all expressions are evaluated as specified by the semantics." `a++;` in your example is clearly sequenced before `b=1;` and `b=1;` is a side-effect. `a++` is not allowed to be sequenced after `b=1;`. Compiler vendors/CPU manufacturers violate this for the sake of speed and then lazily try to put the responsibility on the programmers by demanding that they add "memory barriers" in their code. But it has never been the programmer's responsibility. There's no such thing as memory barriers in C, only "sequenced before/after". – Lundin Jun 12 '18 at 09:51
  • @Lundin `volatile` is not a machine instruction. We are talking about the machine instructions DMB/DSB here, which might be used to implement the `volatile` behavior, so I don't see a contradiction. – Eugene Sh. Jun 12 '18 at 14:37
  • @Lundin: well, as of version 8.1, GCC [still violates the rule you are mentioning](https://godbolt.org/g/e7MjNB). So, what you wrote to Eugene is in practice incorrect. I'd hate to have to explain to my customers that the compiler is broken, but their code works in theory. Also, the question is not about the C standard, it's about a specific microcontroller and its specific memory barrier instructions, which also comes with a specific toolset, which contains specific compiler barrier instructions. – vgru Jun 13 '18 at 08:33
  • @Groo It's because gcc devs work like this: 1) optimize the program as much as possible, come hell or high water 2) try to find support in the C standard for the optimizations, either by influencing the committee or by finding some loophole or ambiguous text to abuse 3) in the case of abused loopholes/ambiguous text, or of simply failing to conform to the standard, try to put the burden on the programmer. Two perfect examples of this scenario are memory barriers and strict aliasing, where gcc simply breaks code. – Lundin Jun 13 '18 at 11:14
  • @Lundin: Your explanation is completely counterfactual. Optimization based on the fact that objects of different types cannot alias except under certain very specific conditions was intended ever since the original ANSI C standardization process. That's why the rules about effective type and compatible types were written into the original standard. The compiler devs did not "go looking for" this. It was always the intent; compiler tech was just too bad to make use of it for a few decades. – R.. GitHub STOP HELPING ICE Jun 13 '18 at 15:01
  • @R.. No, the intent of effective type is well explained in the C rationale. It was never the intention that, for example, int* and double* could alias. The point where this started to be abused was where things like a uint16_t* couldn't alias a uint32_t* etc, effectively making all manner of hardware-related programming with gcc specifically a safety hazard. As a result, we see embedded systems built with gcc go haywire every day, because the average C programmer doesn't even know about strict aliasing and effective type. It has been like this since Cortex M became mainstream. – Lundin Jun 13 '18 at 15:07
  • @Lundin: The problem, fundamentally, is with the phrase "by an lvalue of one of the following types". If it had said "by an lvalue or other means *that is visibly associated with* one of the following types", and made clear that quality compilers shouldn't be willfully blind to evidence of such association, that would have avoided a lot of problems, but there's no way the authors of clang and gcc would agree to such language today. – supercat Oct 08 '19 at 20:38
  • @supercat Not quite sure how you'd define "visibly associated". A quality compiler would allow type punning between integer types of different sizes, and type punning between byte types and any other type for the purpose of serialization and de-serialization. One of the biggest flaws of the strict aliasing rule imo is that it doesn't allow de-serialization from a character array, without resorting to various ugly tricks to dodge the rule. – Lundin Oct 09 '19 at 06:48
  • @Lundin: For the vast majority of "controversial" situations, a "don't be obtusely blind" rule would suffice, which is IMHO why C89 didn't go into more detail. C99 couldn't say that without impugning what some compilers were already doing. A compiler's need to recognize things would depend upon how aggressively it would try to exploit a lack of association, but if one accepts the idea that compilers should at least *try* to notice associations, the behavior of gcc and clang would be totally indefensible. – supercat Oct 09 '19 at 14:47
  • @Lundin: If I were to try to write a more specific rule, I would say that a compiler may consolidate accesses to an object, hoist them to the start of a function or loop, or defer them to the end of the loop, unless that would cross an action which makes use of that lvalue, a pointer of that type which could identify the same object or a member of the same array, or an lvalue from which the other one may have been derived earlier within that same function. I'd also clarify, for this purpose as well as "restrict", the concepts of... – supercat Oct 09 '19 at 14:51
  • ...pointers "leaking" or being formed from unknown provenance. A pointer from unknown provenance need not be presumed capable of accessing objects of another type or guarded by `restrict` *unless* pointers that could identify those objects have been leaked earlier in the execution of the same function, but compilers should be very cautious about objects whose addresses have visibly leaked. – supercat Oct 09 '19 at 14:54