How to avoid unaligned access exceptions with float on Cortex-M4

Question

I am experiencing a HardFault exception in some code which computes a float expression with an integer operand. The operand is passed by address and is then converted (either implicitly or with an explicit cast) to float. When the operand is not 32-bit aligned (which is not under my control), I get the exception.

I tried to reproduce the behavior on godbolt here and the generated code is consistent with what I get on my devices.

Basically, the following disassembly code

vldr.32 s0, [r0]  @ int

directly uses the possibly unaligned address passed to the function in the vldr instruction which requires aligned addresses.

I found this question which addresses a similar problem, but there they talk about float pointers. In that case I understand that floats can't be unaligned.

In my case I am dealing with integers, which are allowed to be unaligned and nonetheless the compiler assumes it can still use the address in the vldr instruction. What puzzles me even more is that this code

uint32_t pippo =  *(uint32_t *)src;

float pippof = pippo * 10.0f;

may or may not generate the exception when given an unaligned address, depending on the optimization level, because with -O0, for instance, an integer is allocated on the stack.

So my questions are:

1. Is this correct behavior for the compiler (or maybe the backend)? Since integers can be unaligned, I would expect the generated code to go through a CPU integer register.
2. What is a correct strategy to avoid this issue, when even going through a temporary int variable is not safe?

if src is defined as something other than a 32 bit variable type then of course this can fault, not a surprise there, that is not the proper/safe way to take for example an array of bytes and turn them into a word. — old_timer, Sep 10 '20 at 16:44
if you ask the compiler to do something that can/will generate a fault on a particular target, then yes, it can/will generate a fault, not a compiler problem. — old_timer, Sep 10 '20 at 16:45

Peter Cordes · Accepted Answer · 2020-09-11T15:30:36.567

C is not portable assembly language, it has its own rules

When the operand is not 32-bits aligned (which is not under my control)

alignof(uint32_t) is 4 so the compiler is allowed to assume 4-byte alignment. Dereferencing a uint32_t* that's not 4-byte aligned is C undefined behaviour so yes, the compiler is 100% allowed to assume that doesn't happen.

In your case specifically, *(uint32_t *)src is undefined behaviour if src isn't aligned. That's why code generated for later use of that data is allowed to assume it is aligned. The fact that ARM assembly happens to be ok with unaligned integer loads has nothing to do with anything, except for why it happens to work with optimization disabled. See https://trust-in-soft.com/blog/2020/04/06/gcc-always-assumes-aligned-pointers/ and Why does unaligned access to mmap'ed memory sometimes segfault on AMD64? for more examples and discussion about target-ISA misalignment behaviour / guarantees not making it safe in C.

If you have data with less alignment than that, you need some way to do a safe unaligned load. One ISO-C standard way is with memcpy. (GCC will reliably inline it on targets where it knows how to do an unaligned integer load, like ARM with a new enough -march= or -mcpu=. Unless you've used -fno-builtin-memcpy or similar, in which case this would be a bad choice with too much overhead.)
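A minimal sketch of the memcpy approach, assuming the data arrives as a pointer with no alignment guarantee. The names load_u32_unaligned and scale_by_ten are hypothetical, chosen only for illustration; the second function mirrors the multiplication from the question.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a uint32_t from an address that may not be
 * 4-byte aligned. memcpy makes no alignment assumption about src, so the
 * compiler cannot assume the address is aligned. */
static inline uint32_t load_u32_unaligned(const void *src)
{
    uint32_t value;
    memcpy(&value, src, sizeof value);
    return value;
}

/* The computation from the question, rewritten on top of the helper
 * (hypothetical name). The integer is loaded first, then converted. */
float scale_by_ten(const void *src)
{
    return load_u32_unaligned(src) * 10.0f;
}
```

With the builtin memcpy enabled and optimization on, this should compile to an ordinary integer load followed by a move to a floating-point register instead of a call, but it is worth checking the generated asm for your exact -mcpu and flags.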

Another way is a GNU C typedef like
typedef uint32_t unaligned_u32 __attribute__((aligned(1)));
and then use unaligned_u32*.

That lets the compiler know it's not a plain ABI-compliant uint32_t object, and will have to emit code that loads in a way that will work even without alignment. This might be very inefficient; I didn't check GCC's asm output.
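As a rough sketch of how the typedef might be used (GCC or Clang only, since the attribute is a GNU extension), the function below dereferences through the under-aligned type. The function name scale_by_ten_typedef is hypothetical.

```c
#include <stdint.h>

/* GNU C extension: same representation as uint32_t, but the compiler may
 * only assume 1-byte alignment for objects accessed through this type. */
typedef uint32_t unaligned_u32 __attribute__((aligned(1)));

/* Hypothetical name. The dereference goes through unaligned_u32, so the
 * compiler has to pick a load that tolerates a misaligned address. */
float scale_by_ten_typedef(const void *src)
{
    const unaligned_u32 *p = (const unaligned_u32 *)src;
    return *p * 10.0f;
}
```

This only addresses alignment. The memory still has to hold a uint32_t as far as the aliasing rules are concerned, unless the build uses -fno-strict-aliasing.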

(You can use this GNU C type attribute for any type, including float if you want an unaligned_float.)

__attribute__((may_alias, aligned(1))) can be useful if you also want an aliasing-safe uint32_t. (Many embedded builds compile with -fno-strict-aliasing, where every type is implicitly may_alias, but if you use this rigorously everywhere it's actually needed you can make your code strict-aliasing safe.)
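A small sketch combining both attributes, assuming a GNU C compiler. The typedef name unaligned_alias_u32 and the function read_field are hypothetical; the scenario (pulling a 32-bit field out of a byte buffer at an arbitrary offset) is only an illustration of where both properties are needed.

```c
#include <stdint.h>

/* GNU C extension: 1-byte alignment, and may_alias lets accesses through
 * this type alias objects of any other type, much like a char type. */
typedef uint32_t unaligned_alias_u32 __attribute__((may_alias, aligned(1)));

/* Hypothetical example: read a 32-bit field, in native byte order, from an
 * arbitrary byte offset inside a buffer declared as unsigned char. */
uint32_t read_field(const unsigned char *buf, unsigned offset)
{
    return *(const unaligned_alias_u32 *)(buf + offset);
}
```

The value is read in the target's native byte order, so any endianness conversion for wire formats is still up to the caller.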

Using `memcpy` seems to do just the right efficient thing: it loads into an integer register with `ldr`, for which unaligned access is okay, and then moves it into the appropriate floating-point register. See https://godbolt.org/z/Wexs8v. For OP: modern compilers will generally optimize constant-size memcpy just like an assignment without alignment assumptions, so even though it looks on the surface like it's a possibly expensive function call, it often really isn't. — Nate Eldredge, Sep 10 '20 at 19:15
@NateEldredge: Thanks. On some ARM CPUs, vector -> integer register can be slow. I don't know if integer -> vector can also stall the integer or vector pipeline on some in-order CPUs where they're only loosely coupled, or if that's usually fine. — Peter Cordes, Sep 10 '20 at 19:26
@old_timer: My answer is assuming that GCC will *inline* `memcpy` using something that's efficient on the known target ISA for a single possibly-misaligned word. You don't actually want to let the compiler emit a `bl memcpy` into an aligned memory destination! That would be garbage. If you're compiling with `-ffreestanding` or `-fno-builtin-memcpy` or something, definitely use the typedef with `__attribute__((aligned(1)))` to let GCC inline code. — Peter Cordes, Sep 10 '20 at 20:40
@old_timer: If you were replying to Nate, please use @ Nate to ping him, and make it clear to me who you're replying to. I updated my answer anyway to make sure my meaning was clear; this is getting significant upvotes so I want to state some more of the assumptions and understanding my answer is based on. — Peter Cordes, Sep 10 '20 at 20:59
@PeterCordes thanks, very complete answer. As you pointed out, I was missing that I was trusting a platform-dependent feature and that GCC is not guaranteeing that. — sysopch, Sep 11 '20 at 12:26
