
Does AArch64 support unaligned access natively? I am asking because currently ocamlopt assumes "no".

artless noise
Demi
  • "Yes, if strict alignment checking isn't enabled, except in all the cases where it doesn't (or does but you don't want the side-effects)" is probably the summary. Whether it's supported for precisely what you're trying to do depends on what that is, and since I know zero about OCaml, its ABI, data types, and what machine instructions it uses for what, even with the details I doubt the definitive answer will be coming from me :/ – Notlikethat Jul 22 '16 at 23:00
  • Loading 16-, 32-, and 64-bit values from memory. No SIMD, etc. – Demi Jul 23 '16 at 00:17
  • For the particular case of stack SP accesses, it appears to be configurable with `SCTLR_ELx.SA` and `SCTLR_EL1.SA0` as mentioned briefly at: https://stackoverflow.com/questions/212466/what-is-a-bus-error/31877230#31877230 – Ciro Santilli Jun 09 '19 at 10:33

Provided that the hardware bit for strict alignment checking (`SCTLR_ELx.A`) is not turned on (which, as on x86, no general-purpose OS is realistically going to do), AArch64 permits unaligned data accesses to Normal (not Device) memory with the regular load/store instructions.

However, there are several reasons why a compiler would still want to maintain aligned data:

  • Atomicity of reads and writes: naturally aligned loads and stores are guaranteed to be atomic, i.e. if one thread reads an aligned memory location while another thread writes the same location, the read will only ever return the old value or the new value. That guarantee does not apply if the location is not aligned to the access size; in that case the read could return some unknown mixture of the two values. If the language has a concurrency model which relies on that not happening, it's probably not going to allow unaligned data.
  • Atomic read-modify-write operations: If the language has a concurrency model in which some or all data types can be updated (not just read or written) atomically, then for those operations the code generation will involve using the load-exclusive/store-exclusive instructions to build up atomic read-modify-write sequences, rather than plain loads/stores. The exclusive instructions will always fault if the address is not aligned to the access size.
  • Efficiency: On most cores, an unaligned access takes at least 1 cycle longer than a properly aligned one, even in the best case. In the worst case, a single unaligned access can cross a cache line boundary (which has additional overhead in itself) and generate two cache misses, or, if it also crosses a page boundary, even two consecutive page faults. Unless you're in an incredibly memory-constrained environment, or have no control over the data layout (e.g. pulling packets out of a network receive buffer), unaligned data is still best avoided.
  • Necessity: If the language has a suitable data model, i.e. no pointers, and any data from external sources is already marshalled into appropriate datatypes at a lower level, then there's really no need for unaligned accesses anyway, and it makes the compiler's life that much easier to simply ignore the idea altogether.

I have no idea what concerns OCaml in particular, but I certainly wouldn't be surprised if it were "all of the above".

Notlikethat
  • Actually, the answer is "none of the above": I am writing exactly a raw pointer module for OCaml. – Demi Jul 25 '16 at 08:22
  • From a quick test on godbolt it seems that gcc assumes that unaligned access on aarch64 is ok. – plugwash Sep 26 '20 at 21:42
  • @Notlikethat, would the MMU make a difference here? I suppose running an MMU-less OS would trigger more unaligned-access faults? – Milan Sep 17 '21 at 07:31
  • [Memory pool atomic C11 variables cause segfaults](//stackoverflow.com/q/74035029) hints that 64-bit `stlr` also faults on misalignment. (Which makes sense: the use case is lock-free atomics, and without alignment the access might only be ordered, not atomic.) So it's not *just* the load/store-exclusive instructions like `ldaxr` / `stlxr` used for atomic RMWs that require alignment, unless I'm drawing the wrong conclusion. Oh, maybe I am: even without optimization, clang compiles `atomic_store_explicit` with `memory_order_relaxed` into `str`, not `stlr`, yet the OP says that still faults on their M1, unless Apple clang is different. – Peter Cordes Oct 12 '22 at 07:10
  • @PeterCordes: `ldar` and `stlr` are indeed documented to fault if misaligned. – Nate Eldredge Oct 12 '22 at 14:12
  • And in that linked Q&A, @Nate spotted that it was segfaulting in the version with the `relaxed` store because it also later loaded with the default `seq_cst`. So that resolves the mystery. – Peter Cordes Oct 12 '22 at 18:52