With 64-byte cache lines (e.g. alignas(64) char two_lines[128], which spans exactly two lines), a 4-byte access at any offset from 0 to 60 within a line is guaranteed atomic (if the memory region is cacheable), but accesses at offsets 61, 62, and 63 aren't, because they are split across two cache lines. I.e. any normal-sized access that isn't a cache-line split is atomic on cacheable memory.
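The offset arithmetic above can be written down as a small check. This is only a sketch: the helper name splits_cache_line and the CACHE_LINE constant are made up for illustration, and the 64-byte line size is an assumption rather than something queried from the CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u  /* assumed line size, not detected at run time */

static_assert((CACHE_LINE & (CACHE_LINE - 1)) == 0,
              "the mask trick below needs a power-of-two line size");

/* Hypothetical helper: true if an access of `size` bytes starting at
   address `addr` touches two cache lines. */
bool splits_cache_line(uintptr_t addr, size_t size)
{
    uintptr_t offset = addr & (CACHE_LINE - 1);  /* offset within the line */
    return offset + size > CACHE_LINE;
}
```

For a 4-byte access, offsets 0 through 60 give false and 61 through 63 give true, matching the ranges described above.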
A page or region of memory can be marked (with PAT or MTRR) as write-back cacheable (WB, used for all normal ways of allocating memory in a normal program), write-through cacheable (WT), uncacheable write-combining (WC, often used for video RAM), or fully uncacheable (UC, with strong memory-ordering rules, often used for MMIO). There is at least one other type, WP (write-protect), which is also cacheable.
This sentence only applies to cacheable memory: WB and WT, not UC or WC. The underlying mechanism is that for cacheable memory, the load/store just reads or writes the cache line in the CPU's cache. The line later gets transferred to DRAM as a whole 64-byte unit. (Unlike on AMD, where cache-line transfers between cores aren't necessarily atomic; that can be a source of tearing at some boundaries.) But for uncacheable memory, an individual write may get sent over a bus that can only do aligned 8-byte chunks, or something like that.
C doesn't provide a portable way to do unaligned accesses, other than memcpy. If you had a GNU C typedef uint32_t unaligned_dword __attribute__((aligned(1),may_alias)); you could say that *(unaligned_dword*)&two_lines[63] would access a 4-byte chunk that spans two cache lines, and thus not be atomic.
But *(volatile unaligned_dword*)&two_lines[11] would be an atomic access, if the compiler did it with a single asm instruction, which is also not guaranteed in C in general (but GCC and clang will do so with volatile if they can). You need _Atomic if you really want the compiler to make asm with atomic accesses. Intel's manual is documenting asm guarantees, not C.
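To put the three options side by side, here is a rough sketch in GNU C (GCC or clang). The function names, shared_word, and check_loads_agree are hypothetical and exist only for illustration. The comments summarize the guarantees discussed above; the code does not prove any atomicity.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* GNU C extension: a uint32_t with alignment 1 that may alias anything. */
typedef uint32_t unaligned_dword __attribute__((aligned(1), may_alias));

/* Portable ISO C. The compiler chooses the instructions, so nothing
   about atomicity is implied. */
uint32_t load_u32_memcpy(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* GNU C. With volatile, GCC and clang use a single 4-byte load when they
   can, but the C language itself does not promise that. */
uint32_t load_u32_volatile(const unsigned char *p)
{
    return *(const volatile unaligned_dword *)p;
}

/* ISO C11. Only here does the language itself promise atomic accesses. */
_Atomic uint32_t shared_word;  /* hypothetical shared variable */

void publish(uint32_t v)
{
    atomic_store_explicit(&shared_word, v, memory_order_relaxed);
}

uint32_t observe(void)
{
    return atomic_load_explicit(&shared_word, memory_order_relaxed);
}

/* Hypothetical self-check: both unaligned loads return the same bytes. */
void check_loads_agree(void)
{
    alignas(64) unsigned char two_lines[128];
    for (unsigned i = 0; i < sizeof two_lines; i++)
        two_lines[i] = (unsigned char)i;
    /* Offset 11 stays inside the first line; offset 63 is a line split. */
    assert(load_u32_memcpy(&two_lines[11]) == load_u32_volatile(&two_lines[11]));
    assert(load_u32_memcpy(&two_lines[63]) == load_u32_volatile(&two_lines[63]));
}
```

Single-threaded, all three return the same bytes. The difference only shows up when another core writes concurrently, and that's where the offset-63 load can tear.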
This rule isn't usually relevant for programming in C since 2, 4, and 8-byte types have alignof(T) == sizeof(T) in standard x86-64 ABIs, so it's not valid to do misaligned accesses to them without extensions like GNU C attributes. In the i386 System V ABI, alignof(double) is only 4, so you can have a misaligned double or int64_t, but there's no way to declare one that can be misaligned within a cache line but not split across cache lines. (Except maybe as part of an alignas(64) struct.)
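As a sketch of that last parenthetical, a 64-byte-aligned struct can pin a misaligned 8-byte member inside one cache line. The i386 alignof(double) == 4 case is hard to reproduce on x86-64, so this example assumes the GNU C packed attribute instead. The struct name packed_line and the helper are made up for illustration, and 64-byte lines are assumed.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* GNU C: packed removes the padding before `value`, and aligned(64)
   starts every object of this type on a (presumed) cache-line boundary. */
struct packed_line {
    char     pad[4];
    uint64_t value;   /* offset 4: misaligned for an 8-byte type */
} __attribute__((packed, aligned(64)));

static_assert(alignof(struct packed_line) == 64, "struct starts on a line");
static_assert(offsetof(struct packed_line, value) == 4, "member is misaligned");

/* Hypothetical helper: do the first and last byte of `value` fall in the
   same 64-byte line? */
bool value_within_one_line(void)
{
    size_t first = offsetof(struct packed_line, value);
    size_t last  = first + sizeof(uint64_t) - 1;
    return first / 64 == last / 64;
}
```

Whether the compiler then accesses value with a single instruction is a separate question. The struct only guarantees the layout.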
Related: "Why is integer assignment on a naturally aligned variable atomic on x86?" discusses this sentence and how it's a stronger guarantee than what AMD guarantees. (On AMD, only unaligned accesses within an 8-byte qword are guaranteed atomic, even on cacheable memory.)