Intel documentation, atomic access description doesn't make sense

Question

I'd like help parsing the meaning of this sentence:

Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache line

I can't quite understand it. Could someone explain it with some simple C code? Thank you very much.

This is only half of a sentence. Can you be specific about where it came from (name of document, section and page number, etc) and give additional context? — Nate Eldredge, Aug 13 '23 at 02:24

score 1 · Answer 1 · answered Aug 12 '23 at 20:28

With 64-byte cache lines (e.g. in alignas(64) char two_lines[128], which spans exactly two lines), a 4-byte access at any offset from 0 to 60 within a line is guaranteed atomic (if the memory region is cacheable), but one at offset 61, 62, or 63 isn't, because the access would be split across two cache lines. In other words, any 2-, 4-, or 8-byte access that isn't a cache-line split is atomic on cacheable memory.
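As a rough sketch of that rule, here is the offset arithmetic in C. The helper name fits_in_cache_line and the CACHE_LINE_SIZE constant are made up for illustration, and the 64-byte line size is an assumption that holds for current x86-64 CPUs but is not something C itself tells you.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on current x86-64 CPUs. */
#define CACHE_LINE_SIZE 64u

/* Hypothetical helper: returns true if the byte range [addr, addr + size)
 * lies entirely within one cache line, i.e. the access is not a
 * cache-line split. */
bool fits_in_cache_line(uintptr_t addr, size_t size)
{
    assert(size > 0 && size <= CACHE_LINE_SIZE);
    uintptr_t first_line = addr / CACHE_LINE_SIZE;              /* line holding the first byte */
    uintptr_t last_line  = (addr + size - 1) / CACHE_LINE_SIZE; /* line holding the last byte */
    return first_line == last_line;
}
```

For a 4-byte access, fits_in_cache_line(60, 4) is true while fits_in_cache_line(61, 4) is false, matching the 0 .. 60 range above. For an 8-byte access the last offset that fits is 56.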

A page or region of memory can be marked (with PAT or MTRR) as write-back cacheable (WB, used for all normal ways of allocating memory in a normal program), write-through cacheable (WT), uncacheable write-combining (WC, often used for video RAM), or fully uncacheable (UC, with strong memory-ordering rules, often used for MMIO). There is at least one other type, WP (write-protect), which is also cacheable.

This sentence only applies to cacheable memory: WB and WT, not UC or WC. The underlying mechanism is that for cacheable memory, the load/store is just reading or writing the cache line in the CPU cache. It will later get transferred to DRAM as a whole 64-byte unit. (Unlike on AMD where cache-line transfers between cores aren't necessarily atomic; that can be a source of tearing at some boundaries.) But for uncacheable memory, an individual write may be sent over a bus that can only transfer aligned 8-byte chunks, for example.

C doesn't provide a portable way to do unaligned accesses, other than memcpy. If you had a GNU C typedef uint32_t unaligned_dword __attribute__((aligned(1),may_alias)); then *(unaligned_dword*)&two_lines[63] would access a 4-byte chunk that spans two cache lines, and thus would not be atomic.
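A minimal sketch of both ways of writing an unaligned 4-byte access, assuming GCC or clang for the second one. The function names are made up for illustration. Neither version promises a single instruction at the C level. Compilers targeting x86 typically turn each into one mov, but that is an implementation detail, not something the language guarantees.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable ISO C: a fixed-size memcpy is the only standard way to read or
 * write 4 bytes at an arbitrary (possibly misaligned) address. */
uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

void store_u32_unaligned(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}

#if defined(__GNUC__)
/* GNU C extension (GCC and clang): a 4-byte type with alignment 1 that is
 * also allowed to alias other types. */
typedef uint32_t unaligned_dword __attribute__((aligned(1), may_alias));
static_assert(sizeof(unaligned_dword) == 4, "still a 4-byte access");

uint32_t load_u32_gnu(const void *p)
{
    return *(const unaligned_dword *)p;
}
#endif
```

With alignas(64) unsigned char two_lines[128], calling load_u32_unaligned(&two_lines[11]) stays inside the first cache line, while load_u32_unaligned(&two_lines[63]) touches bytes 63 .. 66 and therefore both lines.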

But *(volatile unaligned_dword*)&two_lines[11] would be an atomic access, if the compiler did it with a single asm instruction, which is also not guaranteed in C in general (but GCC and clang will do so if they can with volatile). You need _Atomic if you really want the compiler to make asm with atomic accesses. Intel's manual is documenting asm guarantees, not C.
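For comparison, this is roughly what asking the compiler for atomicity looks like with C11 _Atomic. The variable and function names are hypothetical. The object here is naturally aligned, as _Atomic objects normally are, so this relies on the ordinary aligned-access guarantee rather than the unaligned-within-a-line rule.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumption: int-sized atomics are always lock-free, which is the case for
 * GCC and clang on x86. The build fails here if the target disagrees. */
static_assert(ATOMIC_INT_LOCK_FREE == 2, "expected lock-free int-sized atomics");

/* Hypothetical shared variable; the compiler gives it natural alignment. */
_Atomic uint32_t shared_word;

/* Relaxed ordering asks only for atomicity (no tearing), not for any
 * ordering with respect to other memory operations. */
void publish(uint32_t v)
{
    atomic_store_explicit(&shared_word, v, memory_order_relaxed);
}

uint32_t observe(void)
{
    return atomic_load_explicit(&shared_word, memory_order_relaxed);
}
```

On x86 these typically compile to plain mov instructions, so the cost is the same as the volatile version, but here the C standard (not just the hardware manual) promises that a reader never sees a torn value.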

This rule isn't usually relevant for programming in C since 2, 4, and 8-byte types have alignof(T) = sizeof(T) in standard x86-64 ABIs, so it's not valid to do misaligned accesses to them without extensions like GNU C attributes. In the i386 System V ABI, alignof(double) is only 4, so you can have a misaligned double or int64_t, but there's no way to declare one that can be misaligned within a cache line but not split across cache lines. (Except maybe part of an alignas(64) struct.)
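One way to sketch the "part of an alignas(64) struct" idea is with GNU C attributes, assuming GCC or clang. The struct and its field names are invented for illustration. Packing puts a 4-byte member at offset 3, and aligning the whole struct to 64 bytes keeps that member inside a single cache line (again assuming 64-byte lines).

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: exactly one 64-byte cache line, with a deliberately
 * misaligned 4-byte member. GNU C attributes, not ISO C. */
struct __attribute__((packed, aligned(64))) line_record {
    uint8_t  tag[3];
    uint32_t value;     /* bytes 3 .. 6: misaligned, but never a line split */
    uint8_t  pad[57];
};

static_assert(sizeof(struct line_record) == 64, "one cache line");
static_assert(alignof(struct line_record) == 64, "starts on a line boundary");
static_assert(offsetof(struct line_record, value) == 3, "value is misaligned");

/* The compiler knows the member is packed and emits an unaligned-safe
 * access. On x86 that is typically a single mov, but C does not promise it. */
uint32_t record_get(const struct line_record *r)
{
    return r->value;
}

void record_set(struct line_record *r, uint32_t v)
{
    r->value = v;
}
```

Whether that access is actually atomic still depends on the compiler choosing a single instruction and on the memory being cacheable, so this only illustrates the layout, not a guarantee.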

Related: Why is integer assignment on a naturally aligned variable atomic on x86? discusses this sentence and how it's a stronger guarantee than what AMD guarantees. (On AMD, only unaligned accesses that fit within an aligned 8-byte qword are guaranteed atomic, even on cacheable memory.)
