ARMv8 has an elaborate set of dc and ic instructions to manage the data and instruction caches respectively. Their exact description is more complicated than just "flush L2 data cache" etc., because ARMv8 has an abstract cache model that's based instead on "points" where the caches might diverge from each other. Cache lines can be invalidated or written back ("cleaned") either by virtual address or by explicit level/set/way address. So you have instructions like dc civac, Data Cache Clean and Invalidate by Virtual Address to Point of Coherency.
The full definition of the cache model takes about 30 pages in the ARMv8 Architecture Reference Manual (Section D7.4), and about another 60 pages to describe the actual cache maintenance instructions (C5.3).
Most of this is irrelevant to the application programmer, and indeed, many forms of the ic/dc instructions are privileged and cannot be executed by an application anyway. The ARMv8 memory model guarantees that data caches are coherent across the "inner shareable domain", which includes all cores that might be running threads of your program, so no explicit cache management instructions are needed for sharing variables with other threads. (You still need memory barriers like ldar/stlr/dmb when you need to ensure that loads and stores commit to the coherent cache in a particular order, as for acquire, release, or sequential consistency.)
One aspect that can affect application programming is that, unlike on x86, the instruction and data caches are neither unified nor coherent with each other. Therefore, when you write data that is later going to be executed as instructions, such as when loading a binary or JIT compiling, you do need to explicitly clean the relevant lines from the data cache to the "point of unification", then invalidate them from the instruction cache, and finally execute a synchronization barrier (isb) to flush any instructions already prefetched from the cache. See Synchronizing caches for JIT/self-modifying code on ARM for more details.
Memory-mapped I/O registers should be marked as "device memory" in the page tables set up by the kernel, which automatically exempts them from caching and reordering, so you do not need explicit cache flushes or barriers to access them; ordinary loads and stores are enough. Some systems might need explicit cache management for DMA.
C/C++ compilers do not emit any cache maintenance instructions (nor barriers) for volatile reads and writes. All you get is the usual guarantee that a volatile read/write results in the execution of exactly one load/store instruction. As mentioned above, this is sufficient for memory-mapped I/O access, which is the main legitimate use for volatile in C/C++. If you are doing something else for which cache maintenance is actually needed, then you have to insert those instructions yourself. For the JIT situation described above, gcc/clang provide __builtin___clear_cache().
Other languages like C#/Java have different semantics for volatile, more like C _Atomic or C++ std::atomic. In this case you would get memory barriers, but still no cache maintenance.