The real answer is: because x86's memory model is already strong enough that blocking compile-time reordering is sufficient for load or store ordering; runtime reordering is already blocked by hardware.
Those are just generic compile-time barriers, implemented with a piece of inline assembly that prevents GCC from reordering memory accesses across it. It's explained pretty well in this other post. What can be achieved using this "trick" is usually also possible using the C `volatile` qualifier.
Note that the Linux kernel does not use those specific macros anywhere in its code; those are just two macros defined for the io_uring userspace test tools. It definitely uses `asm volatile ("" ::: "memory")` where needed, but under different names (e.g. `smp_rmb()`, `smp_wmb()`).
x86's memory model makes `sfence` and `lfence` entirely useless for communication between CPUs; blocking compile-time reordering is sufficient: see Does the Intel Memory Model make SFENCE and LFENCE redundant?
`smp_mb()` is a full barrier and does need an actual asm instruction, as well as blocking compile-time reordering.
x86 does have some memory-barrier asm instructions for read-only and write-only "real" (runtime) memory barriers. Those are `sfence` (store fence), `lfence` (load fence) and `mfence` (memory fence = full barrier).
`mfence` serializes both reads and writes (full barrier), while the others only serialize one of the two (reads OR writes, a.k.a. loads OR stores). The Wikipedia page on memory ordering does a decent job of explaining the meaning of those. `lfence` actually blocks LoadStore reordering, not just LoadLoad, for weakly-ordered `movntdqa` loads from WC memory. Reordering of other kinds of loads from other memory types is already disallowed, so there's almost never any reason to actually use `lfence` for memory ordering, rather than for its other effect of blocking out-of-order exec.
The kernel uses those actual asm instructions for memory barriers in I/O code, for example `mb()`, `rmb()` and `wmb()`, which expand exactly to `mfence`, `lfence`, `sfence`, and others (example).
`sfence` and `lfence` are probably overkill in most cases, for example around MMIO to strongly-ordered UC memory. Writing to WC memory could actually need an `sfence`. But they're not too slow compared to I/O, and there might be some cases that would be a problem otherwise, so Linux takes the safe approach.
In addition to this, x86 has different kinds of read/write barriers which may also be faster (such as the one I linked above). See the following answers for more about full barriers (what C11 calls sequential consistency) with either `mfence` or a dummy `lock`ed instruction: