8/16-bit atomics on 32/64-bit processors

Question

In C++11 and C11 it is possible to use 8- and 16-bit atomics. Are there any pitfalls to using them on actual modern 32- and 64-bit CPUs? Are they lock-free? Are they slower than native-size atomics? I'm interested both in what the standard says about it and in how it's actually implemented on common architectures.

All of those are implementation details. But generally CPUs can do atomics on less than the register size just fine. Otherwise I would expect std::atomic to be 32/64-bit rather than using a lock to access a single char. — Goswin von Brederlow, Jun 11 '22 at 09:02
In a related vein, I've wondered if there are any important concerns regarding [false sharing](https://en.wikipedia.org/wiki/False_sharing) and atomics on common architectures. — Eljay, Jun 11 '22 at 13:23

Peter Cordes · Accepted Answer · 2022-06-11T09:15:32.403

There are no common pitfalls or any reason to expect any.

The standard says nothing about it, but then it says basically nothing about performance guarantees in general. In practice, if `atomic<int>` is lock-free, it's almost certain that `atomic<int16_t>` and `atomic<int8_t>` are also lock-free. I'd be surprised if there are any mainstream implementations where that's not true.
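
If you'd rather check than take my word for it, here is a minimal sketch of how to query this on your own toolchain. The helper names (`lock_free_now`, `narrow_always_lock_free`, `narrow_atomics_match_int`, `check_narrow_atomics`) are made up for illustration; only `is_lock_free()` and the `ATOMIC_*_LOCK_FREE` macros come from the standard library.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Runtime query: true if operations on this atomic object never take a lock.
template <typename T>
bool lock_free_now() {
    std::atomic<T> a{T{}};
    return a.is_lock_free();
}

// Compile-time query via the C++11 macros: 2 means "always lock-free",
// 1 means "sometimes", 0 means "never".
constexpr bool narrow_always_lock_free() {
    return ATOMIC_CHAR_LOCK_FREE == 2 && ATOMIC_SHORT_LOCK_FREE == 2;
}

// The expectation from the text: if atomic<int> is lock-free,
// the 8- and 16-bit atomics should be too.
bool narrow_atomics_match_int() {
    if (!lock_free_now<int>()) return true;  // nothing to compare against
    return lock_free_now<std::int8_t>() && lock_free_now<std::int16_t>();
}

// Usage sketch (hypothetical helper): fail loudly in debug builds
// if the expectation does not hold on this target.
void check_narrow_atomics() {
    assert(narrow_atomics_match_int());
}
```

In C++17 and later, `std::atomic<T>::is_always_lock_free` gives the same compile-time answer per type, so it can go in a `static_assert`.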

x86 hardware supports them directly, at the same speed as other operand-sizes: `mov` for load/store, and for atomic RMWs, `lock xadd byte [rdi], al` exists in byte operand-size as well as word/dword/qword. The same goes for all other atomic RMW instructions, including `xchg` and `cmpxchg`.
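
For a source-level view, here is a small sketch of the 8- and 16-bit operations that map onto those instructions. The function names (`bump`, `try_claim`, `swap_in`, `demo_wrap`) are hypothetical, and the comments describe what GCC and clang typically emit on x86-64 rather than anything guaranteed; check your own compiler's asm output.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Byte-sized atomic add. On x86-64 this typically compiles to a
// byte-operand lock xadd (or lock add if the return value is unused).
std::uint8_t bump(std::atomic<std::uint8_t>& c, std::uint8_t n) {
    return c.fetch_add(n, std::memory_order_relaxed);
}

// Byte-sized CAS (typically lock cmpxchg with a byte operand):
// claim the flag only if it is currently 0.
bool try_claim(std::atomic<std::uint8_t>& flag) {
    std::uint8_t expected = 0;
    return flag.compare_exchange_strong(expected, 1,
                                        std::memory_order_acquire,
                                        std::memory_order_relaxed);
}

// 16-bit exchange (typically xchg with a word operand).
std::uint16_t swap_in(std::atomic<std::uint16_t>& v, std::uint16_t x) {
    return v.exchange(x, std::memory_order_acq_rel);
}

// Usage sketch: an 8-bit atomic counter wraps modulo 256,
// like any other unsigned 8-bit value.
void demo_wrap() {
    std::atomic<std::uint8_t> c(250);
    std::uint8_t before = bump(c, 10);
    assert(before == 250 && c.load() == 4);
}
```

The only behavioural difference from `atomic<int>` here is the value range.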

Other ISAs may have minor slowdowns for narrow stores (and maybe also loads), like a cycle of extra latency for a pure-load or pure-store. This is pretty much negligible compared to inter-core latency, and pretty minor even when a cache line is already hot. See "Are there any modern CPUs where a cached byte store is actually slower than a word store?" (it's not unique to atomic operations).

Most non-x86 ISAs also have byte and 16-bit versions of the same instructions they provide for atomic RMWs, like ARM `ldrexb` / `strexb`.

Of course for an atomic RMW, it's also safe to do an RMW of the containing word, and that can be done "naturally" with minimal extra work for a `fetch_or` or other bitwise boolean, or a CAS. But I think most widely used ISAs have direct support for byte and 16-bit operations, so don't need that trick.
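
To make the containing-word trick concrete, here is a rough sketch. It is not what any particular compiler emits; it models four 8-bit "lanes" packed into one `std::atomic<std::uint32_t>`, and the names `lane_fetch_or` and `lane_cas` are invented for this example. Lane `i` is assumed to occupy bits `8*i` through `8*i + 7`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// fetch_or on one byte lane: OR-ing zeros into the other lanes leaves
// them unchanged, so a single word-sized fetch_or is enough.
std::uint8_t lane_fetch_or(std::atomic<std::uint32_t>& word,
                           unsigned lane, std::uint8_t bits) {
    assert(lane < 4);
    const unsigned shift = 8 * lane;
    std::uint32_t old = word.fetch_or(std::uint32_t(bits) << shift);
    return std::uint8_t(old >> shift);  // previous value of this lane
}

// CAS on one byte lane, built from a word-sized CAS retry loop.
// Fails (and updates `expected`) only if the target lane differs;
// changes to neighbouring lanes just cause a retry.
bool lane_cas(std::atomic<std::uint32_t>& word, unsigned lane,
              std::uint8_t& expected, std::uint8_t desired) {
    assert(lane < 4);
    const unsigned shift = 8 * lane;
    const std::uint32_t mask = std::uint32_t(0xFF) << shift;
    std::uint32_t old = word.load(std::memory_order_relaxed);
    for (;;) {
        std::uint8_t cur = std::uint8_t(old >> shift);
        if (cur != expected) {
            expected = cur;
            return false;
        }
        std::uint32_t next = (old & ~mask) | (std::uint32_t(desired) << shift);
        // On failure, compare_exchange_weak reloads `old` for the next try.
        if (word.compare_exchange_weak(old, next)) return true;
    }
}
```

The bitwise case needs no loop at all, which is the "minimal extra work" part. The CAS case needs a retry loop because a neighbouring byte can change between the load and the word-sized CAS.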
