2

In C++11 and C11 it is possible to use 8- and 16-bit atomics. Are there any pitfalls of using them on actual modern 32- and 64-bit CPUs? Are they lock-free? Are they slower than native-size atomics? I'm interested in both what the standard says about it and how it's actually implemented on common architectures.

Peter Cordes
gavv
  • All of those are implementation details. But generally CPUs can do atomics on less than the register size just fine. Otherwise, I would expect std::atomic to be widened to 32/64 bits rather than using a lock to access a single char. – Goswin von Brederlow Jun 11 '22 at 09:02
  • In a related vein, I've wondered if there are any important concerns regarding [false sharing](https://en.wikipedia.org/wiki/False_sharing) and atomics on common architectures. – Eljay Jun 11 '22 at 13:23

1 Answer

3

There are no common pitfalls, and no reason to expect any.

The standard says nothing about it, but then it says basically nothing about performance guarantees in general. In practice, if `atomic<int>` is lock-free, it's almost certain that `atomic<int16_t>` and `atomic<int8_t>` are also lock-free. I'd be surprised if there are any mainstream implementations where that's not true.
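If you want to check this on your own toolchain rather than take it on faith, a minimal sketch along these lines should work. The function name `narrow_atomics_are_lock_free` is made up for illustration; `is_lock_free()` and the `ATOMIC_*_LOCK_FREE` macros are standard C++11.

```cpp
#include <atomic>
#include <cstdint>

// Compile-time hints from <atomic>: 0 = never lock-free,
// 1 = sometimes lock-free, 2 = always lock-free.
constexpr int char_lock_free_level  = ATOMIC_CHAR_LOCK_FREE;
constexpr int short_lock_free_level = ATOMIC_SHORT_LOCK_FREE;

// Hypothetical helper: runtime query for the narrow types next to int.
bool narrow_atomics_are_lock_free() {
    std::atomic<std::int8_t>  a8{0};
    std::atomic<std::int16_t> a16{0};
    std::atomic<int>          a32{0};
    return a8.is_lock_free() && a16.is_lock_free() && a32.is_lock_free();
}
```

On mainstream x86-64 and AArch64 toolchains I'd expect both macros to be 2 and the function to return true, but that is a property of the implementation, not something the standard promises.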

x86 hardware supports them directly, at the same speed as other operand sizes: plain `mov` for loads and stores, and for atomic RMWs, `lock xadd byte [rdi], al` exists in byte operand-size as well as word/dword/qword. The same is true of all other atomic RMW instructions, including `xchg` and `cmpxchg`.
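As a rough illustration of what that looks like from C++, here is a sketch with byte and 16-bit counters. The names `counter8`, `counter16`, `bump8` and `bump16` are invented for the example. On x86-64 I'd expect GCC and Clang to turn each `fetch_add` into a single `lock xadd` of the matching operand size, but check your own compiler's output.

```cpp
#include <atomic>
#include <cstdint>

// Illustrative globals, not from any real codebase.
std::atomic<std::uint8_t>  counter8{0};
std::atomic<std::uint16_t> counter16{0};

// Atomically increment and return the previous value.
// Unsigned atomic arithmetic wraps modulo 2^N, same as the plain type.
std::uint8_t bump8() {
    return counter8.fetch_add(1, std::memory_order_relaxed);
}

std::uint16_t bump16() {
    return counter16.fetch_add(1, std::memory_order_relaxed);
}
```

The operation only touches the byte or 16-bit object itself, so neighbouring objects in the same word are unaffected.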

Other ISAs may have minor slowdowns for narrow stores (and maybe also loads), such as an extra cycle of latency for a pure load or pure store. That is negligible compared to inter-core latency, and minor even when the cache line is already hot. See "Are there any modern CPUs where a cached byte store is actually slower than a word store?" (the effect isn't unique to atomic operations).

Most non-x86 ISAs also have byte and 16-bit versions of the same instructions they provide for atomic RMWs, like ARM `ldrexb` / `strexb`.
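To make that concrete, here is a sketch of a byte-sized CAS retry loop, the kind of code that an LL/SC machine would typically implement with a byte-wide load-exclusive / store-exclusive pair. `saturating_inc` is a hypothetical helper, and the exact instructions chosen are up to the compiler and target.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: increment an 8-bit counter but stop at 255
// instead of wrapping. Returns the value observed before the update.
std::uint8_t saturating_inc(std::atomic<std::uint8_t>& a) {
    std::uint8_t old = a.load(std::memory_order_relaxed);
    // compare_exchange_weak may fail spuriously (e.g. a lost reservation
    // on LL/SC hardware); on failure it reloads `old`, so we just retry.
    while (old != 255 &&
           !a.compare_exchange_weak(old, static_cast<std::uint8_t>(old + 1),
                                    std::memory_order_relaxed)) {
    }
    return old;
}
```

Nothing in the source differs from the `int` version apart from the type, which is the point: the narrow case needs no special handling from the programmer.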

Of course, an atomic RMW on a narrow object can also be done safely as an RMW of the containing word. That falls out "naturally" with minimal extra work for a `fetch_or` or other bitwise boolean operation, or for a CAS. But I think most widely used ISAs have direct support for byte and 16-bit operations, so they don't need that trick.
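For the curious, the containing-word trick can be sketched in portable C++ roughly as follows. This is an illustration of the idea, not what any particular compiler emits. `fetch_or_byte` and `cas_byte` are made-up names, and `idx` selects a byte by significance within the 32-bit value (0 = least significant), which sidesteps endianness; a real implementation working from a byte address would have to map address to shift per target.

```cpp
#include <atomic>
#include <cstdint>

// OR `bits` into byte `idx` of the word; other bytes get OR'd with 0,
// so they are unchanged. Returns the old value of that byte.
std::uint8_t fetch_or_byte(std::atomic<std::uint32_t>& word,
                           unsigned idx, std::uint8_t bits) {
    const unsigned shift = 8 * idx;
    const std::uint32_t old = word.fetch_or(std::uint32_t{bits} << shift);
    return static_cast<std::uint8_t>(old >> shift);
}

// CAS on one byte via a CAS on the whole word. Retries if only the
// *other* bytes changed; fails (updating `expected`) if our byte differs.
bool cas_byte(std::atomic<std::uint32_t>& word, unsigned idx,
              std::uint8_t& expected, std::uint8_t desired) {
    const unsigned shift = 8 * idx;
    const std::uint32_t mask = std::uint32_t{0xFF} << shift;
    std::uint32_t old_word = word.load(std::memory_order_relaxed);
    for (;;) {
        const auto cur = static_cast<std::uint8_t>((old_word & mask) >> shift);
        if (cur != expected) {
            expected = cur;
            return false;
        }
        const std::uint32_t new_word =
            (old_word & ~mask) | (std::uint32_t{desired} << shift);
        if (word.compare_exchange_weak(old_word, new_word)) {
            return true;
        }
        // old_word now holds the fresh value; loop and re-check our byte.
    }
}
```

The bitwise case is a single word-sized RMW with a shifted operand. The CAS case needs a retry loop because a neighbouring byte changing makes the word CAS fail even though the target byte still matches.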

Peter Cordes