
When working in a concurrent, parallel programming language with multiple threads running on multiple cores and/or multiple sockets, what is the largest value in memory that can be read or written atomically?

What I mean is: a string, being a sequence of bytes, is decidedly not atomic because a write to that location in memory may take some time to update the string. Therefore, a lock of some kind must be acquired when reading and writing to the string so that other threads don't see corrupted, half-finished results. However, a string on the stack is atomic because AFAIK the stack is not a shared area of memory across threads.

Is the largest guaranteed, lockless unit a bit or a byte, or does it depend on the instruction used to write that byte? Is it possible, for instance, for a thread to read an integer while another thread is moving the value bit-by-bit from a register to the stack to shared memory, causing the reader thread to see a half-written value?

I guess I am asking what the largest atomic value is on x86_64 and what the guarantees are.

Naftuli Kay
  • smallest or largest? – bmargulies Oct 21 '17 at 21:55
  • Updated the question. – Naftuli Kay Oct 21 '17 at 21:57
  • AFAIK depends on the instruction. There are no guarantees that two processors are not touching the same memory, unless they are synchronized by the hw via atomic instructions. The smallest atomic unit is a byte in x86. – Rafael Oct 21 '17 at 21:58
  • If a 64-bit integer doesn't cross a cache line, then load or store is atomic. But it is not recommended to depend on this: use proper std::atomic features. – geza Oct 21 '17 at 21:59
  • You can share stack memory across threads. Pass the address of some stack memory to another thread and have the other thread dereference the pointer. – Raymond Chen Oct 21 '17 at 22:16
  • @geza: That's only guaranteed on Intel CPUs, not AMD. The common subset is that [a cached load or store that doesn't cross an 8B boundary is atomic](https://stackoverflow.com/questions/36624881/why-is-integer-assignment-on-a-naturally-aligned-variable-atomic). Multi-socket K10 really does tear at 8B boundaries. It also happens that aligned 8B loads/stores are guaranteed atomic, even uncached, so for the largest size it's just aligned 8 bytes. (Not counting the slow `lock cmpxchg16b` hack that gcc/clang use for `atomic` up to 16 bytes. gcc7 decided to stop calling that lock-free.) – Peter Cordes Oct 22 '17 at 01:03
  • @PeterCordes: thanks Peter, that's good to know! I always learn something from you :) – geza Oct 22 '17 at 09:57
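
As a rough illustration of the advice in the comments above (aligned 8-byte loads and stores, expressed through `std::atomic` rather than relied on implicitly), here is a minimal sketch. The names `shared_value`, `publish`, `observe`, and `demo` are made up for this example, and the `static_assert`s encode an assumption about an x86-64 target with a mainstream compiler, not a guarantee of the C++ standard.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical shared 64-bit value. std::atomic guarantees that concurrent
// loads and stores never observe a torn (half-written) value.
std::atomic<std::uint64_t> shared_value{0};

// Assumptions about the target (true on x86-64 with GCC/Clang/MSVC):
// 64-bit atomics are always lock-free, 8 bytes wide, and 8-byte aligned.
static_assert(ATOMIC_LLONG_LOCK_FREE == 2, "64-bit atomics are not always lock-free here");
static_assert(sizeof(std::atomic<std::uint64_t>) == 8, "unexpected atomic size");
static_assert(alignof(std::atomic<std::uint64_t>) == 8, "unexpected atomic alignment");

// Writer side: a single atomic 8-byte store.
void publish(std::uint64_t v) {
    shared_value.store(v, std::memory_order_release);
}

// Reader side: a single atomic 8-byte load.
std::uint64_t observe() {
    return shared_value.load(std::memory_order_acquire);
}

// Single-threaded smoke test of the two helpers.
void demo() {
    publish(0x1122334455667788ULL);
    assert(observe() == 0x1122334455667788ULL);
}
```

On x86-64 both helpers typically compile down to a plain `mov`, so the portable spelling costs nothing over a raw aligned load or store.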

1 Answer


The largest atomic instruction in x86-64 is `lock cmpxchg16b`, which reads and writes 16 bytes atomically.

Although it is usually used to atomically update a 16-byte object in memory, it can also be used to atomically read such a value.

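To atomically update a value, load `rdx:rax` with the prior (expected) value and `rcx:rbx` with the new value. The instruction stores the new value only if memory still holds the prior value; otherwise it leaves the value in memory unchanged and loads the current value into `rdx:rax`. The memory operand must be 16-byte aligned.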

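To atomically read a 16-byte value, load `rdx:rax` and `rcx:rbx` with the same value. (It doesn't matter what value, but 0 is a good choice.) If memory happens to hold that value, the instruction writes the same value back; otherwise it loads the current value into `rdx:rax`. Either way, `rdx:rax` ends up holding the current value. Because the instruction always performs a write, the memory must be writable even when it is only being read this way.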
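
The update and read recipes above can be sketched with inline assembly. This is only an illustration, assuming an x86-64 target and a GCC- or Clang-compatible compiler; `Pair16`, `cas16`, `load16`, and `increment_both` are hypothetical names invented for this example, not part of any library.

```cpp
#include <cassert>
#include <cstdint>

#if !defined(__x86_64__)
#error "This sketch uses x86-64 inline assembly (GCC/Clang syntax)."
#endif

// 16-byte value. cmpxchg16b faults unless its memory operand is 16-byte aligned.
struct alignas(16) Pair16 {
    std::uint64_t lo;  // compared against rax, replaced from rbx
    std::uint64_t hi;  // compared against rdx, replaced from rcx
};

// Hypothetical helper wrapping `lock cmpxchg16b`.
// If *target == expected, stores desired and returns true.
// Otherwise copies the current *target into expected and returns false.
inline bool cas16(Pair16* target, Pair16& expected, Pair16 desired) {
    unsigned char swapped;
    __asm__ __volatile__(
        "lock cmpxchg16b %1\n\t"
        "setz %0"
        : "=q"(swapped), "+m"(*target), "+a"(expected.lo), "+d"(expected.hi)
        : "b"(desired.lo), "c"(desired.hi)
        : "memory", "cc");
    return swapped != 0;
}

// Atomic 16-byte read: compare against 0 and "replace" with 0.
// If memory holds 0 it is rewritten with 0; otherwise the current
// value is loaded into `value`. Either way `value` is the current value.
inline Pair16 load16(Pair16* target) {
    Pair16 value{0, 0};
    cas16(target, value, Pair16{0, 0});
    return value;
}

// Atomic read-modify-write of both halves, using the usual CAS retry loop.
inline void increment_both(Pair16* target) {
    Pair16 expected = load16(target);
    Pair16 desired;
    do {
        // On failure cas16 refreshes `expected`, so recompute from it.
        desired = Pair16{expected.lo + 1, expected.hi + 1};
    } while (!cas16(target, expected, desired));
}
```

A caller would declare the shared object as `Pair16` (so the alignment requirement is met), read it with `load16(&obj)`, and update it through `cas16` or a loop such as `increment_both`. In portable code, `std::atomic` on a 16-byte struct is the more conventional route, with the caveats about lock-freedom discussed in the comments below.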

prl
  • Fun fact: gcc and clang use this trick for `std::atomic` for objects from 9 to 16 bytes. It's ugly, though, and especially for loads much less efficient than a `mov` (but generally still better than taking a separate lock). Still, [gcc7 only uses `lock cmpxchg16b` in the library function instead of inlining even with `-mcx16`, and `.is_lock_free()` returns false](https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02344.html). In MSVC, `atomic` is only lock-free up to 8 bytes. – Peter Cordes Oct 22 '17 at 02:40
  • related: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80835 and [How can I implement ABA counter with c++11 CAS?](https://stackoverflow.com/questions/38984153/how-can-i-implement-aba-counter-with-c11-cas). – Peter Cordes Oct 22 '17 at 02:41