
From what I understand of atomics, they are special assembly instructions which guarantee that two processors in an SMP system cannot write to the same memory region at the same time. For example in PowerPC an atomic increment would look something like:

retry:
  lwarx  r4, 0, r3 // Read integer from RAM location r3 into r4, placing reservation.
  addi   r4, r4, 1 // Add 1 to r4.
  stwcx. r4, 0, r3 // Attempt to store incremented value back to RAM.
  bne-   retry     // If the store failed (unlikely), retry.

However, this does not protect the four instructions from being preempted by an interrupt and another task being scheduled in. To protect against preemption you need to take an interrupt lock before entering the code.

From what I can see of C++ atomics, they seem to fall back to locks when needed. So my first question is -

  1. Does the C++ standard guarantee no preemption will happen during an atomic operation? If so, where in the standard can I find this?

I checked atomic<int>::is_always_lock_free on my Intel PC, and it came out true. Given my assumption about the assembly block above, this confused me. After digging into Intel assembly (which I am unfamiliar with) I found that a lock xadd DWORD PTR [rdx], eax was being generated. So my question is -

  1. Do some architectures provide atomic-related instructions which guarantee no preemption? Or is my understanding wrong?

Finally I was wondering about the compare_exchange_weak and compare_exchange_strong semantics -

  1. Does the difference lie in the retry mechanism, or is it something else?

EDIT: After reading the answers, I am curious about one more thing

  1. The atomic member function operations fetch_add, operator++, etc. - are they strong or weak?
tinkerbeast
  • Lock-free does not guarantee non-preemptible. The PowerPC example you cited is lock-free but is still preemptible. (1) C++ does not guarantee non-preemptible behavior. Some processors support non-preemptible read-modify-write operations, but not all; the PowerPC, as you noted, does not. (2) Yes, some architectures do support non-preemptible read-modify-write atomics. (3) The difference is that weak is allowed to fail spuriously. This aligns with PowerPC semantics. – Raymond Chen Jun 21 '20 at 04:17
  • The only "weak" atomic is `compare_exchange_weak`. The others have no means to signal failure and have an unconditional effect. On LL/SC machines like PowerPC, they have to compile to a retry loop like CAS_strong. (Except of course pure load / pure store, which don't need a retry loop to be strong because they're not RMW ops.) – Peter Cordes Jun 21 '20 at 07:00

2 Answers


Does the C++ standard guarantee no preemption will happen during an atomic operation? If so, where in the standard can I find this?

No, it doesn't. Since there is no way code could tell whether this happened (pre-emption during an atomic operation is indistinguishable from pre-emption just before or just after it, depending on circumstances), there is no reason for the standard to guarantee it.

Do some architectures provide atomic-related instructions which guarantee no preemption? Or is my understanding wrong?

There would be no point: the operation must appear atomic anyway, so pre-emption during it would always be identical in observed behavior to pre-emption before or after. If you could write code that ever sees pre-emption during an atomic operation cause observable effects different from pre-emption either before or after, that platform would be broken, since the operation would not behave atomically.

David Schwartz
  • Pre-emption during a non-lock-free atomic op would be a big deal. (Blocking other threads if they need the same lock, until this thread gets CPU time again). So yes, in that sense some ISAs do provide lock-free atomics. On some ISAs (like x86, or ARMv8.1) there are single-instruction RMWs that can't be interrupted; an interrupt either happens or not, and won't lead to spurious LL/SC failure. But as you say, that distinction doesn't matter. Once atomics are lock-free, you're fine. A spurious LL/SC failure because of an interrupt isn't meaningfully different from other causes. – Peter Cordes Jun 21 '20 at 06:24
  • The issue is not whether it happens to be possible to pre-empt them but whether they're *guaranteed* not to be pre-empted. That's an odd guarantee for a platform to make considering you couldn't tell the difference anyway. – David Schwartz Jun 21 '20 at 18:33
  • I was trying to make the point that you *can* possibly tell the difference in performance, if you use `atomic` such that `is_always_lock_free` is false. (vs. a hypothetical implementation that could never context-switch while holding one of the locks from the hash-table of locks). Your answer is only considering the lock-free case, which is fine because non-lock-free atomics suck and mostly only exist so C++ can define a portable standard, not so you can actually use them. – Peter Cordes Jun 21 '20 at 18:36
  • @PeterCordes Poor performance could be from anything. There's no way the program could tell it was from being pre-empted while holding a lock. And, in any event, every sane platform does whatever they can to get you the best performance on that platform. If performance is bad, it's going to be because the platform just can't do any better, not because of some particular implementation error or guarantee. (Unless the implementation of atomic itself is bad, and of course the solution to bad implementations of core functionality is to use better implementations.) – David Schwartz Jun 21 '20 at 18:56
  • You could design an experiment to maybe show the effect: have a bunch of threads all reading an atomic object. See how performance scales with number of threads vs. number of logical CPU cores: if there's a bigger than expected drop in throughput once not all threads can be running at all times, it's probably from some sleeps happening inside critical sections. (Of course you can't make absolute statements, I'm not disagreeing with that point). – Peter Cordes Jun 21 '20 at 19:14
  • But if the locks used hardware lock elision (e.g. on a Skylake CPU without the microcode update that disabled HLE), you could see smoother scaling with number of cores. (And read-side scalability from not *actually* serializing by taking the lock, so it massively helps the case where there are enough cores, too. Perhaps using pure stores could still force MESI ownership to serialize writes, so the only diff from HLE would be that it's mostly non-blocking most of the time, not getting into a state where you can sleep while holding a lock.) – Peter Cordes Jun 21 '20 at 19:17

This is similar to this question: Anything in std::atomic is wait-free?

Here are some definitions of lock-freedom and wait-freedom (both taken from Wikipedia):

An algorithm is lock-free if, when the program threads are run for a sufficiently long time, at least one of the threads makes progress.

An algorithm is wait-free if every operation has a bound on the number of steps the algorithm will take before the operation completes.

Your code with the retry loop is lock-free: a thread only has to perform a retry if the store fails, but that implies that the value must have been updated in the meantime, so some other thread must have made progress.

With regard to lock-freedom it doesn't matter whether a thread can be preempted in the middle of an atomic operation or not.

Some operations can be translated to a single atomic operation, in which case this operation is wait-free and therefore cannot be preempted midway. However, which operations are actually wait-free depends on the compiler and the target architecture (as described in my answer in the referenced SO question).

Regarding the difference between compare_exchange_weak and compare_exchange_strong - the weak version can fail spuriously, i.e., it may fail even though the comparison is actually true. This can happen on architectures with LL/SC. Suppose we use compare_exchange_weak to update some variable with the expected value A. LL loads the value A from the variable, and before the SC is executed, the variable is changed to B and then back to A. So even though the variable contains the same value as before, the intermediate change to B causes the SC (and therefore the compare_exchange_weak) to fail. compare_exchange_strong cannot fail spuriously, but to achieve that it has to use a retry-loop on architectures with LL/SC.

I am not entirely sure what you mean by fetch_add being "strong or weak". fetch_add cannot fail - it simply performs an atomic update of some variable by adding the provided value, and returns the old value of the variable. Whether this can be translated to a single instruction (like on Intel) or to a retry loop with LL/SC (Power) or CAS (Sparc) depends on the target architecture. Either way, the variable is guaranteed to be updated correctly.

mpoeter
  • https://en.wikipedia.org/wiki/Load-link/store-conditional says some LL/SC implementations can spuriously fail on modification of *nearby* data in the same block (of some size). IDK if real-world PowerPC monitoring is always exact down to the cache line. But yes, something must have made progress, even if it's a single thread that's false-sharing some of its private data with our shared thing. – Peter Cordes Jun 22 '20 at 15:20
  • Recommended: https://preshing.com/20120612/an-introduction-to-lock-free-programming/ – vmemmap May 22 '23 at 07:16