3

As we know, compare_exchange_weak() returns an error (false value) if there's a race condition, so the operation can not be fully done. But how exactly race condition is determined by compare_exchange_weak()?

Does lock cmpxchg instruction returns an error if more that one thread tries to read/write value, i.e. acquire a lock and exactly this way compare_exchange_weak determines race condition?

JFMR
  • 23,265
  • 4
  • 52
  • 76
TwITe
  • 400
  • 2
  • 14
  • Are you aware of this Q/A [SO: Understanding std::atomic::compare_exchange_weak() in C++11](https://stackoverflow.com/q/25199838/7478597)? – Scheff's Cat Oct 20 '18 at 16:34
  • @Scheff Thanks, I have read this Q/A and have found a lot of useful info, but I still didn't find an answer. I just want to know, how exactly is race condition detected by compare_exchange_ function – TwITe Oct 20 '18 at 17:08
  • I've googled "compare_exchange_weak() lock cmpxchg". I'm under the impression that concerning Intel x86/x64, all these subtle details of locking, considered in C++11, are not so relevant as Intel commands are not that diverse. I couldn't find something which I would link as "proof" but you may search yourself to "get a feeling". – Scheff's Cat Oct 20 '18 at 17:13
  • 1
    I'm pretty sure that on x86 `compare_exchange_weak` does the same as `compare_exchange_strong`. `cmpxchg` instruction sets ZF (zero flag), depending on whether value was equal and consequently was changed. – zch Oct 20 '18 at 17:40

2 Answers2

5

The cmpxchg instruction affects the ZF flag: it is set if the exchange succeeded and it is cleared otherwise.

Let's see this with an example:

std::atomic<int> a;

bool my_compare_exchange(int expected, int desired) {
   bool succeeded = a.compare_exchange_weak(expected, desired);
   return succeeded;
}

The function my_compare_exchange() is translated into the following assembly code:

my_compare_exchange:
        mov     eax, edi
        lock cmpxchg    DWORD PTR a[rip], esi
        sete    al // <-- conditional instruction
        ret

The register al is set to 1 using sete al if the exchange succeeded (i.e., ZF was set by cmpxchg). Otherwise, it is set to zero (i.e., ZF was cleared by cmpxchg).

JFMR
  • 23,265
  • 4
  • 52
  • 76
  • thanks, now I got how compare_exchange detects race condition. But i also have 2 new questions :) – TwITe Oct 21 '18 at 09:01
  • First, ```lock``` prefix does next instruction being executed atomically, so the ```cmpxchg``` will execute correctly, but there's also ```sete al``` instruction, why it is not being executed atomically? What if tho threads tries to set value to ```al``` register – TwITe Oct 21 '18 at 09:04
  • No, `al` does not interfere, since each thread has its own `al` register (`al` is not shared between threads). The same applies for the stack. – JFMR Oct 21 '18 at 09:05
  • The only atomic instruction is the `lock cmpxchg`. – JFMR Oct 21 '18 at 09:06
  • And the second question: i tried to disassembly your code and see this instruction: ```test al, al```. Why we need this instruction? Is it not enough to just set ```al``` register? – TwITe Oct 21 '18 at 09:08
  • `test al, al` is probably for checking whether `al` is zero (Is it followed by `jz` / `jnz` ?). Try to enable the optimizations with `-O2`. – JFMR Oct 21 '18 at 09:10
  • 1
    Oops, i turned it on now and see exactly the same assembly code as yours, thanks! I accept your answer – TwITe Oct 21 '18 at 09:13
2

lock cmpxchg is atomic; nothing can happen during its execution (logically anyway). Unlike an LL/SC implementation of compare_exchange_weak, it can't spuriously fail when another thread is writing inside the same cache line, only if the compare actually fails. (Can CAS fail for all threads?)

compare_exchange_strong can be implemented as lock cmpxchg without requiring a loop.


Are you asking about the use-case where you load the old value, modify it, and then try to CAS a new value into place? (e.g. to synthesize an atomic operation that the hardware doesn't provide directly, e.g. atomic FP add, or atomic clear-lowest-bit).

In that case yes, the reason for cmpxchg failure would be a race with another thread.

As @ネロク explained, you check the flag result of cmpxchg to get the boolean result of compare_exchange_strong, i.e. to tell whether the compare part failed and whether the store was done or not. See Intel's manual entry for cmpxchg.


There are other use-cases for lock cmpxchg too, like waiting for a spinlock to become available by spinning on lock.compare_exchange_weak(0, 1), repeatedly trying to CAS an unlocked variable into a locked state. So a failed CAS isn't from a race condition, it's simply from the lock still being held. (Because you're passing a constant as the "expected", not what you just read from that location.)

This is generally not a good idea, AFAIK. I think it's normally better to spin read-only waiting for a lock, and only try to take the lock with a CAS or an XCHG if you saw it was available. (Using xchg and testing the integer result is potentially cheaper than lock cmpxchg, possibly keeping the cache line locked for fewer cycles. Locks around memory manipulation via inline assembly)


TL:DR: lockless programming requires precision language if you're going to reason about correctness.

cmpxchg fails and clears ZF (creating the ne not-equal condition) if EAX doesn't match the value in memory when it executes. The instruction itself doesn't know or care about races, just the exact state when it itself executes.

You as a programmer have to worry about races and retrying when using it as a building block to create larger atomic operations built on top of it.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847