The point of CAS is that it allows you to store a value that is conditional on the original value. A simple store can be done atomically, of course, but it may not be what you want.
Consider the simple "atomic increment" operation:
- Load the current value.
- Compute the new value.
- Store the new value.
If you use separate atomic loads and stores for this procedure, two threads can both read the same value, compute the same result, and store it, and so one of the increments is lost! Loads and stores are simple atomic operations, whereas CAS is an atomic read-modify-write (RMW) operation. RMWs are more complex than simple loads and stores: an RMW doesn't just guarantee that a single value is read or written correctly, it guarantees that the whole logical operation (read, modify, write) is applied correctly as one step.
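For contrast, here is a sketch of the naive version built from separate loads and stores (the function name is just for illustration):

#include <atomic>

// BROKEN: this is not a single atomic operation. Two threads can both
// load the same value, both compute value + n, and both store the same
// result, so one of the increments is lost.
int inc_naive(int n, std::atomic<int> & x)
{
    int old_val = x.load();    // atomic load
    int new_val = old_val + n; // another thread may modify x right here
    x.store(new_val);          // atomic store, but it overwrites x blindly
    return old_val;
}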
Let's implement atomic increment using CAS to demonstrate how this works:
#include <atomic>

// Effect: x += n, returns old value of x.
int inc(int n, std::atomic<int> & x)
{
    int old_val = x.load();
    for (;;) {
        int new_val = old_val + n;
        if (x.compare_exchange_weak(old_val, new_val)) {
            return old_val;
        }
        // Note: If the exchange fails, old_val is updated
        // to the current value of x.
    }
}
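As a quick sanity check, a hypothetical driver (not part of the snippet above, but using the same inc function and the <atomic> include) can hammer on one counter from several threads; no increment is ever lost:

#include <thread>
#include <vector>
#include <iostream>

int main()
{
    std::atomic<int> counter{0};
    std::vector<std::thread> threads;

    // Four threads, each incrementing the shared counter 100000 times.
    for (int t = 0; t < 4; ++t) {
        threads.emplace_back([&counter] {
            for (int i = 0; i < 100000; ++i) {
                inc(1, counter);
            }
        });
    }
    for (auto & th : threads) {
        th.join();
    }

    std::cout << counter.load() << '\n'; // always prints 400000
}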
The crucial point here is that this operation must be a loop: while we are computing our tentative result (new_val = old_val + n), the value old_val may already have become obsolete, because some other thread has modified x in the meantime. So we must keep looping until we get a chance to apply the new value in a situation where x currently holds the old value we based our computation on. That's the point of CAS: it stores a new value conditional on the old value being what we think it is.
On the distinction of "strong" vs "weak" exchange: If you want your algorithm to succeed unconditionally, you always want a loop like I showed, and you use the weak exchange. The difference is that the weak form may "fail spuriously", i.e. return false even though the stored value is the expected one. (This relaxation allows for more efficient instructions on some platforms.) The strong form may be more expensive but doesn't fail spuriously, and it may be useful in situations where you perform an "optimistic" CAS but don't care whether it succeeds. This is useful in certain concurrent data structures (e.g. queues) where threads make a best effort to help out, but don't need to promise success.