I was recently reading about the Compare And Swap atomic action (CMPXCHG, .NET's Interlocked.CompareExchange, whatever).

I understand how it works internally, and how it's used from a client.

What I can't quite figure out is when would someone use CAS?

Wikipedia says:

CAS is used for implementing synchronization primitives like semaphores and mutexes, as well as more sophisticated lock-free and wait-free algorithms.

So, can anyone give me a more generic real-world use case with code and description of CAS usage?

This question is meant to be language-agnostic, so any language will do (C-based or x86 assembly preferred).

Thanks!

Community

3 Answers

This is easy to see by example. Say we want to atomically and concurrently set a bit on a shared variable:

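using System.Threading;

int shared = 0;

void Set(int index) {
 while (true) {
  int observed = shared; // read the shared value exactly once per attempt
  int desired = observed | (1 << index);
  // Store desired only if shared still equals observed; the call returns the value it found.
  if (Interlocked.CompareExchange(ref shared, desired, observed) == observed)
   break; //success
 }
}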

We detect failure when the return value (the "old value" that CAS actually found in the variable) differs from the value we expected, which means another thread changed it in the meantime.

If the two match, there was no concurrent modification and our own modification went through successfully.

You can build fairly complex operations with this technique. The more work each attempt does, though, the more time is lost to spinning when attempts fail and have to be retried.

I want to emphasize that a key property of CAS is that it can fail and that failure can be detected reliably.

usr
  • Thanks! Can you clarify "You can realize pretty complex stuff.." ? What kind of things? – Stanislav Nedelchev Apr 26 '12 at 06:29
  • I think you can realize an arbitrary function "F(shared)" with this. In my example F(shared) was "shared | (1 << index)". You can put anything there, including function calls. It will still be atomic. But you need to be aware that the loop can spin multiple times so your function should be able to be called multiple times. – usr Apr 26 '12 at 09:27
  • Note that if you aren't going to return the old value, it's more efficient to do an Interlocked `Or` (on x86-64, a single `lock or` instruction) instead of a CAS retry loop. It's better not to synthesize things from CAS when the language can express them in a simpler way that can be a single atomic RMW instruction on some machines. x86 doesn't have `fetch_or` (although it does have fetch_add via `lock xadd`), but for single bits it does have `lock bts` to atomically set a bit in memory and record the old value. (Bit Test and Set) – Peter Cordes Oct 11 '21 at 19:40
  • I might have picked something like a left shift or clearing the lowest set bit (`x &= x-1`) as a demo for a CAS retry loop; something that couldn't be done with a different Interlocked operation. Or as part of implementing atomic `float`, if C# can't do that for you. – Peter Cordes Oct 11 '21 at 19:42
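
The last comment's suggestion, clearing the lowest set bit, can be sketched as a CAS retry loop. This is only an illustration in C++ with `std::atomic`, not code from the thread; the helper name `clear_lowest_set_bit` is made up for the example.

```cpp
#include <atomic>

// Hypothetical helper: atomically clears the lowest set bit of `word`
// and returns the value observed just before the update
// (0 if no bit was set, in which case nothing is written).
unsigned clear_lowest_set_bit(std::atomic<unsigned>& word) {
    unsigned observed = word.load();
    while (observed != 0) {
        unsigned desired = observed & (observed - 1);  // x &= x - 1
        // On failure, compare_exchange_weak overwrites `observed` with the
        // current value, so the next iteration recomputes from fresh data.
        if (word.compare_exchange_weak(observed, desired)) {
            break;
        }
    }
    return observed;
}
```

Because the update depends on the whole previous value, there is no single fetch-and-modify operation that expresses it, which is why a CAS loop is a reasonable fit here.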

You use CAS to set a value (a bit or a word) atomically in one thread or process, while testing that another thread/process has not already done so. So it's used to acquire a flag or counter in a multi-threaded environment.

Addendum (Feb 2023)

For example, multiple threads could each use a CAS instruction to swap their own ID (a thread ID or process ID) into a shared word of memory (which starts out holding a value of zero). The first thread that gets its ID stored into the word can then take ownership of whatever resource that shared word is guarding.

When the owner is done with the resource, it stores a zero into the word, releasing ownership of the resource and allowing other threads their turn to acquire the resource.
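
A rough sketch of that ownership scheme, assuming C++ and `std::atomic`. The names `owner`, `try_acquire` and `release` are hypothetical and chosen only for this illustration, and caller IDs are assumed to be nonzero.

```cpp
#include <atomic>

// Hypothetical guard word: 0 means "free", any other value is the owner's ID.
std::atomic<int> owner{0};

// Try to claim the resource for `my_id` (assumed to be nonzero).
// Succeeds only if the word still holds 0 at the instant of the swap.
bool try_acquire(int my_id) {
    int expected = 0;
    return owner.compare_exchange_strong(expected, my_id);
}

// Release ownership by storing zero, as described above.
// Only the current owner should call this.
void release() {
    owner.store(0);
}
```

A thread that gets `false` back from `try_acquire` knows another thread won the race and can retry later or do other work.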

David R Tribble
  • That was when we had just a single core. Nowadays threads can span across multiple cores so how does that apply? (I might be missing something) – kubal5003 Apr 25 '12 at 00:12
  • @kubal5003 - This works *especially* with multiple cores, since it guarantees atomic (single CPU/thread/core) access to a word. – David R Tribble Apr 26 '12 at 23:13
  • For a single boolean flag, `xchg` (Interlocked.Exchange) on the containing byte is about equally good. To take a lock, you swap in a `1` and see if the old value was `0`. (Like this [x86-64 asm spinlock toy example](https://stackoverflow.com/questions/37241553/locks-around-memory-manipulation-via-inline-assembly/37246263#37246263)). Not a use-case where CAS really proves its worth vs. simpler things. It has more value for a counted lock / semaphore, where you don't want to just `fetch_add` (x86 `lock xadd`) with `-1` because that could take the counter below 0. – Peter Cordes Oct 11 '21 at 19:46
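
The counted-lock case from the comment above might look roughly like this in C++. It is a sketch under assumptions, not anyone's production semaphore: the names `try_acquire_permit` and `release_permit` are invented for the example, and a real semaphore would also need a way to block waiting threads.

```cpp
#include <atomic>

// Hypothetical try-acquire for a counting semaphore: take one permit
// only if the count is still positive, so it never drops below zero.
bool try_acquire_permit(std::atomic<int>& permits) {
    int observed = permits.load();
    while (observed > 0) {
        if (permits.compare_exchange_weak(observed, observed - 1)) {
            return true;
        }
        // Failure refreshed `observed`; the loop re-checks it against zero.
    }
    return false;
}

// Returning a permit needs no CAS: an unconditional add is enough.
void release_permit(std::atomic<int>& permits) {
    permits.fetch_add(1);
}
```

The check and the decrement have to happen as one step, which a plain `fetch_add` with `-1` cannot guarantee, so this is a case where CAS earns its place.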

So, can anyone give me a more generic real-world use case with code and description of CAS usage?

This paper uses CAS to implement a thread-safe queue without locks.

It has some pseudo code examples in it.

anon
  • 2
    Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Hamed Hajiloo Oct 11 '21 at 17:36
  • 1
    Welcome to Stack Overflow! Please improve your answer by elaborating or explaining the contents of the link. It should be able to stand on its own, with external documentation or resources used only as a supplement for further reading. – spicy.dll Oct 11 '21 at 19:44