2

In JCIP Section 15.2, B. Goetz mentions that modern processors have a set of instructions, such as compare-and-swap (CAS), which allow us to perform non-blocking update operations.

In particular, he says:

CAS has three operands - a memory location V on which to operate, the expected old value A, and the new value B. CAS atomically updates V to the new value B, but only if the value in V matches the expected old value A.

Now consider a multiprocessor system. If two different processors try to compare-and-swap at exactly the same time on the same memory location, what will happen then?

The processors do not know about each other. How is such a situation managed?
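To make the scenario concrete, here is a minimal sketch (my own illustration, not from JCIP) where two threads fire `compareAndSet` at the same location at roughly the same time:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class CasRace {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger v = new AtomicInteger(0);
        AtomicInteger successes = new AtomicInteger(0);
        CountDownLatch start = new CountDownLatch(1);

        Runnable attempt = () -> {
            try { start.await(); } catch (InterruptedException e) { return; }
            // Both threads attempt CAS(0 -> 1) at (nearly) the same moment.
            if (v.compareAndSet(0, 1)) {
                successes.incrementAndGet();
            }
        };

        Thread t1 = new Thread(attempt);
        Thread t2 = new Thread(attempt);
        t1.start(); t2.start();
        start.countDown();          // release both threads together
        t1.join(); t2.join();

        // However the hardware orders the two CASes, exactly one wins.
        System.out.println(successes.get()); // prints 1
        System.out.println(v.get());         // prints 1
    }
}
```

Running this, exactly one `compareAndSet` succeeds and the other returns `false` — but I don't understand what the hardware does underneath to decide the winner.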

  • 4
    Possibly a duplicate of [*What happens if two process in different processors try to acquire the lock at EXACTLY same time*](http://stackoverflow.com/questions/7863657/what-happens-if-two-process-in-different-processors-try-to-acquire-the-lock-at-ex). Basically: The processors may not be aware of each other, but the bus controller is aware of both of them. – T.J. Crowder May 01 '16 at 08:15
  • And even if you had a JVM that ran across a cluster (I don't know of one, but it could be done in theory), you'd still be able to get these semantics - albeit using fairly slow mechanisms (negotiating across the cluster, etc.). – yshavit May 01 '16 at 08:30

4 Answers

2

CAS operations in the end rely on specific hardware instructions, so the actual implementation is hardware-specific. If you want to learn about the details, you need to check the specific architecture on which you're running.

  • On the most popular x86 architecture, you can think about CAS operations as using a lock but on a much more granular level in the hardware.

    • E.g. one way to achieve a CAS operation is with LOCK CMPXCHG. LOCK is an instruction prefix that ensures exclusive access to the corresponding memory location for the duration of the accompanying instruction. The accompanying instruction in this case is CMPXCHG, which can literally compare and swap the value at a given memory location as a single instruction.

    • When you execute myAtomicInt.compareAndSet(0, 1) from a couple of different threads and that results in a LOCK CMPXCHG from a couple of different processors, one lucky processor will gain exclusivity first and perform its operation successfully. The locked access to this memory location is linearizable, so the remaining processors that will gain exclusivity later will have their CASes fail (unless an interleaving change introduces an ABA problem).

    • The memory bus locking can still be elided (see *8.1.4 Effects of a LOCK Operation on Internal Processor Caches* in the Intel® 64 and IA-32 Architectures Software Developer's Manual).

  • Another hardware approach for executing CAS operations is LL/SC (Load-Linked/Store-Conditional), which uses a somewhat different mechanism with somewhat different characteristics.

    • Roughly, it splits the CAS into two parts - the initial load and the conditional store - and it discards the store if there was a change to the related memory location between the initial load and the store. This can allow you to avoid the ABA problem, but it has different scalability characteristics.

    • Here's an example usage of LL/SC in ARM using its LDREX and STREX instructions.

  • Some architectures like SPARC don't have dedicated CAS instructions. On them, CAS needs to be implemented as a combination of instructions and possibly memory barriers.
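On the Java side, the ABA pitfall mentioned above can be sidestepped with `AtomicStampedReference`, which pairs the value with a version stamp, much like LL/SC detects an intervening change. A minimal sketch (single-threaded here for clarity; the "other thread" is simulated inline):

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaSketch {
    public static void main(String[] args) {
        // Pair the value with a monotonically growing stamp (version number).
        AtomicStampedReference<String> ref =
                new AtomicStampedReference<>("A", 0);

        int[] stampHolder = new int[1];
        String seen = ref.get(stampHolder);   // observe ("A", stamp 0)
        int seenStamp = stampHolder[0];

        // Meanwhile "another thread" does A -> B -> A, bumping the stamp twice.
        ref.compareAndSet("A", "B", 0, 1);
        ref.compareAndSet("B", "A", 1, 2);

        // A plain value comparison would succeed here (the value is "A" again),
        // but the stamped CAS fails because the stamp moved from 0 to 2.
        boolean updated = ref.compareAndSet(seen, "C", seenStamp, seenStamp + 1);
        System.out.println(updated);            // prints false
        System.out.println(ref.getReference()); // prints A
    }
}
```

The stamp plays the role of the LL/SC "did anything touch this location?" check, at the cost of an extra word per reference.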

Dimitar Dimitrov
  • 16,032
  • 5
  • 53
  • 55
  • _The memory bus locking can still be elided_ How is it possible? I thought that it was memory bus locking which provided guarantees that a CPU had exclusive access to some memory location, so that other CPUs block until the first is done. –  May 01 '16 at 17:14
  • See the link I've provided immediately after that statement. In short, if the writing processor has exclusive access to that memory location in its cache, it can avoid locking the bus and instead rely on the cache coherency protocol to ensure the atomicity of the operation. – Dimitar Dimitrov May 01 '16 at 19:37
  • _if the writing processor has exclusive access to that memory location in its cache_ It sounds a bit confusing. Here's how I understand what you said: assume that we have CPUs, each of which has some cache and a cache coherency protocol. If one of these CPUs asks for exclusive access to some memory location to do some operation, it (depending on the cache coherency protocol) may perform the operation on the value that is in its cache and flush the changes to shared memory later. Is my understanding close to reality now? –  May 02 '16 at 17:26
0

The processors do not know about each other.

I'm pretty sure they're aware of other processors; have a look at the MSI protocol:

https://en.wikipedia.org/wiki/MSI_protocol

There are more modern variations of MSI, e.g. MESI, that deal specifically with this problem. Imagine a world where the CPUs could mutate data with no mechanism to ensure the data was coherent among the CPUs.

There are only two hard things in Computer Science: cache invalidation, naming things and off by one errors.

Harry
  • 11,298
  • 1
  • 29
  • 43
0

The processors do not know about each other.

Processors don't know anything; they are just silicon on a chip.

If two different processors try to compare-and-swap at exactly the same time on the same memory location, what will happen then?

If a CPU's cache has exclusive access to a cache line, it can perform the update directly. If it doesn't have exclusive access, the cache coherence bus will get the line either from another CPU's cache or from the L3 cache, and then it can perform the operation, or not. What makes CAS different is that it doesn't block for long: it can return false to say it couldn't perform the update, but call it enough times and the processor should succeed at some point in the future.

How is such a situation managed?

In short, the cache coherence bus transports this information and ensures that thread-safe operations work as they should.
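This "fail fast, retry until you win" behaviour is exactly what the `java.util.concurrent.atomic` classes build on. A hand-rolled sketch of `getAndIncrement` (mirroring what `AtomicInteger` does internally, though the real class delegates to intrinsics):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasLoop {
    // A hand-rolled getAndIncrement: keep retrying until our CAS wins.
    static int getAndIncrement(AtomicInteger v) {
        while (true) {
            int current = v.get();   // read the current value
            int next = current + 1;
            // The CAS fails exactly when another processor's CAS slipped in
            // between our read and our compareAndSet; we just loop and retry.
            if (v.compareAndSet(current, next)) {
                return current;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger(0);
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) getAndIncrement(counter);
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();

        // Every failed CAS was retried, so no increment is ever lost.
        System.out.println(counter.get()); // prints 40000
    }
}
```

Note that no thread is ever descheduled by the OS while waiting; a losing thread just goes around the loop again.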

Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
  • Let me clarify a bit. So, CAS blocks anyway if we have contention. But this blocking is not considered "real" blocking, with context-switching and scheduling overhead, correct? –  May 01 '16 at 17:10
  • @DmitriiBundin the OS is not involved, and it only makes a single attempt to acquire the cache line or it returns false. – Peter Lawrey May 01 '16 at 17:50
0

Each process has its own memory, whereas for inter-process communication, processes use shared memory. In the case of shared memory, we need to synchronize reading from and writing to that shared memory. As per the quote:

CAS has three operands - a memory location V on which to operate, the expected old value A, and the new value B. CAS atomically updates V to the new value B, but only if the value in V matches the expected old value A.

This is atomic, and not only with respect to a single processor but across multiple processors. Atomicity here is implemented on the bus: CAS locks the bus in order to ensure the atomicity of the operation.

For an in-depth understanding of how all this works and what algorithms are used to ensure atomicity, please refer to 8.1.3 Multiprocessor Synchronization.
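The three operands from the quote map directly onto Java's API; a small illustration (where the `AtomicInteger` plays the role of location V):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasOperands {
    public static void main(String[] args) {
        AtomicInteger v = new AtomicInteger(5); // memory location V, holding 5

        // Expected old value A = 5, new value B = 6: V matches A, CAS succeeds.
        System.out.println(v.compareAndSet(5, 6)); // prints true

        // Expected old value A = 5 again, but V now holds 6: CAS fails
        // and V is left untouched.
        System.out.println(v.compareAndSet(5, 7)); // prints false
        System.out.println(v.get());               // prints 6
    }
}
```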

java8.being
  • 464
  • 3
  • 11