
Atomicity can be achieved with machine-level instructions such as compare-and-swap (CAS). It can also be achieved by protecting a larger block of code with a mutex/lock, with the OS providing help.

On the other hand, we also have the concept of a memory model. Some machines, like ARM, have a relaxed model that can reorder the loads/stores of a single thread as seen by other threads, while others, like x86, have a stricter model.

I want to confirm my understanding of the term synchronization. Is it essentially the promise of both atomicity and the memory model? I.e., using only atomic ops on a thread doesn't necessarily make it synchronized with other threads?
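
To make this concrete, here's a rough C++ sketch of what I mean (hypothetical names; both operations below are atomic on their own, yet relaxed ordering promises no synchronization between threads):

```cpp
#include <atomic>

std::atomic<int>  data{0};
std::atomic<bool> ready{false};

void writer() {
    data.store(42, std::memory_order_relaxed);    // atomic, no tearing...
    ready.store(true, std::memory_order_relaxed); // ...but no ordering promise
}

void reader() {
    if (ready.load(std::memory_order_relaxed)) {
        // Allowed to read 0 here: the atomic ops alone did not
        // synchronize this thread with the writer.
        int d = data.load(std::memory_order_relaxed);
        (void)d;
    }
}
```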

Dan
  • _Atomicity_ usually refers to a single memory operation performed atomically at the hardware level on a single memory location, such as a load, store, increment, or exchange. I wouldn't talk about atomicity with mutexes. We talk about _critical sections_ then. – Daniel Langr Jan 11 '22 at 14:51
  • Something _atomic_ is indivisible. Things that are _synchronized_ are happening together in time. – Wyck Jan 11 '22 at 14:52
  • I guess the definition of a `mutex` differs from person to person; this post says they are also atomic: https://stackoverflow.com/a/39796200 – Dan Jan 11 '22 at 14:54
  • I still don't get what synchronized means, though. – Dan Jan 11 '22 at 14:55
  • It means that the timing of two things has been coordinated or arranged somehow. For example, to ensure that one thread completes a write operation before another thread begins its read operation is a matter of _synchronization_. Another thing you could talk about is the fact that two _unsynchronized_ writers could potentially corrupt data. If the writes are _atomic_ then no synchronization is required to prevent corruption alone. – Wyck Jan 11 '22 at 14:59
  • @Wyck So basically make it atomic AND make sure it follows a certain order (the memory model issue)? – Dan Jan 11 '22 at 15:05
  • There are 2 interpretations of 'synchronize'. You have 'synchronize' with mutexes such that only 1 operation at a time is running within the mutex. Synchronization in memory models effectively means memory ordering. E.g. if you have a release store and an acquire load (in C++ they are synchronization operations), when a read sees a particular write, it synchronizes with that write. Which means that the thread that does the read will see all changes from the writing thread before the write. – pveentjer Jan 11 '22 at 15:17
  • So when we talk about `thread synchronization` which one of these two are we referring to? – Dan Jan 11 '22 at 15:19
  • You can have atomic operations without synchronization. E.g. a C++ atomic with memory_order_relaxed. It is guaranteed to be atomic, but there are no memory ordering guarantees. – pveentjer Jan 11 '22 at 15:19
  • @Dan no, not at all. You don't have to make something atomic for it to be synchronized. You can synchronize access to a controlled resource using a feature like a mutex. The _implementation_ of a mutex probably requires some kind of atomic CPU/memory feature in order to be able to implement a mutex (e.g. test-and-set lock, or interlocked increment). Synchronized just means coordinated together in time. – Wyck Jan 11 '22 at 15:19
  • @Dan the "thread synchronization" term (none of them actually) isn't well defined. I'd say that typically it means that the observable effect of the program is as if the program was single threaded. Which for me is the same as "thread safe". – freakish Jan 11 '22 at 15:21
  • @Dan: when I hear thread synchronization, I'm assuming some kind of mutex, condition variable etc. So some kind of mechanism where threads can wait for each other. Keep in mind that a mutex typically has memory ordering effects as well (I don't know a single mutex that doesn't). – pveentjer Jan 11 '22 at 15:22
  • @Wyck but a mutex provides atomicity. How can two threads be synchronized if they don't use atomic operations somehow? – Dan Jan 11 '22 at 15:25
  • @Dan you can create a mutex using just atomic loads/stores; no CAS is needed. E.g. Dekker's or Peterson's algorithm. But these operations need to be synchronization operations (so they need to guarantee the order of surrounding loads/stores). – pveentjer Jan 11 '22 at 15:28
  • @Dan I would never say that a mutex provides atomicity. Not out of the box anyway. For me "atomic" means "all or nothing", and that is not guaranteed by code under a mutex. For that you would need some kind of transactional memory. I would say that a mutex provides exclusive access. That's it. – freakish Jan 11 '22 at 15:28
  • @freakish this post says `Block of code guarded by a mutex is atomic too`: https://stackoverflow.com/a/39796200/14940626 – Dan Jan 11 '22 at 15:32
  • @Dan well, you can't control what other people say, now can you? :) – freakish Jan 11 '22 at 15:35
  • @pveentjer *so they need to guarantee the order of surrounding loads/stores* right, so this concerns the memory model I was talking about. I feel we are going in circles here. So are you saying `thread synchronization` concerns both atomicity and the memory model then? – Dan Jan 11 '22 at 15:35
  • @freakish of course, but there are a lot of posts online that say a mutex makes a block (the critical section) atomic. – Dan Jan 11 '22 at 15:36
  • They are overloaded terms. Synchronization in memory models effectively means that threads get certain guarantees about the values read from shared memory. For this the surrounding loads/stores with respect to some synchronization action need to be properly ordered within the memory order. Synchronization with threads means something different, e.g. a mutex or thread waiting for the completion of other threads. So there is some form of ordering between the actions of threads. – pveentjer Jan 11 '22 at 15:39
  • @Dan as I said: these terms are not well defined. So these people are not necessarily wrong; perhaps they mean something slightly different. For example, suppose I do an update, a check, and another update in a single block under a mutex, and that block is supposed to be atomic. If the check fails, I have to write code that reverts the initial update, otherwise the state won't be consistent and I'll have a partial update. That's why I don't see mutexes as atomic. Such things can be done automatically with so-called transactional memory. – freakish Jan 11 '22 at 15:42
  • If you gave this question some more context, we could benefit from the rigor of [standard definitions](https://www.iso.org/obp/ui/#iso:std:iso-iec:2382:ed-1:v1:en). So for example, are you talking about this in the context of C++? Would you go so far as to tag this as a C++ question? – Wyck Jan 11 '22 at 16:37
  • Not really, it's a more generic question. Assume I'm writing the code in assembly on x86. – Dan Jan 11 '22 at 16:38
  • Here's an outline of synchronization in operating systems: [University of Waterloo CS350: Operating Systems](https://student.cs.uwaterloo.ca/~cfzhou/notes/cs350.pdf) (Search for _synchronization_ and _atomic_ as required.) – Wyck Jan 11 '22 at 17:24
  • @Wyck So based on the above link, a lock/mutex as well as atomic instructions like CAS provide synchronization (look under `Hardware-Specific Synchronization Instructions`). If agreed, could you please provide an answer so I can mark this as solved? – Dan Jan 11 '22 at 19:27

1 Answer


Something atomic is indivisible. Things that are synchronized are happening together in time.

Atomicity

I like to think of this as having a data structure representing a 2-dimensional point with x, y coordinates. For my purposes, in order for my data to be considered "valid" it must always be a point along the x = y line. x and y must always be the same.

Suppose that initially I have a point { x = 10, y = 10 } and I want to update my data structure so that it represents the point {x = 20, y = 20}. And suppose that the implementation of the update operation is basically these two separate steps:

  1. x = 20
  2. y = 20

If my implementation writes x and y separately like that, then some other thread could potentially observe my point data structure after step 1 but before step 2. If it is allowed to read the value of the point after I change x but before I change y, then that other observer might observe the value {x = 20, y = 10}.

In fact, there are three values that could be observed:

  • {x = 10, y = 10} (the original value) [VALID]
  • {x = 20, y = 10} (x is modified but y is not yet modified) [INVALID x != y]
  • {x = 20, y = 20} (both x and y are modified) [VALID]

I need a way of updating the two values together so that it is impossible for an outside observer to observe {x = 20, y = 10}.

I don't really care when the other observer looks at the value of my point. It is fine if it observes { x = 10, y = 10 } and it is also fine if it observes { x = 20, y = 20 }. Both have the property of x == y, which makes them valid in my scenario.
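
A minimal C++ sketch of this idea (my illustration; the names are hypothetical): readers and writers agree to take the same lock, so the two-step update is indivisible as far as they can tell.

```cpp
#include <mutex>

struct Point { int x; int y; };   // invariant: x == y

Point      pt{10, 10};
std::mutex pt_mutex;              // guards pt

// Update both coordinates as one logically atomic step.
void set_point(int v) {
    std::lock_guard<std::mutex> lock(pt_mutex);
    pt.x = v;   // the invalid state {v, old} exists here...
    pt.y = v;   // ...but no reader holding the lock can observe it
}

Point get_point() {
    std::lock_guard<std::mutex> lock(pt_mutex);
    return pt;  // always sees x == y
}
```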

Simplest atomic operation

The simplest atomic operation is a test-and-set of a single bit. This operation atomically reads the value of a bit and overwrites it with a 1, returning the state of the bit we overwrote. We are offered the guarantee that once our operation has concluded, we hold the value that we overwrote, and any other observer will observe a 1. If many agents attempt this operation simultaneously, only one agent will return 0, and the others will all return 1. Even if two CPUs write on the exact same clock tick, something in the electronics guarantees that the operation concludes logically atomically according to our rules.
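
In C++, `std::atomic_flag` exposes essentially this primitive; a minimal sketch:

```cpp
#include <atomic>

std::atomic_flag bit = ATOMIC_FLAG_INIT;  // one shared bit, initially 0

void try_it() {
    // Atomically write 1 and get the previous value back. If several
    // threads race here, exactly one of them observes the 0.
    bool was_set = bit.test_and_set();
    if (!was_set) {
        // We are the one agent that saw the 0.
    }
}
```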

That's all there is to logical atomicity, and that's all atomic means. It means you have the capability of performing an uninterrupted update, with valid data before and after the update, such that the data cannot be observed by another observer in any intermediate state it may take on during the update. It may be a single bit or it may be an entire database.

x86 Example

A good example of something that can be done on x86 atomically is the 32-bit interlocked increment.

Here a 32-bit (4-byte) value must be incremented by 1, which may need to modify all 4 bytes for the operation to work correctly. If the value is to be modified from 0x000000FF to 0x00000100, it's important that the second byte's 0x00 becomes a 0x01 and the 0xFF becomes a 0x00 atomically. Otherwise I risk observing the value 0x00000000 (if the LSB is modified first) or 0x000001FF (if the MSB is modified first).

The hardware guarantees that we can test and modify 4 bytes at a time to achieve this. The CPU and memory provide a mechanism by which this operation can be performed even if there are other CPUs sharing the same memory. The CPU can assert a lock condition that prevents other CPUs from interfering with this interlocked operation.
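
In C++, a `fetch_add` on a 32-bit atomic is a sketch of this; on x86 it typically compiles to a single `lock`-prefixed instruction such as `lock xadd` (the exact instruction depends on the compiler):

```cpp
#include <atomic>
#include <cstdint>

std::atomic<std::uint32_t> counter{0x000000FF};

void increment() {
    // All four bytes change together: any observer sees either
    // 0x000000FF or 0x00000100, never 0x00000000 or 0x000001FF.
    counter.fetch_add(1);
}
```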

Synchronization

Synchronization just talks about how things happen together in time. In the context you propose, it's about the order in which various sections of our program get executed and the order in which various components of our system change state. Without synchronization, we risk corruption (entering an invalid, semantically meaningless, or incorrect state of execution of our program or its data).

Let's say we want to have an interlocked increment of a 64-bit number, and let's suppose that the hardware does not offer a way to atomically change 64 bits at a time. We will have to accomplish what we want with a more complex protocol: even when just reading, we can't simply read the most-significant 32 bits and the least-significant 32 bits of our 64-bit number separately, because we'd risk observing one half of our 64-bit value changing separately from the other half. It means that we must adhere to some kind of protocol when reading (or writing) this 64-bit value.

To implement this, we need an atomic test and set bit operation and a clear bit operation. (FYI, technically, what we need are two operations commonly referred to as P and V in computer science, but let's keep it simple.) Before reading or writing our data, we perform an atomic test-and-set operation on a single (shared) bit (commonly referred to as a "lock"). If we read a zero, then we know we are the only one that saw a zero and everyone else must have seen a 1. If we see a 1, then we assume someone else is using our shared data, and therefore we have no choice but to just try again. So we loop and keep testing and setting the bit until we observe it as a 0. (This is called a spin lock, and is the best we can do without getting help from the operating system's scheduler.)

When we eventually see a 0, then we can safely read both 32-bit parts of our 64-bit value individually. Or, if we're writing, we can safely write both 32-bit parts of our 64-bit value individually. Once both halves have been read or written, we clear the bit back to 0, permitting access by someone else.
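
Putting that protocol into a minimal C++ sketch (my illustration; on hardware with real 64-bit atomics you wouldn't do this, but it shows the shape of the protocol):

```cpp
#include <atomic>
#include <cstdint>

std::atomic_flag lock_bit = ATOMIC_FLAG_INIT;  // the shared "lock" bit
std::uint32_t lo = 0, hi = 0;                  // two halves of our 64-bit value

void write64(std::uint64_t v) {
    while (lock_bit.test_and_set()) { }        // spin until we observe a 0
    lo = static_cast<std::uint32_t>(v);        // safe: we hold the lock
    hi = static_cast<std::uint32_t>(v >> 32);
    lock_bit.clear();                          // back to 0: admit someone else
}

std::uint64_t read64() {
    while (lock_bit.test_and_set()) { }        // readers follow the same protocol
    std::uint64_t v = (static_cast<std::uint64_t>(hi) << 32) | lo;
    lock_bit.clear();
    return v;
}
```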

Any such combination of cleverness and use of atomic operations to avoid corruption in this manner constitutes synchronization because we are governing the order in which certain sections of our program can run. And we can achieve synchronization of any complexity and of any amount of data so long as we have access to some kind of atomic data.

Once we have created a program that uses a lock to share a data structure in a conflict-free way, we could also refer to that data structure as being logically atomic. C++ provides std::atomic to achieve this, for example.
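
For example, a sketch of wrapping the earlier point in std::atomic (whether this ends up lock-free or uses a hidden lock is up to the implementation; `is_lock_free` tells you which):

```cpp
#include <atomic>
#include <cstdio>

struct Point { int x; int y; };       // 8 bytes; often lock-free on x86-64

int main() {
    std::atomic<Point> pt{Point{10, 10}};
    pt.store(Point{20, 20});          // both coordinates change together
    Point p = pt.load();              // never observes {20, 10}
    std::printf("lock-free: %d, x=%d y=%d\n", pt.is_lock_free(), p.x, p.y);
}
```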

Remember that synchronization at this level (with a lock) is achieved by adhering to a protocol (protecting your data with a lock). Other forms of synchronization, such as what happens when two CPUs try to access the same memory on the same clock tick, are resolved in hardware by the CPUs and the motherboard, memory, controllers, etc. But fundamentally something similar is happening, just at the motherboard level.

Wyck