How are mutexes implemented?

Question

Are some implementations better than others for specific applications? Is there anything to earn by rolling out your own?

"Is there anything to earn by rolling out your own?" Knowledge? — Chris Lutz, Sep 28 '09 at 08:04
"Is there anything to earn by rolling out your own?" - yeah, flawed code! ;) — Mitch Wheat, Sep 28 '09 at 08:07
@Mitch Wheat - Certainly, one shouldn't be using homebrew mutex libraries for production code, but lots of people like to learn by doing, and writing your own [application x] is very informative. — Chris Lutz, Sep 28 '09 at 08:15
Most people just use/inherit the mutex code from the kernel in an object of their own. Not many developers are looking deeper inside the mutex, fearing it's way more complex than a simple boolean field. And they're right! — Wim ten Brink, Sep 28 '09 at 08:58
Atomic CAS has to be done in the hardware, so truly rolling your own is impossible. — daveb, Sep 28 '09 at 19:08
A lot of people are mentioning test-and-set and compare-and-swap, but on many RISC architectures there is something called load-link/store-conditional. It's a very interesting alternative way to implement atomic primitives on those CPUs: a special load instruction sets up a "reservation", and a matching store operation can "fail". In between, you do computations using ordinary opcodes. If the store fails, you can assume there was a race and retry. — asveikau, Oct 14 '09 at 17:14
@curiousguy: that's fine for writing 'easy' stuff. But there are areas such as mutexes, parallel code, writing your own database etc., that are best left to experts (for obvious reasons). It's often referred to as 'Not invented here' — Mitch Wheat, Jul 03 '19 at 23:35
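
To make the load-link/store-conditional point concrete: C11 offers a "weak" compare-and-swap, `atomic_compare_exchange_weak`, which is allowed to fail spuriously precisely so that compilers can map it onto an LL/SC pair on architectures that use them. The sketch below (the helper name `fetch_and_double` is hypothetical) shows the usual retry loop: read, compute with ordinary instructions, attempt the conditional store, and retry on failure.

```c
#include <stdatomic.h>

/* Hypothetical helper: atomically doubles *p and returns the old value.
   On LL/SC machines the weak CAS may compile to a load-linked /
   store-conditional pair; on x86 it becomes LOCK CMPXCHG. */
static int fetch_and_double(atomic_int *p) {
    int old = atomic_load(p);
    /* If another thread changed *p (or the store failed spuriously),
       `old` is refreshed with the current value and we try again. */
    while (!atomic_compare_exchange_weak(p, &old, old * 2)) {
    }
    return old;
}
```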

score 46 · Accepted Answer · answered Sep 28 '09 at 08:12

Check out the description of the Test-and-set machine instruction on Wikipedia, which alludes to how atomic operations are achieved at the machine level. I imagine most language-level mutex implementations rely on machine-level support such as test-and-set.
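
As a rough illustration, here is a minimal test-and-set spinlock written against C11's `atomic_flag`, whose test-and-set operation the standard guarantees to be lock-free. The type and function names are hypothetical, and a real mutex would eventually stop spinning and sleep instead.

```c
#include <stdatomic.h>

/* Minimal test-and-set spinlock sketch (names are illustrative only). */
typedef struct {
    atomic_flag held;
} tas_lock;

#define TAS_LOCK_INIT { ATOMIC_FLAG_INIT }

static void tas_acquire(tas_lock *l) {
    /* test_and_set atomically sets the flag and returns its previous value:
       true means another thread already held the lock, so keep trying. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* busy-wait */
    }
}

static void tas_release(tas_lock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```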

answered Sep 28 '09 at 08:12

Adamski

On x86 for example, you can use an `xchg` instruction to atomically swap a register with memory. The store part is the "set", and the load part + branching on the register value is the "test" half of the test-and-set operation. And yes, this is more or less what you do in practice. See [this minimal spinlock implementation in asm](https://stackoverflow.com/questions/37241553/locks-around-memory-manipulation-via-inline-assembly/37246263#37246263) that does most of the important stuff except fall back to sleeping in a system call after spinning for a while without getting the lock. – Peter Cordes Nov 01 '17 at 16:32
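
A C sketch of the same exchange-based approach, assuming C11 atomics and POSIX `sched_yield()`. Compilers for x86 typically emit `xchg` for `atomic_exchange`. `SPIN_LIMIT` is an arbitrary, hypothetical tuning value, and yielding is only a crude stand-in for the "sleep in a system call" fallback that a real mutex would use.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical spin budget before giving the CPU away. */
#define SPIN_LIMIT 1000

typedef struct { atomic_int locked; } xchg_lock;

static void xchg_acquire(xchg_lock *l) {
    int spins = 0;
    /* The exchange is the test-and-set: store 1, get the old value back. */
    while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) != 0) {
        /* Wait with read-only loads so waiters don't keep bouncing the cache
           line with atomic writes (test-and-test-and-set). */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0) {
            if (++spins >= SPIN_LIMIT) {
                sched_yield();
                spins = 0;
            }
        }
    }
}

static void xchg_release(xchg_lock *l) {
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```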

score 34 · Answer 2 · edited Jun 02 '17 at 12:57

Building on Adamski's test-and-set suggestion, you should also look at the concept of "fast user-space mutexes" or futexes.

Futexes have the desirable property that they do not require a kernel system call in the common cases of locking or unlocking an uncontended mutex. In these cases, the user-mode code successfully uses an atomic compare and swap (CAS) operation to lock or unlock the mutex.

If CAS fails, the mutex is contended, and a kernel system call (sys_futex under Linux) must be used either to wait for the mutex (in the lock case) or to wake other threads (in the unlock case).

If you're serious about implementing this yourself, make sure you also read Ulrich Drepper's paper "Futexes Are Tricky".
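
A minimal, Linux-only sketch of the idea, modeled on the three-state mutex design in Drepper's paper: 0 = unlocked, 1 = locked with no waiters, 2 = locked with possible waiters. The names `futex_mutex`, `fm_lock`, `fm_unlock` and `futex_call` are hypothetical; only `syscall(SYS_futex, ...)` and the `FUTEX_WAIT_PRIVATE`/`FUTEX_WAKE_PRIVATE` operations come from Linux. The uncontended lock path is a single CAS and the uncontended unlock path a single atomic decrement, so neither enters the kernel.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* 0 = unlocked, 1 = locked (no waiters), 2 = locked (waiters possible). */
typedef struct { atomic_int state; } futex_mutex;

static long futex_call(atomic_int *addr, int op, int val) {
    return syscall(SYS_futex, (int *)addr, op, val, NULL, NULL, 0);
}

static void fm_lock(futex_mutex *m) {
    int c = 0;
    /* Fast path: 0 -> 1 with one CAS, no system call. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;
    /* Slow path: mark the mutex contended (2), then sleep while it is held.
       FUTEX_WAIT only blocks if the word still equals 2 when checked. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        futex_call(&m->state, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&m->state, 2);
    }
}

static void fm_unlock(futex_mutex *m) {
    /* Old value 1 means nobody was waiting: done without the kernel. */
    if (atomic_fetch_sub(&m->state, 1) != 1) {
        atomic_store(&m->state, 0);
        futex_call(&m->state, FUTEX_WAKE_PRIVATE, 1); /* wake one waiter */
    }
}
```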

score 12 · Answer 3 · edited Sep 28 '09 at 09:59

A mutex preferably runs in the kernel of the operating system, keeping the amount of code around it as short as possible so that it is not cut off by a task switch to another process. The exact implementation is therefore a bit of a secret. It's not complex, though: it's basically an object with a boolean field that it gets and sets.

If it uses a counter instead of a boolean, it becomes a semaphore.
A mutex is the starting point for a critical section, which uses a mutex internally to decide whether it can enter a section of code. If the mutex is free, it sets the mutex, executes the code, and releases the mutex when done. When a critical section notices that the mutex is locked, it can wait for the mutex to be released.

Around the basic mutex logic there are wrappers that wrap it in an object. Then more wrapper objects make it available outside the kernel, and yet another wrapper makes it available in .NET. Then several programmers write their own wrapper code around all of this for their own logical needs. The wrappers around wrappers really make this murky territory.

Now, with this basic knowledge about the internals of mutexes, I hope you will use an implementation that relies on the kernel and the hardware underneath. These are the most reliable (if the hardware supports them). A mutex that doesn't work at this kernel/hardware level can still be reliable, but I would advise against using it unless there's no alternative.

As far as I know, Windows, Linux and .NET will all use mutexes at kernel/hardware level.

The Wikipedia page that I've linked to explains more about the internal logic and possible implementations. Preferably, a mutex is controlled by the hardware, making the whole getting/setting of the mutex an indivisible step (just to make sure the system doesn't switch tasks in between).

What do you mean it is a secret? Isn't the entire linux kernel source code available in GitHub? — Sri Hari Vignesh, Sep 03 '17 at 04:25
Oh, geez. It was 8 years ago when I wrote that! :) But yeah, it is a secret as no one really examines the source code for mutexes in the Linux kernel. And those who do check it will generally find deciphering the logic behind it difficult. See https://github.com/torvalds/linux/blob/master/kernel/locking/mutex.c for the Mutex code in Linux... Fortunately, it is well-commented. Still complex, though. As I said, *a bit of* secret... — Wim ten Brink, Oct 06 '17 at 01:59
Yes. And this great answer is a little bit of secret when it is under many other answers :D — hqt, Oct 01 '19 at 02:52

score 4 · Answer 4 · answered Jan 02 '19 at 09:06

A bit of assembly to demonstrate locking atomically:

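```nasm
; BL is the mutex id (the id of the thread trying to lock)
; shared_val, a memory address: holds the owner's id, or 0xFF when free

CMP [shared_val],BL            ; Perhaps it is already locked by us
JZ .OutLoop2
.Loop1:
CMP BYTE [shared_val],0xFF     ; Free? (BYTE needed: no register fixes the size)
JZ .OutLoop1                   ; Yes
pause                          ; equal to rep nop
JMP .Loop1                     ; Else, retry

.OutLoop1:

; Lock is free, grab it
MOV AL,0xFF                    ; CMPXCHG compares AL with [shared_val]...
LOCK CMPXCHG [shared_val],BL   ; ...and, if equal, stores BL there
JNZ .Loop1                     ; Write failed

.OutLoop2: ; Lock Acquired
```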
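
For readers who don't read assembly, here is a rough C11 rendering of the same logic, under the same conventions (the lock byte holds the owner's id, 0xFF means free). The function names are hypothetical, and the unlock function is an assumption: the assembly above shows only the locking side. As in the assembly, re-locking by the current owner returns immediately but is not counted, so a single unlock releases it.

```c
#include <stdatomic.h>

#define LOCK_FREE 0xFF

static void owner_lock(atomic_uchar *shared_val, unsigned char id) {
    /* Already held by us? (the first CMP/JZ) */
    if (atomic_load_explicit(shared_val, memory_order_relaxed) == id)
        return;
    for (;;) {
        /* .Loop1: read-only spin until the byte looks free. */
        while (atomic_load_explicit(shared_val, memory_order_relaxed) != LOCK_FREE) {
            /* on x86 one would typically call _mm_pause() here, like PAUSE */
        }
        /* .OutLoop1: LOCK CMPXCHG, i.e. swap 0xFF -> id only if still free. */
        unsigned char expected = LOCK_FREE;
        if (atomic_compare_exchange_strong_explicit(shared_val, &expected, id,
                memory_order_acquire, memory_order_relaxed))
            return;
        /* Write failed: another thread won the race, go back to spinning. */
    }
}

/* Assumed counterpart (not in the original answer): mark the byte free. */
static void owner_unlock(atomic_uchar *shared_val) {
    atomic_store_explicit(shared_val, LOCK_FREE, memory_order_release);
}
```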

score 2 · Answer 5 · edited May 29 '17 at 08:26

Interlocked.CompareExchange is enough to implement spinlocks. It's pretty difficult to do right, though. See Joe Duffy's blog for an example of the subtleties involved.

answered Sep 28 '09 at 08:33 by Joren (edited by zennehoy)

We're talking about language-agnostic solutions here, but thanks for your effort. – static_rtti Sep 29 '09 at 12:01
Oh, you're right. I don't know why I was thinking .NET. Perhaps because of the other answers. – Joren Sep 29 '09 at 14:21

score -1 · Answer 6 · answered Sep 29 '09 at 07:38

I used Reflector.NET to decompile the source for System.Threading.ReaderWriterLockSlim, which was added to the .NET Framework in version 3.5.

It mostly uses Interlocked.CompareExchange, Thread.SpinWait and Thread.Sleep to achieve synchronisation. There are a few EventWaitHandle (kernel object) instances that are used under some circumstances.

There's also some complexity added to support reentrancy on a single thread.
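
For comparison, here is a much-simplified reader-writer spin lock built only on compare-and-swap. It is loosely inspired by the description above and is not a reconstruction of ReaderWriterLockSlim: it has no reentrancy, no writer preference, and no fallback to a kernel wait handle. `state > 0` counts active readers, `-1` means a writer holds the lock, and `0` means free. All names are hypothetical.

```c
#include <stdatomic.h>

typedef struct { atomic_int state; } rw_spin;

static void rw_read_lock(rw_spin *l) {
    for (;;) {
        int s = atomic_load_explicit(&l->state, memory_order_relaxed);
        /* Readers may enter whenever no writer holds the lock. */
        if (s >= 0 && atomic_compare_exchange_weak_explicit(&l->state, &s, s + 1,
                memory_order_acquire, memory_order_relaxed))
            return;
    }
}

static void rw_read_unlock(rw_spin *l) {
    atomic_fetch_sub_explicit(&l->state, 1, memory_order_release);
}

static void rw_write_lock(rw_spin *l) {
    for (;;) {
        /* A writer needs the lock completely free: 0 -> -1. */
        int expected = 0;
        if (atomic_compare_exchange_weak_explicit(&l->state, &expected, -1,
                memory_order_acquire, memory_order_relaxed))
            return;
    }
}

static void rw_write_unlock(rw_spin *l) {
    atomic_store_explicit(&l->state, 0, memory_order_release);
}
```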

If you're interested in this area and working in .NET (or at least, can read it) then you might find it quite interesting to check this class out.
