Are mutexes really slower?

Question

I have read so many times, here and everywhere on the net, that mutexes are slower than critical section/semaphores/insert-your-preferred-synchronisation-method-here. But I have never seen any paper or study or whatever to back up this claim.

Where does this idea come from? Is it a myth or a reality? Are mutexes really slower?

question doesn't make a great deal of sense without more context? — Mitch Wheat, Nov 03 '09 at 11:02
it is expressly general... as a Windows developper, i am interested in a Windows answer, but i also want to know if differences exists between different os or architecture. the underlying question being: if i had to chose a synchronisation mechanism, would i use a mutex or should i first consider other mechanisms ? (actually, i use a mutex or a binary-semaphore without trying anything else) — Adrien Plisson, Nov 03 '09 at 11:15
The critical section in windows is a tradeoff between spin lock and a semaphore. — Christopher, Nov 03 '09 at 12:00

Mick · Accepted Answer · 2023-02-22T10:25:53.907

20

In the book "Multithreading applications in win32" by Jim Beveridge and Robert Wiener it says: "It takes almost 100 times longer to lock an unowned mutex than it does to lock an unowned critical section because the critical section can be done in user mode without involving the kernel"

And on MSDN here it says "critical section objects provide a slightly faster, more efficient mechanism for mutual-exclusion synchronization"

edited Feb 22 '23 at 10:25

answered Nov 03 '09 at 11:07

Mick

8,284
22
81
173

So are there any disadvantages of a critical section? I'm really a java programmer and in java the two terms get used synonymously. – Benj Nov 03 '09 at 11:18
29

In Java, it's all slow ;-) – Dave Markle Nov 03 '09 at 11:19
@Benj: I don't think you can use native Win32 Critical Section objects in Java, because once initialized, the Critical Section data structure must not be moved in memory. i.e. it wouldn't survive a compacting GC and it mustn't be passed around by value. – Niki Nov 03 '09 at 12:29
5

@Benj -- since critical section is not a kernel object, it works only within single process. If you need to synchronize access across several processes, then mutex is your choice. – atzz Nov 03 '09 at 12:42

Spence · Answer 2 · 2009-11-04T20:50:01.540

I don't believe that any of the answers hit on the key point of why they are different.

Mutexes are at operating system level. A named mutex exists and is accessible from ANY process in the operating system (provided its ACL allows access from all).

Critical sections are faster as they don't require the system call into kernel mode, however they will only work WITHIN a process, you cannot lock more than one process using a critical section. So depending on what you are trying to achieve and what your software design looks like, you should choose the most appropriate tool for the job.

I'll additionally point out to you that Semaphores are separate to mutex/critical sections, because of their count. Semaphores can be used to control multiple concurrent access to a resource, where as a mutex/critical section is either being accessed or not being accessed.

+1 for pointing out the different nature of synchronisation achieved with those 2 constructs. — Adrien Plisson, Nov 04 '09 at 08:28

score 5 · Answer 3 · edited Jun 20 '20 at 09:12

5

A CRITICAL_SECTION is implemented as a spinlock with a capped spin count. See MSDN InitializeCriticalSectionAndSpinCount for the indication of this.

When the spin count 'elapsed', the critical section locks a semaphore (or whatever kernel-lock it is implemented with).

So in code it works like this (not really working, should just be an example) :

CRITICAL_SECTION s;

void EnterCriticalSection( CRITICAL_SECTION* s )
{
    int spin_count = s.max_count;
    while( --spin_count >= 0 )
    {
        if( InterlockedExchange( &s->Locked, 1 ) == 1 )
        {
           // we own the lock now
           s->OwningThread = GetCurrentThread();
           return;
        }
    }
    // lock the mutex and wait for an unlock
    WaitForSingleObject( &s->KernelLock, INFINITE );
}

So if your critical section is only held a very short time, and the entering thread does only wait very few 'spins' (cycles) the critical section can be very efficient. But if this is not the case, the critical section wastes many cycles doing nothing, and then falls back to a kernel synchronization object.

So the tradeoff is :

Mutex : Slow acquire/release, but no wasted cycles for long 'locked regions'

CRITICAL_SECTION : Fast acquire/release for unowned 'regions', but wasted cycles for owned sections.

edited Jun 20 '20 at 09:12

Community

1
1

answered Nov 03 '09 at 11:52

Christopher

8,912
3
33
38

Actually, a mutex wastes more cycles for the context switch than a EnterCriticalSections spends spinning. That's the point of spinning. – Niki Nov 03 '09 at 12:32
In a worst case scenario, every EnterCriticalSection, needs spin_count cycles + the cycles of the WaitForSingleObject. – Christopher Nov 03 '09 at 12:59
the MSDN says that the spin count is always zero on monoprocessor systems... does this mean that a critical section is the same as a mutex on those processors ? – Adrien Plisson Nov 03 '09 at 14:25
Not completely. You can easily check if another thread is currently trying to acquire the lock, because only one can run at any given time. So, if it is locked, we can directly go to the mutex to get our thread to sleep. If it is not locked, we get the same behavior as on a multicore system. – Christopher Nov 03 '09 at 14:54
you are right, i misread your pseudo code: the kernel object acquisition only occurs when the critical section is already locked. – Adrien Plisson Nov 04 '09 at 08:24
Really interesting point I read in a microsoft press book, can't for the life of me remember where, but it was a performance optimisation, basically avoiding the sys call if you get a fast release was statistically faster in most of their perf testing than taking a mutex immediately with a sys call, even though you are wasting cycles and not yielding. – Spence Nov 02 '16 at 03:26

score 3 · Answer 4 · answered Nov 03 '09 at 11:13

Yes, critical sections are more efficient. For a very good explanation, get "Concurrent Programming on Windows".

In a nutshell: a mutex is a kernel object, so there is always a context switch when you acquire one, even if "free". A critical section can be acquired without a context switch in that case, and (on an multicore/processor machine) it will even spin a few cycles if it's blocked to prevent the expensive context switch.

score 3 · Answer 5 · edited May 23 '17 at 12:26

3

A mutex (at least in windows) allows for synchronizations between different processes in addition to threads. This means extra work must be done to ensure this. Also, as Brian pointed out, using a mutex also requires a switch to "kernel" mode, which causes another speed hit (I believe, i.e. infer, that the kernel is required for this interprocess synchronization, but I've got nothing to back me up on that).

Edit: You can find explicit reference to interprocess synchronization here and for more info on this topic, have a look at Interprocess Synchronization

edited May 23 '17 at 12:26

Community

1
1

answered Nov 03 '09 at 11:17

Grant Peters

7,691
3
45
57

Kernel mode is required because only in Kernel mode can you guarantee that the mutex object won't be written by another thread. Also as this is shared between processes the only secure way to implement a mutex is after a security check via the kernel. Finally only the kernel can control the IO wait queue which is where the thread will be placed in order to wait on acquiring the mutex. – Spence Mar 23 '12 at 06:00

Are mutexes really slower?

5 Answers5

Linked