Performance difference between mutex and critical section in C++

Question

I was reading this post on the performance difference between critical sections and mutexes in C# for a given test case. I'm wondering whether there is any further documentation that gives the performance overheads of the various locking classes for a C++ application, specifically MFC running on a 32- or 64-bit Windows platform.

The reason I'm asking is that the profiler results I get across broad automated tests show a lot of time spent in mutex code. What I'm trying to figure out is how much of this is reasonable delay while waiting for a resource to become available, and how much is due to the implementation and specifics of the locking structure. I'm only dealing with a single process containing multiple threads, and am considering changing to critical sections. Long-term automated testing shows that I don't need the time-outs offered by the mutex class.

Hence the question: is anyone aware of any reference documentation relating to the performance overheads of the different MFC locking mechanisms on different Windows platforms?

The answers to your linked question apply perfectly to MFC, too, because, like the C# ones, the MFC classes (like `CMutex` or `CCriticalSection`) are nothing more than wrappers around the corresponding Win32 functionality. – Christian Rau, Oct 10 '11 at 12:34
Thanks Christian, I kind of expected this, but was wondering whether the results were language dependent, hardware dependent, etc... and whether there was any hard information relating to performance. As per Mark Ingram's answer below, the MS documentation suggesting critical sections are only 'slightly faster' appears misleading. — SmacL, Oct 10 '11 at 14:56

score 6 · Accepted Answer · edited Oct 10 '11 at 12:38

As far as I understand, a Win32 mutex is a full-blown kernel object. This means that every operation on a mutex involves a system call. The transition into kernel mode often disturbs the CPU caches as well, so it can be quite expensive.

Critical sections are user-mode objects that make no use of the kernel when there is no contention. This is probably done using x86 LOCK-prefixed instructions or similar to guarantee atomicity. Since no system call is made, a critical section is faster, but because it is not a kernel object, there is no way to access a critical section from another process.
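
To make the "no kernel call without contention" idea concrete, here is a minimal sketch in portable C++. It is an illustration under assumptions, not how `CRITICAL_SECTION` is actually implemented: the class `FastPathLock` and the helper `run_counter_demo` are hypothetical names, and `std::this_thread::yield()` merely stands in for the kernel wait a real implementation would fall back to under contention.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical illustration only; NOT the real CRITICAL_SECTION implementation.
class FastPathLock {
public:
    void lock() {
        bool expected = false;
        // Fast path: a single atomic compare-and-swap in user mode.
        // If the lock is free, we own it without any system call.
        while (!locked_.compare_exchange_strong(expected, true,
                                                std::memory_order_acquire,
                                                std::memory_order_relaxed)) {
            expected = false;  // compare_exchange overwrote it with the current value
            // Slow path: the lock is held by someone else. A real lock would
            // block in the kernel here; yielding is only a stand-in.
            ++contended_;
            std::this_thread::yield();
        }
    }

    void unlock() { locked_.store(false, std::memory_order_release); }

    // Number of times an acquire attempt found the lock already taken.
    unsigned long contended() const { return contended_.load(); }

private:
    std::atomic<bool> locked_{false};
    std::atomic<unsigned long> contended_{0};
};

// Starts `threads` workers that each increment a shared counter `iterations`
// times under the lock, and returns the final counter value.
long run_counter_demo(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    FastPathLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;  // protected by the lock
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Calling `run_counter_demo(4, 10000)` should return 40000 if the lock provides mutual exclusion. With a single thread the slow path is never taken, which is the case in which a critical section avoids the kernel entirely.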

answered Oct 10 '11 at 12:24 by doron · edited Oct 10 '11 at 12:38 by Christian Rau

Almost. I think by "...from another thread" you actually mean "...from another process." – RobH Oct 10 '11 at 12:29
Thanks for the information. Sounds like I need to run some tests myself on a variety of platforms and conditions. – SmacL Oct 10 '11 at 14:58
Right idea, but your reference to the LOCK instruction, while kind of true, is a bit misleading. The user land aspect of a mutex is usually done with a "compare-and-swap" and "atomic_increment" function. It is possible then to see if there was contention without calling the kernel. If there was no contention no kernel call is needed. If there is contention then a kernel call is needed to wait/release the mutex. The man page for Linux's `futex` system call is a good place to get details on such a system (or the pthread mutex code). – edA-qa mort-ora-y Oct 10 '11 at 17:37
I am more familiar with the ARM instruction set. For ARMv5 you use SWP; for ARMv6 and higher you use the LDREX and STREX instructions. Take a look at: http://www.codemaestro.com/reviews/8 – doron Oct 12 '11 at 17:57

score 1 · Answer 2 · answered Oct 10 '11 at 12:32

The crucial difference between Critical Sections and Mutexes in Windows is that you can create a named mutex and use it from multiple processes, whereas there is no way to access a critical section of one process from another.

A consequence of a mutex being available in multiple processes is that access to it must be controlled by the kernel.

score 1 · Answer 3 · answered Oct 10 '11 at 12:50

Read the following support article from Microsoft: http://support.microsoft.com/kb/105678.

Critical sections and mutexes provide synchronization that is very similar, except that critical sections can be used only by the threads of a single process. There are two areas to consider when choosing which method to use within a single process:

Speed. The Synchronization overview says the following about critical sections:

... critical section objects provide a slightly faster, more efficient mechanism for mutual-exclusion synchronization. Critical sections use a processor-specific test and set instruction to determine mutual exclusion.

Deadlock. The Synchronization overview says the following about mutexes:

If a thread terminates without releasing its ownership of a mutex object, the mutex is considered to be abandoned. A waiting thread can acquire ownership of an abandoned mutex, but the wait function's return value indicates that the mutex is abandoned. WaitForSingleObject() will return WAIT_ABANDONED for a mutex that has been abandoned. However, the resource that the mutex is protecting is left in an unknown state.

There is no way to tell whether a critical section has been abandoned.

The term 'slightly faster' is what interests me here, as Michael's answer in the linked post suggests in excess of 20 times faster. — SmacL, Oct 10 '11 at 14:54
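
Since the available figures range from "slightly faster" to "more than 20 times faster", measuring on the target platform is probably the most reliable answer. The following is a rough sketch of a timing harness, not a rigorous benchmark. The names `ns_per_lock_pair` and `time_std_mutex` are hypothetical, and `std::mutex` is used only as a portable stand-in. On Windows one would presumably pass lambdas that call `EnterCriticalSection`/`LeaveCriticalSection` on a `CRITICAL_SECTION`, and `WaitForSingleObject(handle, INFINITE)`/`ReleaseMutex(handle)` on a mutex handle, and compare the two results.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>

// Returns the average time in nanoseconds of one lock/unlock pair.
// `lock` and `unlock` are arbitrary callables, so the same harness can time
// different locking primitives. Only the uncontended, single-threaded case
// is measured here.
template <typename Lock, typename Unlock>
double ns_per_lock_pair(Lock lock, Unlock unlock, long iterations) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (long i = 0; i < iterations; ++i) {
        lock();
        unlock();
    }
    const std::chrono::duration<double, std::nano> elapsed = clock::now() - start;
    return elapsed.count() / static_cast<double>(iterations);
}

// Portable stand-in for a platform-specific lock: times std::mutex.
double time_std_mutex(long iterations) {
    std::mutex m;
    return ns_per_lock_pair([&] { m.lock(); }, [&] { m.unlock(); }, iterations);
}
```

This measures only the uncontended cost, where the difference between the two primitives is expected to be largest. Time spent waiting for a resource that is genuinely busy will show up in a profiler regardless of which primitive is used.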
