
I have an object (kind of a queue) which is accessed across threads. The queue object can be mutex-locked before it is used by either thread.

A simpler way to manage this is to bring the lock inside the queue object itself - every API call then locks the queue and releases it when the work is done. This way, threads don't have to manage an additional mutex variable alongside each queue.
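
Roughly, what I have in mind looks like this (a simplified sketch with illustrative names, not my real code):

```c
#include <pthread.h>

struct node {
    void        *data;
    struct node *next;
};

typedef struct queue {
    struct node     *head;
    struct node     *tail;
    pthread_mutex_t  lock;   /* the mutex lives inside the queue itself */
} queue_t;

void queue_push(queue_t *q, struct node *n)
{
    pthread_mutex_lock(&q->lock);    /* every API call locks on entry... */
    n->next = NULL;
    if (q->tail)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
    pthread_mutex_unlock(&q->lock);  /* ...and unlocks before returning */
}
```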

Now my question is: sometimes there is only one thread accessing the queue (say it is a local variable). But since the queue now inherently locks its internal data structure on every call and unlocks it before returning, will this be a costly affair?

How costly are the redundant mutex_lock and mutex_unlock operations when there is no actual need for thread synchronization?

PS: My question is slightly related to this one: How efficient is locking an unlocked mutex? What is the cost of a mutex?

But I am looking for a specific answer for my design, and an understanding of why.

I am using C and the pthreads library.

Dipan Mehta
  • Test it. It should be simple enough to rig up a quick test no? – Ed S. Aug 31 '12 at 17:23
  • Why don't you write a simple program that performs the lock/unlock operation in a loop a few thousand times, and measure how long it takes? That will give you a good estimate of how much overhead a single lock/unlock will add when there is no resource contention. (A minimal loop along these lines is sketched after the comments.) – mah Aug 31 '12 at 17:23
  • Mutex performance will depend on the pthreads implementation. pthreads will have (sometimes drastically) different performance characteristics between Linux, OS X, Windows, and other platforms because they'll use different native APIs. What platform are we talking? –  Aug 31 '12 at 17:25
  • Well, there is no single answer as the costs can be different on different platforms. And there is no such thing as "big" or "small" cost, you need to compare it with the rest of operations. Looking at what different frameworks provide, I would create the 2 structures: one with and one without locks. This way whoever needs the locks, can use them, and whoever doesn't need, can use the variant without locks. – Vlad Aug 31 '12 at 17:26
  • Mind that just automatically locking every access is perhaps not the ideal solution, as this won't guarantee _transactional integrity_. The users might want to lock bigger operations, like searching for an item and adding one. – Vlad Aug 31 '12 at 17:29
  • A pthread mutex is fairly cheap to lock if it is uncontended. Nowadays on the Linux kernel, pthreads uses futexes, and if there's no contention, the kernel should never be invoked. – Kerrek SB Aug 31 '12 at 17:40
  • This is too anecdotal for it to be an answer, but when I tested on my x86 machines, I found it cost generally about the same runtime as 100 fully-cached integer operations (my choice of comparison is purely a heuristic, an artifact of my using a hash function as a timing comparison tool... but it turned out to be a convenient way to visualize the runtime costs, so I keep using it in my head). – Cort Ammon Sep 08 '15 at 23:10
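
Taking up the suggestion above to simply measure it, a minimal uncontended timing loop might look like the sketch below (single-threaded, so the mutex is never contended; absolute numbers will vary with the platform and pthreads implementation):

```c
/* build: gcc -O2 -pthread bench.c   (older glibc may also need -lrt) */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ITERATIONS 10000000

int main(void)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&m);      /* never contended in this test */
        pthread_mutex_unlock(&m);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("%d uncontended lock/unlock pairs in %.3f s (%.1f ns per pair)\n",
           ITERATIONS, elapsed, elapsed * 1e9 / ITERATIONS);
    return 0;
}
```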

2 Answers


One way to handle this is to have your queue initialization take a parameter that indicates whether or not a lock should be acquired during queue operations. If a queue is being used by a single thread, it gets initialized so that it won't acquire/release locks (or uses a lock object whose acquire/release operations are no-ops).

See this answer for an example of how boost::pool does something along these lines (although in C++ and as a compile time configuration): https://stackoverflow.com/a/10188784/12711

A similar concept can be applied to C code at runtime, too.
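
A minimal sketch of that runtime approach, assuming a simple linked-list queue (the names and layout here are just for illustration, not taken from the question):

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct node {
    void        *data;
    struct node *next;
} node_t;

typedef struct queue {
    node_t          *head;
    node_t          *tail;
    int              use_lock;   /* decided once, at initialization */
    pthread_mutex_t  lock;       /* only touched when use_lock is non-zero */
} queue_t;

/* The lock helpers collapse to a cheap branch for single-threaded queues. */
static void queue_lock(queue_t *q)   { if (q->use_lock) pthread_mutex_lock(&q->lock); }
static void queue_unlock(queue_t *q) { if (q->use_lock) pthread_mutex_unlock(&q->lock); }

int queue_init(queue_t *q, int use_lock)
{
    q->head = q->tail = NULL;
    q->use_lock = use_lock;
    return use_lock ? pthread_mutex_init(&q->lock, NULL) : 0;
}

int queue_push(queue_t *q, void *data)
{
    node_t *n = malloc(sizeof *n);
    if (!n)
        return -1;
    n->data = data;
    n->next = NULL;

    queue_lock(q);
    if (q->tail)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
    queue_unlock(q);
    return 0;
}
```

A queue used by a single thread is created with `queue_init(&q, 0)` and only pays for the untaken branch; shared queues use `queue_init(&q, 1)`.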

Michael Burr

First of all: Neither the C library nor pthreads implements mutex locking - they call into the kernel to use OS primitives for that. This implies that the performance of mutexes will vary wildly with the base OS.

If you can reduce your portability spectrum to hardware that supports atomic compare-exchange or atomic increase-and-read (such as any x86 from this millennium), you can use atomic increase-and-read to create a thread-safe queue that does not need locking.
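
As a rough illustration of the lock-free idea, here is a much simpler variant than the general design described above: a bounded single-producer/single-consumer ring using C11 atomics (illustrative names; a multi-producer/multi-consumer queue needs the compare-exchange approach):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 1024   /* must be a power of two */

typedef struct {
    void          *slots[RING_SIZE];
    _Atomic size_t head;   /* next slot to read  (written only by the consumer) */
    _Atomic size_t tail;   /* next slot to write (written only by the producer) */
} spsc_ring_t;

/* Producer side: returns false if the ring is full. */
bool ring_push(spsc_ring_t *r, void *item)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_SIZE)
        return false;                            /* full */
    r->slots[tail & (RING_SIZE - 1)] = item;
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false if the ring is empty. */
bool ring_pop(spsc_ring_t *r, void **item)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return false;                            /* empty */
    *item = r->slots[head & (RING_SIZE - 1)];
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```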

For the .Net platform I have such a beast at http://sourceforge.net/projects/dotnetlockless - it should be quite easy to port it to C.

Eugen Rieck
    The first paragraph is not true. Kernel assistance is needed only on contention unless the implementation is really bad, and in this case, performance depends pretty much 100% on the C/pthreads library. – R.. GitHub STOP HELPING ICE Aug 31 '12 at 19:41
  • The big downside of a "lockless queue" is that if your consumer hits an empty queue, it can't go to sleep and be woken up when a new item is added to the queue. – caf Sep 11 '12 at 01:13
  • @caf I am quite positive this is not true: you wrap the lockless queue together with some blocking primitive and get a blocking queue. This is one of two ways I typically use it - the other being to return the thread to a thread pool to perform some other work (i.e. mostly do something to fill the queue). To account for the thundering-herd problem, I typically wake up only one thread and have it dequeue an element; on success it wakes up another, and so on, until one thread again hits an empty queue and sleeps or is recycled. The lockless queue being really fast means that this is very efficient. – Eugen Rieck Sep 11 '12 at 01:35
  • @EugenRieck: All the overhead of pthreads mutexes is in the kernel calls required to wake up sleepers and await a wakeup in the contended case, so once you implement the "blocking queue" you've typically gained very little over just using the native mutex / condition variable functions. – caf Sep 11 '12 at 04:53
  • @caf I see a different picture in my usage: A kernel call happens only once for every thread hitting an **empty** queue - consumer to sleep, producer to wake up. Locking/unlocking on every queue/dequeue operation is exactly what I recommended **not** to do - this is the OQ! Personally I think, that having a significant number of threads just sleep-wait routinely is a design with room for improvement. Why not have these threads work on the producer side instead? If there is nothing to do on both sides, there is no contention for the queue anyway. – Eugen Rieck Sep 11 '12 at 11:28