How expensive are semaphores vs. spin locks in practice?

Question

This question asks whether a spin lock can be improved in a way that doesn't compromise latency, but uses less CPU time. A large list of answers suggest high level language concepts in C++11, Boost, and the like.

My first thought was to use a simple C semaphore, since the poster needs to block only when a buffer is empty or full.

In the process of writing up an answer however, I realized I do not know what the overhead of these functions is in practice. Intuitively, it seems like it should be small, and it's never been an optimization issue for me, but perhaps it's substantial vs. a spin lock. Presumably it's also system dependent.

Answers to this question suggest that a spin lock is preferred when locking for less than one thread quanta, but no real world indications are given as to why.

Answers to this question provide a working example of a semaphore implementation in C++ that uses a spin lock with pthread_wait in the body, but it's not taken from any actual language implementation.

Over here, a question about speed differences between a mutex and a semaphore are declared to be insignificant by some. Others say the semaphore is slower.

An article linked to by this question suggests that the C# lock command for a mutex costs 50ns in practice on a 2.4GhZ machine (so ~100 cycles). However, it's unclear whether C#'s implementations are representative of say, a straight C implementation of the POSIX semaphore.

So, the question is, what's the overhead on semaphore use like in practice, and by extension, when should I prefer a spin lock if all I care about is latency (i.e. not maintainability for some reason)?

Kinda depends... on lots of stuff. Are there usually more cores than ready threads? What is the effect on the free memory-bandwidth of a tight loop on a barriered-up atomic int? What is the lock-span - are you copying large objects within the lock or a pointer/smartPointer? — Martin James, Apr 26 '13 at 00:11
Semaphores and mutexs will often transistion to kernel (windows) which is slow, spinlocks dont (except if you exhaust your time slice). There was a really good research paper that covered this ( the paper was on lock free locking, but I cannot find it !). This article http://msdn.microsoft.com/en-us/magazine/cc163726.aspx has some good points too. — rlb, Apr 26 '13 at 04:59

score 2 · Answer 1 · answered Apr 25 '13 at 22:49

I am by no means an expert on this, and so you should take my advice with a grain of salt depending on your circumstances.

A spin lock is implemented using processor instructions that are atomic by nature. As such, acquiring and releasing locks can be extremely fast. Performance degrades from this ideal the longer you hold the lock and the more contention there is for the lock. It is therefore best suited for infrequently updated data. Afaik .NET (4.0+) has its own managed-only spin-lock implementation that avoids the transition into unmanaged code (and the subsequent kernel access), which makes the overhead pretty much insignificant.

Almost every other lock type is based on WaitHandles (the exception, in .NET-land anyway, is the managed-only Monitor class), and unless you're writing a device driver running in kernel-space, performance is likely not to vary too much (because a large portion of the cost is the transition into kernel-space and back). Pick the lock type that best suits your applications needs.

If you really care deeply about it, put together some performance tests that simulate the expected work load, and benchmark your preferred options to see how they stack up.

+1 - sounds good to me - test it on your own app/hardware/OS. — Martin James, Apr 26 '13 at 00:12

How expensive are semaphores vs. spin locks in practice?

1 Answers1