205

POSIX allows mutexes to be recursive. That means the same thread can lock the same mutex twice and won't deadlock. Of course, it also needs to unlock it twice; otherwise no other thread can obtain the mutex. Not all systems supporting pthreads also support recursive mutexes, but if they want to be POSIX-conformant, they have to.
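For illustration, here is a minimal sketch (names made up) of how such a recursive mutex is typically created and used with pthreads, assuming the platform provides the PTHREAD_MUTEX_RECURSIVE type:

#include <pthread.h>

pthread_mutex_t g_rec_mutex;

void init_recursive_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&g_rec_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
}

void inner(void)
{
    pthread_mutex_lock(&g_rec_mutex);   // same thread, second lock: no deadlock
    // ... touch the shared state ...
    pthread_mutex_unlock(&g_rec_mutex); // first unlock: lock count drops back to 1
}

void outer(void)
{
    pthread_mutex_lock(&g_rec_mutex);   // first lock
    inner();
    pthread_mutex_unlock(&g_rec_mutex); // second unlock: mutex actually released
}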

Other, higher-level APIs usually offer mutexes as well, often called locks. Some systems/languages (e.g. Cocoa Objective-C) offer both recursive and non-recursive mutexes. Some languages offer only one or the other. In Java, for example, mutexes are always recursive (the same thread may "synchronize" twice on the same object). Depending on what other threading functionality they offer, not having recursive mutexes may be no problem, as you can easily write them yourself (I have already implemented recursive mutexes myself on the basis of simpler mutex/condition operations).
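For example, such a self-made recursive lock can be sketched roughly like this on top of a plain mutex and a condition variable (a simplified C++ illustration, not the actual implementation mentioned above):

#include <condition_variable>
#include <mutex>
#include <thread>

class RecursiveLock {
    std::mutex m_;
    std::condition_variable cv_;
    std::thread::id owner_{};   // thread currently holding the lock, if any
    unsigned count_ = 0;        // how many times the owner has locked it
public:
    void lock() {
        std::unique_lock<std::mutex> guard(m_);
        const auto self = std::this_thread::get_id();
        if (owner_ == self) { ++count_; return; }      // re-entry by the current owner
        cv_.wait(guard, [&] { return count_ == 0; });  // wait until the lock is free
        owner_ = self;
        count_ = 1;
    }
    void unlock() {
        std::unique_lock<std::mutex> guard(m_);
        if (--count_ == 0) {            // last unlock releases ownership
            owner_ = std::thread::id{};
            cv_.notify_one();
        }
    }
};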

What I don't really understand is: what are non-recursive mutexes good for? Why would I want a thread to deadlock if it locks the same mutex twice? Even high-level languages that could avoid that (e.g. by testing whether a lock would deadlock and throwing an exception if it does) usually don't do so; they let the thread deadlock instead.

Is this only for cases where I accidentally lock a mutex twice and unlock it only once, and with a recursive mutex the problem would be harder to find, so instead I have it deadlock immediately to see where the incorrect lock appears? But couldn't I do the same by having the lock counter returned when unlocking, so that in a situation where I'm sure I released the last lock and the counter is not zero, I can throw an exception or log the problem? Or is there some other, more useful use case for non-recursive mutexes that I fail to see? Or is it maybe just performance, since a non-recursive mutex can be slightly faster than a recursive one? However, I tested this, and the difference really isn't that big.
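For reference, POSIX also defines an error-checking mutex type that reports the double lock instead of deadlocking, which is close to the "throwing an exception" idea above. A rough sketch (names made up), assuming the platform supports PTHREAD_MUTEX_ERRORCHECK:

#include <errno.h>
#include <pthread.h>
#include <stdio.h>

pthread_mutex_t g_mutex;

void init_errorcheck_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&g_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
}

void demo(void)
{
    pthread_mutex_lock(&g_mutex);
    if (pthread_mutex_lock(&g_mutex) == EDEADLK)   // relock by the owner is detected
        fprintf(stderr, "double lock detected\n");
    pthread_mutex_unlock(&g_mutex);                // releases the single lock held
    if (pthread_mutex_unlock(&g_mutex) == EPERM)   // unlock of a mutex we no longer hold
        fprintf(stderr, "unbalanced unlock detected\n");
}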

Mecki

8 Answers

166

The difference between a recursive and a non-recursive mutex has to do with ownership. In the case of a recursive mutex, the kernel has to keep track of the thread that actually obtained the mutex the first time around, so that it can tell the difference between recursion and a different thread that should block instead. As another answer pointed out, there is a question of the additional overhead of this, both in terms of the memory needed to store this context and the cycles required to maintain it.

However, there are other considerations at play here too.

Because the recursive mutex has a sense of ownership, the thread that grabs the mutex must be the same thread that releases it. In the case of non-recursive mutexes, there is no sense of ownership, and any thread can usually release the mutex no matter which thread originally took it. In many cases, this type of "mutex" is really more of a semaphore, where you are not necessarily using it as an exclusion device but rather as a synchronization or signaling device between two or more threads.

Another property that comes with a sense of ownership in a mutex is the ability to support priority inheritance. Because the kernel can track the thread owning the mutex and also the identities of any threads blocked on it, in a priority-scheduled system it becomes possible to escalate the priority of the thread that currently owns the mutex to the priority of the highest-priority thread currently blocked on it. This inheritance prevents the problem of priority inversion that can occur in such cases. (Note that not all systems support priority inheritance on such mutexes, but it is another feature that becomes possible via the notion of ownership.)
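For example, on POSIX systems that implement the priority-inheritance option, the protocol is requested per mutex through an attribute. A minimal sketch (names invented; availability of PTHREAD_PRIO_INHERIT varies by platform):

#include <pthread.h>

pthread_mutex_t g_pi_mutex;

void init_priority_inheriting_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    // If a low-priority thread holds this mutex while a higher-priority thread
    // blocks on it, the holder is temporarily boosted to the blocker's priority.
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    pthread_mutex_init(&g_pi_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
}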

If you refer to classic VxWorks RTOS kernel, they define three mechanisms:

  • mutex - supports recursion, and optionally priority inheritance. This mechanism is commonly used to protect critical sections of data in a coherent manner.
  • binary semaphore - no recursion, no inheritance, simple exclusion; the taker and the giver do not have to be the same thread; broadcast release is available. This mechanism can be used to protect critical sections, but it is also particularly useful for coherent signalling or synchronization between threads (see the sketch after this list).
  • counting semaphore - no recursion or inheritance; acts as a coherent resource counter from any desired initial count; threads only block when the net count against the resource is zero.
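To make the signalling use concrete, here is a small sketch (generic POSIX/C++ for illustration, not VxWorks code) using a semaphore initialised to zero as a pure signal between two threads; note that the thread that "gives" is not the thread that "takes":

#include <cstdio>
#include <semaphore.h>
#include <thread>

sem_t g_ready;                      // initialised to 0: nothing has been signalled yet

void consumer()
{
    sem_wait(&g_ready);             // blocks until some other thread posts
    std::printf("got the signal\n");
}

void producer()
{
    // ... prepare some work ...
    sem_post(&g_ready);             // "give" from a thread that never "took"
}

int main()
{
    sem_init(&g_ready, 0, 0);
    std::thread c(consumer), p(producer);
    c.join();
    p.join();
    sem_destroy(&g_ready);
}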

Again, this varies somewhat by platform - especially what they call these things, but this should be representative of the concepts and various mechanisms at play.

Tall Jeff
  • Your explanation about non-recursive mutexes sounded more like a semaphore. A mutex (whether recursive or non-recursive) has a notion of ownership. – Jay D Jan 07 '11 at 22:14
  • @JayD It's very confusing when people argue about things like these.. so who's the entity that defines these things? – Pacerier Dec 08 '11 at 16:09
  • @Pacerier The relevant standard. This answer is e.g. wrong for POSIX (pthreads), where unlocking a normal mutex from a thread other than the one that locked it is undefined behavior, while doing the same with an error-checking or recursive mutex results in a predictable error code. Other systems and standards might behave very differently. – nos Aug 06 '12 at 21:26
  • Perhaps this is naive, but I was under the impression that the central idea of a mutex is that the locking thread unlocks the mutex and then other threads may do the same. From https://computing.llnl.gov/tutorials/pthreads/: "pthread_mutex_unlock() will unlock a mutex if called by the owning thread. Calling this routine is required after a thread has completed its use of protected data if other threads are to acquire the mutex for their work with the protected data. An error will be returned if: If the mutex was already unlocked; If the mutex is owned by another thread." – user657862 Oct 22 '13 at 17:31
  • What is "broadcast release"? – curiousguy Jun 18 '19 at 13:56
  • @curiousguy - a broadcast release frees any and all threads blocked on the semaphore without explicitly giving it (it remains empty), whereas a normal binary give would only release the thread at the head of the waiting queue (assuming there is one blocked). – Tall Jeff Jul 02 '19 at 08:34
135

The answer is not efficiency. Non-reentrant mutexes lead to better code.

Example: A::foo() acquires the lock. It then calls B::bar(). This worked fine when you wrote it. But sometime later someone changes B::bar() to call A::baz(), which also acquires the lock.

Well, if you don't have recursive mutexes, this deadlocks. If you do have them, it runs, but it may break. A::foo() may have left the object in an inconsistent state before calling bar(), on the assumption that baz() couldn't get run because it also acquires the mutex. But it probably shouldn't run! The person who wrote A::foo() assumed that nobody could call A::baz() at the same time - that's the entire reason that both of those methods acquired the lock.
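To make this concrete, here is a small made-up sketch of the same shape of bug (the names are invented, not taken from the example above). With a recursive mutex the nested call silently runs against a half-updated object; swap in a plain std::mutex and it deadlocks right at the point where the design went wrong:

#include <cstdio>
#include <mutex>

class Counter {
    std::recursive_mutex m_;   // with a plain std::mutex, the nested call below would deadlock
    int a_ = 0;
    int b_ = 0;                // invariant protected by m_: a_ == b_
public:
    int difference() {         // plays the role of A::baz(); callers rely on the invariant
        std::lock_guard<std::recursive_mutex> g(m_);
        return a_ - b_;        // "should" always be 0
    }
    void incrementBoth() {     // plays the role of A::foo()
        std::lock_guard<std::recursive_mutex> g(m_);
        a_ += 1;               // invariant temporarily broken here...
        // ...imagine a call out to B::bar() here, which innocently calls back:
        std::printf("difference seen under recursion: %d\n", difference()); // prints 1, not 0
        b_ += 1;               // ...invariant re-established only here
    }
};

int main() {
    Counter c;
    c.incrementBoth();
}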

The right mental model for using mutexes: The mutex protects an invariant. When the mutex is held, the invariant may change, but before releasing the mutex, the invariant is re-established. Reentrant locks are dangerous because the second time you acquire the lock you can't be sure the invariant is true any more.

If you are happy with reentrant locks, it is only because you have not had to debug a problem like this before. Java has non-reentrant locks these days in java.util.concurrent.locks, by the way.

Jonathan
  • It took me a while to get what you were saying about the invariant not being valid when you grab the lock a second time. Good point! What if it were a read-write lock (like Java's ReadWriteLock) and you acquired the read lock and then re-acquired it a second time in the same thread? You wouldn't invalidate an invariant after acquiring a read lock, right? So when you acquire the second read lock, the invariant is still true. – dgrant Jul 09 '11 at 00:29
  • @Jonathan Is it really true that _Java has non-reentrant locks these days in java.util.concurrent.locks_? – user454322 Oct 13 '12 at 19:49
  • +1 I guess that the most common use for a reentrant lock is inside a single class, where some methods can be called from both guarded and non-guarded code pieces. This can actually always be factored out. @user454322 Sure, `Semaphore`. – maaartinus Sep 27 '14 at 20:45
  • Pardon my misunderstanding, but I don't see how this is relevant to mutexes. Suppose there is no multithreading or locking involved; `A::foo()` may still have left the object in an inconsistent state before calling `A::bar()`. What does a mutex, recursive or not, have to do with this case? – Siyuan Ren Apr 05 '15 at 12:12
  • @SiyuanRen: The issue is being able to reason locally about code. People (at least me) are trained to recognize locked regions as invariant-maintaining, that is, at the time you acquire the lock no other thread is modifying the state, so the invariants on the critical region hold. This is not a hard rule, and you can code with the invariants not being held in mind, but that would just make your code harder to reason about and maintain. The same happens in single-threaded mode without mutexes, but there we are not trained to reason locally around the protected region. – David Rodríguez - dribeas May 11 '15 at 10:40
  • This is the best answer, but not universal. We are trained to recognize method content as invariant-maintaining. So when calling `bar` from `foo` one must ensure the invariants are enough for `bar`, but that doesn't mean the larger operation `foo` is performing can be cut at that point to give way to other threads. Recursive locks make sense; otherwise you would have to release the lock before calling `bar`, and most of the time you really cannot, even though the invariants have been taken care of, because in the continuation (the rest of `foo`) you rely on expectations that live from its beginning. – v.oddou Jun 08 '16 at 10:17
  • `foo() { f1(); bar(); f2(); }` f2 is the continuation. This is sequential coupling, you might say, but f2's invariants are greater than bar's invariants; they depend on the work done by f1 and bar together, uncut. foo is an atomic pack, or a hazard. Recursive locks solve this. – v.oddou Jun 08 '16 at 10:19
  • There's an error in this answer: "The person who wrote A::foo() assumed that nobody could call A::baz() at the same time". If it's the same thread, it's not at the same time. Recursive mutexes are still correct in every sense except efficiency. – Adrian May Aug 04 '17 at 13:49
  • The issue pointed out has nothing to do with mutexes and multithreading; it's a common issue. You can happily create this very situation with foo(), bar() and baz() in any single-threaded program. – Megamozg Dec 28 '17 at 12:50
  • I fully agree with "the mutex protects an invariant" (by which I assume you mean the protected data must be in a stable state). However, it's possible to guarantee this in your own code if you just ensure the data is in a stable state *before* calling outside of your own control. – paxdiablo Aug 19 '19 at 06:16
100

As written by Dave Butenhof himself:

"The biggest of all the big problems with recursive mutexes is that they encourage you to completely lose track of your locking scheme and scope. This is deadly. Evil. It's the "thread eater". You hold locks for the absolutely shortest possible time. Period. Always. If you're calling something with a lock held simply because you don't know it's held, or because you don't know whether the callee needs the mutex, then you're holding it too long. You're aiming a shotgun at your application and pulling the trigger. You presumably started using threads to get concurrency; but you've just PREVENTED concurrency."

Chris Cleeland
  • Also note the final part of Butenhof's response: `...you're not DONE until they're [recursive mutexes] all gone.. Or sit back and let someone else do the design.` – user454322 Oct 13 '12 at 19:04
  • He also says that using a single global recursive mutex (his opinion is that you need only one) is okay as a crutch, to consciously postpone the hard work of understanding the invariants of an external library when you start using it in multithreaded code. But you should not use crutches forever; eventually invest the time to understand and fix the concurrency invariants of the code. So we could paraphrase this as: using a recursive mutex is technical debt. – FooF Sep 21 '15 at 05:53
12

"The right mental model for using mutexes: The mutex protects an invariant."

Why are you sure that this is really the right mental model for using mutexes? I think the right model is protecting data, not invariants.

The problem of protecting invariants is present even in single-threaded applications and has nothing in common with multi-threading and mutexes.

Furthermore, if you need to protect invariants, you can still use a binary semaphore, which is never recursive.

  • True. There are better mechanisms to protect an invariant. – ActiveTrayPrntrTagDataStrDrvr Feb 24 '14 at 17:51
  • This should be a comment on the answer that offered that statement. Mutexes not only protect data, they also protect invariants. Try to write some simple container (the simplest being a stack) in terms of atomics (where the data protects itself) instead of mutexes and you will understand the statement. – David Rodríguez - dribeas Feb 25 '15 at 13:17
  • Mutexes don't protect data, they protect an invariant. That invariant can though be used to protect data. – Jon Hanna Jan 16 '18 at 12:31
6

One main reason recursive mutexes are useful is when the same thread ends up taking the same lock multiple times through nested method calls. For example, if a mutex lock is protecting a bank account for a withdrawal, and a fee is also charged as part of that withdrawal, then the same mutex has to be used.
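A rough sketch of that scenario in C++ (class and method names are invented for illustration):

#include <mutex>
#include <stdexcept>

class Account {
    std::recursive_mutex m_;
    double balance_;

    void chargeFee(double fee) {
        std::lock_guard<std::recursive_mutex> g(m_);  // second lock by the same thread: fine
        balance_ -= fee;
    }

public:
    explicit Account(double initial) : balance_(initial) {}

    void withdraw(double amount, double fee) {
        std::lock_guard<std::recursive_mutex> g(m_);  // first lock
        if (balance_ < amount + fee)
            throw std::runtime_error("insufficient funds");
        balance_ -= amount;
        chargeFee(fee);   // re-enters the same mutex; would deadlock with a plain std::mutex
    }

    double balance() {
        std::lock_guard<std::recursive_mutex> g(m_);
        return balance_;
    }
};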

avis
6

The only good use case for a recursive mutex is when an object contains multiple methods, any of which may modify the content of the object and therefore must hold the lock until the state is consistent again.

If the methods use other methods (e.g. addNewArray() calls addNewPoint() and finalizes with recheckBounds()), but any of those functions by itself also needs to lock the mutex, then a recursive mutex is a win-win.

Using it for any other case (to paper over bad coding, or even across different objects) is clearly wrong!

DarkZeros
  • I could not agree more. There are only bad options here: 1. Don't use any locks from within member functions - instead have the calling code lock before it invokes any function ("not my problem" approach). 2. Invent some "same-thread-has-lock-already" program logic for each class which needs to be locked. More code, hard to get right (races), maintainers still have to know how do that right. 3. Design for immutability (your 10000000 element list when modified returns a new list) (cannot use out of the box types for efficiency reasons). 4. Customer hates your constantly deadlocked application. – BitTickler Oct 04 '20 at 19:11
  • Yes, that's why recursive mutexes were invented. – Abhishek Sagar Nov 14 '21 at 09:45
2

IMHO, most arguments against recursive locks (which are what I have used 99.9% of the time over roughly 20 years of concurrent programming) mix the question of whether they are good or bad with other software design issues that are quite unrelated. To name one, the "callback" problem, which is elaborated on exhaustively, and without any multithreading-related point of view, for example in the book Component Software - Beyond Object-Oriented Programming.

As soon as you have some inversion of control (e.g. events fired), you face re-entrance problems. Independent of whether there are mutexes and threading involved or not.

#include <cstddef>
#include <functional>
#include <string>
#include <vector>

class EvilFoo {
  std::vector<std::string> data;
  std::vector<std::function<void(EvilFoo&)> > changedEventHandlers;
public:
  size_t registerChangedHandler( std::function<void(EvilFoo&)> handler) { // ...
  }
  void unregisterChangedHandler(size_t handlerId) { // ...
  }
  void fireChangedEvent() {
    // bad bad, even evil idea!
    for( auto& handler : changedEventHandlers ) {
      handler(*this);
    }
  }
  void AddItem(const std::string& item) {
    data.push_back(item);
    fireChangedEvent();
  }
};

Now, with code like the above you get all the error cases that would usually be named in the context of recursive locks - only without any recursive lock being involved. An event handler can unregister itself once it has been called, which would lead to a bug in a naively written fireChangedEvent(). Or it could call other member functions of EvilFoo, which causes all sorts of problems. The root cause is re-entrance. Worst of all, this might not even be very obvious, as it could happen over a whole chain of events firing events until eventually we are back at our EvilFoo (non-local).

So, re-entrance is the root problem, not the recursive lock. Now, if you feel safer using a non-recursive lock, how will such a bug manifest itself? In a deadlock whenever the unexpected re-entrance occurs. And with a recursive lock? The same way it would manifest itself in code without any locks.

So the evil part of EvilFoo is the events and how they are implemented, not so much a recursive lock. For starters, fireChangedEvent() would need to first create a copy of changedEventHandlers and use that for iteration.
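For example, a re-entrance-tolerant fireChangedEvent() inside EvilFoo could look roughly like this (a sketch assuming unregisterChangedHandler() clears the slot rather than erasing it; one trade-off is that a handler unregistered from inside the loop may still receive one last call):

  void fireChangedEvent() {
    // Iterate over a snapshot, so handlers that register or unregister
    // handlers from inside their callback cannot invalidate the loop.
    auto snapshot = changedEventHandlers;
    for (auto& handler : snapshot) {
      if (handler)               // skip slots that were already unregistered (empty std::function)
        handler(*this);
    }
  }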

Another aspect often coming into the discussion is the definition of what a lock is supposed to do in the first place:

  • Protect a piece of code from re-entrance
  • Protect a resource from being used concurrently (by multiple threads).

The way I do my concurrent programming, I have a mental model of the latter (protect a resource). This is the main reason why I am fine with recursive locks. If some (member) function needs to lock a resource, it locks. If it calls another (member) function while doing what it does, and that function also needs to lock, it locks. And I don't need an "alternate approach", because the reference counting of the recursive lock is quite the same as if each function wrote something like:

void EvilFoo::bar() {
   auto_lock lock(this); // this->lock_holder = this->lock_if_not_already_locked_by_same_thread())
   // do what we gotta do
   
   // ~auto_lock() { if (lock_holder) unlock() }
}

And once events or similar constructs (visitors?!) come into play, I do not expect some non-recursive lock to solve all the ensuing design problems.

BitTickler
1

What are non-recursive mutexes good for?

They are absolutely good when you have to make sure the mutex is unlocked before doing something. This is because pthread_mutex_unlock can guarantee that the mutex is unlocked only if it is non-recursive.

#include <pthread.h>

void bar();                     // some external, possibly unknown, function

pthread_mutex_t g_mutex;        // initialised elsewhere; may or may not be recursive

void foo()
{
    pthread_mutex_lock(&g_mutex);
    // Do something.
    pthread_mutex_unlock(&g_mutex);

    bar();
}

If g_mutex is non-recursive, the code above is guaranteed to call bar() with the mutex unlocked.

This eliminates the possibility of a deadlock in case bar() happens to be an unknown external function that may well do something resulting in another thread trying to acquire the same mutex. Such scenarios are not uncommon in applications built on thread pools, and in distributed applications, where an interprocess call may spawn a new thread without the client programmer even realising it. In all such scenarios it's best to invoke the external functions only after the lock is released.

If g_mutex were recursive, there would simply be no way to make sure it is unlocked before making the call.
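To see why, consider the hypothetical case where foo() itself is called with g_mutex already held by the calling thread - something that is only possible if the mutex is recursive:

void caller()
{
    pthread_mutex_lock(&g_mutex);   // outer lock; the recursion count is now 1
    foo();                          // foo()'s lock/unlock only moves the count 2 -> 1,
                                    // so its call to bar() runs with g_mutex still held
    pthread_mutex_unlock(&g_mutex);
}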

Igor G
  • This is not really a healthy approach. Example: `class foo { ensureContains(item); hasItem(item); addItem(); }` If `ensureContains()` uses `hasItem()` and `addItem()`, unlocking before calling someone else might prevent a self-deadlock, but it also prevents the operation from being correct in the presence of multiple threads. It is kind of as if you did not lock at all. – BitTickler Oct 04 '20 at 19:22
  • @BitTickler, Of course! No doubt, there are scenarios where the mutex must stay locked while calling some other method, and your example is one of them. However if, for whatever reason, the mutex **must** be unlocked before the call, then non-recursive mutexes are the only way to go. Which, in fact, was the main idea of this answer. – Igor G Oct 05 '20 at 09:30