
I'm trying to implement a semaphore with mutexes to learn more about concurrency primitives and patterns, and how to write correct concurrent programs.

Here is the resource I found: http://webhome.csc.uvic.ca/~mcheng/460/notes/gensem.pdf

Solution #1 is labeled "incorrect" on page 9. I implemented the algorithm here. Running this program several times can result in frozen threads. I did some analysis and realized that the following sequence of operations on the d mutex could happen:

            // d was initialized as locked
d.unlock(); // d is unlocked
d.unlock(); // d is still unlocked (unaffected)
d.lock();   // d is locked
d.lock();   // d is still locked (deadlock)

This will cause the last d.lock() to deadlock.

Solution #2 addressed this issue by securing the transition from the signaling thread to the awakened thread by reusing mutex m. I implemented this version here.

In this solution, after d.unlock(), m remains locked, which means subsequent post() operations will be blocked until m is unlocked. Then d.lock() is called before m.unlock(), making sure d is back in the locked state before subsequent post() operations are allowed to run.

Although I understand how this solution addresses the issue, I'm having difficulty arguing its correctness with respect to other potential issues. Are there any general rules or guidelines we can follow to ensure, argue, or even prove the correctness of programs like this?

I want to ask this question because of the next two solutions from the notes. I implemented solution #3 and solution #4, but both end up with frozen threads when tested several times. I'm not sure whether the problem is in my implementation or in the solutions themselves.

I'd appreciate it if you could analyze these solutions' correctness, but even more than that I'd like to learn a way to reason about any algorithm of this kind and verify its correctness.

Rix
  • Beyond the scope of your questions about getting these exercises right, the challenges involved here can be a good motivation in a real codebase to start with coarse-grained locking and refine it later once you find that it's a bottleneck. Combining that with strategies like monitors (https://stackoverflow.com/questions/12647217/making-a-c-class-a-monitor-in-the-concurrent-sense) that mate your concurrency control with your domain decomposition can make getting this type of code right much, much easier. – Aaron Altman Jun 15 '19 at 14:51

2 Answers


Running your code a bunch of times and hoping for some arbitrary interleaving to uncover a concurrency-related bug is not a great way to do things, though it's useful for some initial testing. Beyond that, there are better approaches.

If you're working with C or C++, check out ThreadSanitizer, which is available in GCC and Clang and will help you detect deadlocks and other concurrency-related bugs at runtime.

The more exhaustive approach is systematic concurrency testing, which enumerates the possible interleavings and executes them. There are different techniques worth looking into, such as stateless and stateful model checking.

Spin is a model-checking tool that can be used for formal verification: you write your models in Promela and state your assertions in LTL (linear temporal logic).

Finally, here's Wikipedia's list of model checkers for various languages.

Eric

Documentation on lockdep (example: https://lwn.net/Articles/705379/) is one of many sources that describe the general criterion you're looking for: you want an implementation with no cycles in the dependency graph.

In your example where d locks twice, you have a dependency graph with a cycle consisting of just one node, d, with an edge going back to itself.

One style of multithreaded coding that might help in your solutions 3 and 4 is to always release locks in the opposite order from the one in which you acquired them. It might help your overall understanding of the problem space to work out on paper why imposing this requirement on your code is at least as strict as requiring that your implementation have no cycles in its dependency graph.

And for a bonus question: are there implementations where the order of locks and releases are not a mirror image but where there are still no cycles?

Aaron Altman
  • I'm trying to understand some concepts from the article. In my understanding, a "lock class" refers to a lock variable, which can be locked and unlocked multiple times in different contexts, while an "instance of a lock" refers to a single call site that acquires a lock variable, which can be executed multiple times, even simultaneously in different contexts. Am I getting this right? – Rix Jun 16 '19 at 05:32
  • "Does a dependency 'AX -> B' exist? Nope." <- I'm confused by this example and this statement. There could be a case where, when acquiring B, there exist waiters for both AX and B, and the only way to wake each of them is to release what it waits for. Whether the waiter for AX can be woken up then depends on whether the waiter for B can. Doesn't that satisfy the definition of a dependency? – Rix Jun 16 '19 at 05:35
  • "In your example where d locks twice, you have a dependency graph with a cycle consisting of just one node, d, with an edge going back to itself." <- I don't see why locking d twice creates a dependency according to the article's definition of a dependency. In general I'm still confused about how to derive a dependency graph from a given program. Any illustration to clarify this would be appreciated. – Rix Jun 16 '19 at 05:38
  • I can try to respond in more detail, but it looks like the issue at the top of the mental stack here is that you're probably conflating non-recursive mutexes (which can't be locked multiple times by the same thread without, by definition, deadlocking) with recursive ones (which can be locked multiple times within the same thread). This post: https://stackoverflow.com/questions/187761/recursive-lock-mutex-vs-non-recursive-lock-mutex provides some rationale for the different uses. Check it out and let me know whether it clarifies anything. – Aaron Altman Jun 16 '19 at 18:22