
This is a link to an MCVE that demonstrates the deadlock.

It contains five parts:

  1. SharedEvent is an implementation of an AutoResetEvent that is stored in shared memory (a minimal sketch is shown after this list).

  2. CreatedSharedEvent creates a named shared memory object in which a SharedEvent is allocated. It provides an accessor method that returns a reference to the SharedEvent.

  3. OpenedSharedEvent opens a named shared memory object in which a SharedEvent has already been allocated. It also provides an accessor method that returns a reference to the SharedEvent.

  4. A server console application that creates a SharedEvent using a CreatedSharedEvent and sets the event every 2 seconds. It prints a message every time the event is set.

  5. A client console application that opens the shared event using an OpenedSharedEvent and waits on the event in a loop. It prints a message every time the wait call returns.
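
The MCVE itself is only linked above, so for readers who cannot follow the link, here is a minimal sketch of how such a SharedEvent and its wrapper classes might be put together with Boost.Interprocess. The class names, segment name and size here are assumptions for illustration, not the actual code from the link:

```cpp
// Minimal sketch of an auto-reset event living in shared memory, assuming a
// layout of interprocess_mutex + interprocess_condition + flag. Names are
// illustrative, not the code from the linked MCVE.
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/sync/interprocess_mutex.hpp>
#include <boost/interprocess/sync/interprocess_condition.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>

namespace bip = boost::interprocess;

struct SharedEvent {
    void set() {
        bip::scoped_lock<bip::interprocess_mutex> lock(mutex_);
        signaled_ = true;
        cond_.notify_one();   // the question reports the server hangs in here
    }
    void wait() {
        bip::scoped_lock<bip::interprocess_mutex> lock(mutex_);
        while (!signaled_)
            cond_.wait(lock);
        signaled_ = false;    // auto-reset: only one waiter is released per set()
    }
private:
    bip::interprocess_mutex     mutex_;
    bip::interprocess_condition cond_;
    bool                        signaled_ = false;
};

// Roughly what CreatedSharedEvent / OpenedSharedEvent do: own a named segment
// and hand out a reference to the SharedEvent constructed inside it.
struct CreatedSharedEvent {
    CreatedSharedEvent()
        : segment_(bip::create_only, "shared_event_demo", 65536)
        , event_(segment_.construct<SharedEvent>("event")()) {}
    SharedEvent& event() { return *event_; }
private:
    bip::managed_shared_memory segment_;
    SharedEvent*               event_;
};

struct OpenedSharedEvent {
    OpenedSharedEvent()
        : segment_(bip::open_only, "shared_event_demo")
        , event_(segment_.find<SharedEvent>("event").first) {}
    SharedEvent& event() { return *event_; }
private:
    bip::managed_shared_memory segment_;
    SharedEvent*               event_;
};
```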

To reproduce the problem:

  1. Run the server. Observe the messages printed every 2 seconds.

  2. Run the client. Observe the messages printed every 2 seconds.

  3. Close the client. Observe that the server ceases to print messages; it is blocked in interprocess_condition::notify_one() (see the sketch after these steps).
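
For orientation, the two console applications would look roughly like the following. This is an assumed sketch built on the SharedEvent classes above, not the linked code; it only shows where the hang reported in step 3 occurs:

```cpp
// Sketch of the two console applications (assumed, building on the
// SharedEvent / CreatedSharedEvent / OpenedSharedEvent sketch above).
#include <chrono>
#include <iostream>
#include <thread>

// Server: sets the event every 2 seconds and prints a message.
int server_main() {
    CreatedSharedEvent cse;
    for (;;) {
        std::this_thread::sleep_for(std::chrono::seconds(2));
        cse.event().set();      // step 3: hangs here after the client is killed
        std::cout << "event set\n";
    }
}

// Client: waits on the event in a loop and prints a message.
int client_main() {
    OpenedSharedEvent ose;
    for (;;) {
        ose.event().wait();     // killing the client here can leave the shared mutex locked
        std::cout << "event received\n";
    }
}
```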

David Sackstein
  • Again [you come to us with a lot of prose, and no code](https://stackoverflow.com/questions/44982935/can-boosts-interprocess-segment-manager-allocators-be-themselves-shared-with-ot/44987240#comment76944259_44982935). "What could be causing the deadlock" - a number of things. We're more likely to see which if we can [_see your code_](https://stackoverflow.com/questions/44982935/can-boosts-interprocess-segment-manager-allocators-be-themselves-shared-with-ot/44987240#comment76954354_44987240). – sehe Jul 09 '17 at 17:18
  • This time I'll wait for you to make an [SSCCE](http://sscce.org/) or [MCVE](https://stackoverflow.com/help/mcve). The links tell you how. My previous answer _showed you_ how. – sehe Jul 09 '17 at 17:21
  • I provided a link to the MCVE in my edited question – David Sackstein Jul 10 '17 at 20:57
  • From my investigation so far, the culprit may not actually be the call to notify_one. It is true that notify_one tries to take the lock, but the deadlock seems to be caused by the fact that the process that held the lock was closed and the lock was never released. If I am right, then this problem will occur whenever code in the remaining process tries to acquire the lock. So I am considering rephrasing the question: is it possible to make release of boost::interprocess::interprocess_mutex automatic when a process exits? If not, does that mean I must use another mutex class for synchronization? – David Sackstein Jul 11 '17 at 21:25

1 Answer


The cause of the problem is the same as described here:

Boost.Interprocess primitives used this way cannot be relied on in a situation where a process might crash (or be killed) while still holding the lock.

I will post a different question to see if anyone has discovered a good replacement for Boost's condition_variable and interprocess_mutex.
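
One partial mitigation, shown below as an assumed sketch rather than anything from the linked answer, is to switch to the timed lock/wait variants so a surviving process can detect an apparently abandoned lock instead of blocking forever. It does not recover the mutex or the shared state; it only turns a permanent hang into a reportable error:

```cpp
// Sketch of a partial mitigation (an assumption, not a fix): timed lock/wait
// variants let the surviving process give up after a deadline instead of
// blocking forever on a lock abandoned by a killed peer.
#include <boost/interprocess/sync/interprocess_mutex.hpp>
#include <boost/interprocess/sync/interprocess_condition.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>
#include <stdexcept>

namespace bip = boost::interprocess;

void set_with_timeout(bip::interprocess_mutex& mtx,
                      bip::interprocess_condition& cond,
                      bool& signaled)
{
    const boost::posix_time::ptime deadline =
        boost::posix_time::microsec_clock::universal_time() +
        boost::posix_time::seconds(5);

    // Timed acquisition: gives up after the deadline instead of waiting forever.
    bip::scoped_lock<bip::interprocess_mutex> lock(mtx, deadline);
    if (!lock.owns())
        throw std::runtime_error("shared mutex appears abandoned (owner died?)");

    signaled = true;
    cond.notify_one();
    // The waiting side can use cond.timed_wait(lock, deadline) symmetrically.
}
```

On POSIX, the usual suggestion for surviving a crashed lock owner is a process-shared pthread mutex with the PTHREAD_MUTEX_ROBUST attribute, where the next locker receives EOWNERDEAD and can repair the state; as far as I know, Boost.Interprocess's portable interprocess_mutex does not provide that robustness.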

David Sackstein
  • I've reached the same conclusion. I've been stress testing this thing ever since you posted the SSCCE. I can't make it fail unless I forcefully terminate clients. I have run the server for hours with a message every 1 ms and 52 simultaneous clients that voluntarily shut down after 50..250 received messages. Only when I start occasionally killing clients mid-execution (they respawn) do I see the server lock up. Otherwise, the server keeps running for hours and millions of messages sent (e.g. 446 minutes and 25,608,540 messages sent). – sehe Jul 11 '17 at 23:34
  • I'm currently running the same stress test using alternative sync primitives. [That's on linux for now.] – sehe Jul 11 '17 at 23:35
  • As probably expected, using `named_mutex` and `named_condition` does not alleviate the problem. I made it lock up in 12 minutes, at which point indeed the server was hung in `notify_one`. The code in case you're interested: https://gist.github.com/sehe/812b11ab5b9bc64804296d29f8d6e20a – sehe Jul 12 '17 at 00:35