
I have code that locks every function (of a certain library), and which I would like to optimize. Given functions A and B, I don't mind any A running concurrently with any other A, or any B running concurrently with any other B, but no A can run while any B is running, and vice versa. The thread count is dynamic, and for reasons out of my control I am forced to use static allocation for mutexes and condition variables (i.e. PTHREAD_MUTEX_INITIALIZER).

I have a hunch that the most efficient approach is two condition variables. Using pthread_mutex_trylock allows one function (i.e. A) to run in parallel while the other must be serialized. Also, *trylock with static initialization is actually undefined behavior...

Edit:

Perhaps something like this? I'm not sure if this:

  • Can be simpler. After all, mutexes are implemented using semaphores, but it takes four mutexes and two condition variables to implement what is basically an inverse semaphore.
  • Covers all race conditions.
  • Is "fair" (beyond default priority and scheduling)?

static int countA = 0;
static int countB = 0;
static pthread_mutex_t lockCountA = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lockCountB = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lockA = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lockB = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t condA = PTHREAD_COND_INITIALIZER;
static pthread_cond_t condB = PTHREAD_COND_INITIALIZER;

// for B(), just s/B/A/g
static void A(void) {
    pthread_mutex_lock(&lockB);
    while(countB)
        pthread_cond_wait(&condB, &lockB);
    pthread_mutex_lock(&lockCountA);
    countA += 1;
    pthread_mutex_unlock(&lockCountA);
    doA();
    pthread_mutex_lock(&lockCountA);
    countA -= 1;
    if (countA == 0)
        pthread_cond_signal(&condA);
    pthread_mutex_unlock(&lockCountA);
    pthread_mutex_unlock(&lockB);
}
user19087
  • Do you require fairness? – EOF Jun 21 '18 at 19:30
  • Given a waiting `A` and many `B`, I would like further `B` to block, if that's what you mean. *edit*: well actually the serialized version (single mutex) works just fine, so probably not...? – user19087 Jun 21 '18 at 19:36

1 Answer


Your solution has a race condition. Consider the case when both countA and countB are zero, and two threads simultaneously call A() and B(). The first thread locks lockB, the second thread locks lockA, both see the count they examine as zero, increment their respective counts, and proceed in error.

Another problem in your solution is that it uses pthread_cond_signal() which does not necessarily wake any more than one waiting thread, so if you have 10 threads waiting to enter B() but only one thread running A(), when the latter thread finishes only one B() thread is guaranteed to continue and the other 9 may wait indefinitely.

It also doesn't allow more than one thread to run doA() concurrently, since lockB is held over that call.

To fix the first problem, you can use one mutex that protects both countA and countB (because the shared state we must examine involves the combination of both these variables). At the same time, you might as well just use one condition variable too: the threads waiting on the condition variable must either be all waiting to enter A(), or all waiting to enter B(), but a mix of the two is impossible. Fixing the second problem just requires pthread_cond_broadcast(). That leads to the much simpler:

static int countA = 0;
static int countB = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

void A(void)
{
    pthread_mutex_lock(&lock);
    while (countB)
        pthread_cond_wait(&cond, &lock);
    countA++;
    pthread_mutex_unlock(&lock);

    do_A();

    pthread_mutex_lock(&lock);
    countA--;
    if (!countA)
        pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
}
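
For reference, B() under this scheme is just the mirror image (the s/A/B swap mentioned in the question's code comment); a sketch, assuming a do_B() counterpart to do_A():

void B(void)
{
    pthread_mutex_lock(&lock);
    while (countA)
        pthread_cond_wait(&cond, &lock);
    countB++;
    pthread_mutex_unlock(&lock);

    do_B();

    pthread_mutex_lock(&lock);
    countB--;
    if (!countB)
        pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
}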

This solution is correct, but is not "fair" - if there is a continuous stream of threads executing A() (such that a new one enters and increments countA before the previous thread has finished and decremented it) then the threads waiting to execute B() will be kept waiting forever. This may not be an issue in your particular use - for example, if you know that any thread that executes A() will eventually execute B(), then the starvation situation must eventually resolve.

Improving the system to prevent this starvation can be done by preventing new entries into A() while there are threads queued to enter B():

static int countA = 0;
static int countB = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int queuedA = 0;
static int queuedB = 0;
static pthread_cond_t queue_cond = PTHREAD_COND_INITIALIZER;

void A(void)
{
    pthread_mutex_lock(&lock);
    while (queuedB)
        pthread_cond_wait(&queue_cond, &lock);
    while (countB)
    {
        queuedA++;
        pthread_cond_wait(&cond, &lock);
        queuedA--;
    }
    if (!queuedA)
        pthread_cond_broadcast(&queue_cond);
    countA++;
    pthread_mutex_unlock(&lock);

    do_A();

    pthread_mutex_lock(&lock);
    countA--;
    if (!countA)
        pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
}

This prevents starvation because:

  1. queuedB is always non-zero while there are any threads waiting on cond in B();
  2. While queuedB is non-zero, no threads may increment countA, and therefore countA must eventually reach zero and allow the threads waiting on cond to proceed.
  3. While countA is zero, no thread may increment queuedB and thus queuedB must eventually reach zero and allow the threads waiting on queue_cond to proceed.
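
As a quick sanity check, a harness along these lines could exercise the scheme; a sketch only, assuming B() is the s/A/B mirror of the A() above and using hypothetical do_A()/do_B() stubs that assert the exclusion invariant:

#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the wrapped library calls; assumed to appear
   before A() and B() in the same file. Each records that it is running and
   asserts that no call from the other group runs at the same time. */
static atomic_int inA, inB;

static void do_A(void)
{
    atomic_fetch_add(&inA, 1);
    assert(atomic_load(&inB) == 0);   /* no B may run while any A runs */
    atomic_fetch_sub(&inA, 1);
}

static void do_B(void)
{
    atomic_fetch_add(&inB, 1);
    assert(atomic_load(&inA) == 0);   /* no A may run while any B runs */
    atomic_fetch_sub(&inB, 1);
}

/* A() as above; B() is its mirror image. */

static void *worker(void *arg)
{
    int call_B = (int)(intptr_t)arg;
    for (int i = 0; i < 100000; i++)
        call_B ? B() : A();
    return NULL;
}

int main(void)
{
    pthread_t t[8];
    for (int i = 0; i < 8; i++)
        pthread_create(&t[i], NULL, worker, (void *)(intptr_t)(i % 2));
    for (int i = 0; i < 8; i++)
        pthread_join(t[i], NULL);
    puts("no exclusion violations detected");
    return 0;
}
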
caf
  • That makes a lot of sense. In hindsight I was trying to use a mutex to lock out only a subset of the users of that mutex (i.e. holding `lockB` over `doA`); a clear impossibility, rather than depending on the condition variable. I also failed to use the mutex to protect the conditional state. A really practical example of [Why do pthreads’ condition variable functions require a mutex?](https://stackoverflow.com/questions/2763714/why-do-pthreads-condition-variable-functions-require-a-mutex) (which I had just happened to be reading). – user19087 Jun 22 '18 at 15:58
  • So two quick follow-on questions. Is it fair to say that any implementation of a "reverse semaphore" would never require more than a mutex and condition variable, as a regular semaphore just provides a mutex (blocking) and a condition variable (signaling state/resource-availability)? For example, to add a third function `C` (i.e. an extra resource) that excludes `A` and `B`, I just have to modify the `while (countB)` to `while (countB || countC)`? – user19087 Jun 22 '18 at 15:58
  • @user19087: Sure. Note that it's *always* possible to convert an algorithm that uses two mutexes into one that uses just one, with only a possible performance impact, and the same is true of condition variables. – caf Jun 24 '18 at 11:39
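
For the three-function extension described in the last two comments, the change really is just the wait condition; a sketch, assuming a do_C() analogous to do_A() and do_B():

static int countA = 0;
static int countB = 0;
static int countC = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

void A(void)
{
    pthread_mutex_lock(&lock);
    while (countB || countC)          /* exclude both other groups */
        pthread_cond_wait(&cond, &lock);
    countA++;
    pthread_mutex_unlock(&lock);

    do_A();

    pthread_mutex_lock(&lock);
    countA--;
    if (!countA)
        pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
}

/* B() and C() are identical with the roles rotated:
   B() waits while (countA || countC), C() waits while (countA || countB). */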