
There is a mess of postings and answers on this topic, but none seems to quite model my problem. After poking around on Google and searching Stack Overflow, I still don't see an answer to my question.

I have two threads, a master and slave. The slave needs to wait for the master before proceeding past a certain point, so I've created a mutex as a global variable:

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

And then in the initialization for the master thread, long before the slave thread can have a chance to access it, I lock it:

initialize in master thread:

pthread_mutex_lock(&lock);

Then in the slave, when it's time to wait for the master, I do this:

slave must wait here:

pthread_mutex_lock(&lock);
pthread_mutex_unlock(&lock);

Meanwhile, back in the master, I have this when it's time to "release" the slave who is blocked waiting:

pthread_mutex_unlock(&lock);
pthread_mutex_lock(&lock);

(Note: the order of lock/unlock is opposite for the master.)

Based on my understanding of mutexes, I figured the slave would hit the lock, and get stuck there waiting for the master who would unlock it, then immediately lock it again. In terms of timing, the slave will take a LONG time (guaranteed) to get back here again, so the master will have lots of time to relock it. Similarly, the master won't get back here again for a while and we don't need to worry about either the master or the slave beating the other back to these checkpoints.

When it did not work as expected, I tossed in some printf's to confirm that the master unlocks, then re-locks the mutex before the slave can lock it. My understanding is that the slave has requested the lock long before the master got there to do the unlock and (re)lock, and no matter how small the time is between the master's unlock and (re)lock, the slave should still be able to lock the mutex because he's already been "in line" waiting.

However, what I see happening is that the master unlocks the mutex, and then immediately re-locks it even though the slave has been patiently blocked waiting for his chance to lock it.

Here is the code with the printf's in it and the output generated:

slave:

printf("slave thread starting lock sequence\n");fflush(stdout);
pthread_mutex_lock(&lock);
printf("slave thread intra lock sequence\n");fflush(stdout);
pthread_mutex_unlock(&lock);
printf("slave thread completed lock sequence\n");fflush(stdout);

master:

printf("master thread starting lock sequence\n");fflush(stdout);
pthread_mutex_unlock(&lock);
printf("master thread intra lock sequence\n");fflush(stdout);
pthread_mutex_lock(&lock);
printf("master thread completed lock sequence\n");fflush(stdout);

Now here's what I see as output:

slave thread starting lock sequence

... then some time goes by (several seconds) while the slave is blocked, and finally this appears:

master thread starting lock sequence

master thread intra lock sequence

master thread completed lock sequence

In the meantime, there is no further progress from the slave, who remains blocked forever. I would have expected him to prevent the master from re-locking, and to have spit out his printf indicating that he'd moved forward. This output clearly indicates that the blocked slave didn't get his chance to lock the mutex even though he was patiently waiting in line for his turn.

So what is it that I am missing about mutexes and lock/unlock?

-gt-

  • Duplicate of [Issue with Lock Ordering or Scheduling](https://stackoverflow.com/questions/14736751/issue-with-lock-ordering-or-scheduling). Pthreads do not guarantee fairness. – Raymond Chen Mar 05 '19 at 02:15
  • "So what is it that I am missing about mutexes and lock/unlock?" From a quick skim it sounds like you are trying to predict behavior but maybe making assumptions about how the scheduler works. What OS, which scheduler? – Brian Cain Mar 05 '19 at 02:15
  • I think what I'm missing is the "lack of guarantee of fairness" as Raymond pointed out. So it's not "first come, first served" as I assumed. In other words, just because the slave has already been waiting in line to lock the mutex, it doesn't prevent the master from locking it again before the slave has his chance. I believe THAT's what I was missing. Raymond, does that sound right? – George Townsend Mar 05 '19 at 02:25
  • For the kind of thing you want to achieve, signalling via a condition variable is the classical pthreads approach. Just read the manpage for pthread_cond_signal(). On my system, a Debian derivative, the pthreads manpages are available by doing 'sudo apt install glibc-doc'. – Erik Alapää Mar 05 '19 at 08:22

3 Answers


As noted in the comments to your question, pthreads mutexes do not guarantee fairness.

The correct tool for this job is a shared flag variable, protected by a mutex and waited on using a condition variable:

pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
int flag = 0;

The slave waits for the flag:

pthread_mutex_lock(&lock);
while (!flag)
    pthread_cond_wait(&cond, &lock);
pthread_mutex_unlock(&lock);

When the master wants to release the slave, it sets the flag and signals the condition variable:

pthread_mutex_lock(&lock);
flag = 1;
pthread_cond_signal(&cond);
pthread_mutex_unlock(&lock);

(Note that the mutex is not held while blocked inside pthread_cond_wait()).
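
For reference, here is one way the two fragments above might fit together in a complete program. This is only a minimal sketch; the thread setup, function names, and printf are illustrative additions, not part of the answer itself:

#include <pthread.h>
#include <stdio.h>

pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
int flag = 0;

void *slave(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!flag)                        /* loop guards against spurious wakeups */
        pthread_cond_wait(&cond, &lock); /* mutex is released while waiting */
    pthread_mutex_unlock(&lock);
    printf("slave: released by master\n");
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, slave, NULL);

    /* ... master does whatever work the slave must wait for ... */

    pthread_mutex_lock(&lock);
    flag = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);

    pthread_join(t, NULL);
    return 0;
}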

– caf

> When it did not work as expected, I tossed in some printf's to confirm that the master unlocks, then re-locks the mutex before the slave can lock it. My understanding is that the slave has requested the lock long before the master got there to do the unlock and (re)lock, and no matter how small the time is between the master's unlock and (re)lock, the slave should still be able to lock the mutex because he's already been "in line" waiting.

There's no reason to be fair to threads; they don't file union grievances if you mistreat them. But there is a reason to make the system as a whole do as much work as it can, as quickly as possible. Stopping one thread to start another has a significant negative impact on performance, since the new thread starts with all the caches cold, and the same thing happens when you switch back.

It's your job to ensure that your code, when it executes, does the work you want it to do. The scheduler will try to get work done as quickly as possible with only the occasional nod to fairness.
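
To make the lack of fairness concrete, here is a small self-contained sketch of the scenario from the question (my own illustration, assuming a typical Linux/glibc pthreads environment; the sleeps and names exist only to make the race easy to observe):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *slave(void *arg)
{
    (void)arg;
    printf("slave: waiting for the mutex\n");
    pthread_mutex_lock(&lock);      /* blocks while the master holds the mutex */
    printf("slave: finally got the mutex\n");
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t;

    pthread_mutex_lock(&lock);      /* master takes the mutex first */
    pthread_create(&t, NULL, slave, NULL);
    sleep(1);                       /* give the slave time to block on the lock */

    pthread_mutex_unlock(&lock);    /* "release" the slave ...                        */
    pthread_mutex_lock(&lock);      /* ... but the running master usually wins this race */
    printf("master: relocked the mutex\n");

    sleep(1);                       /* the slave is typically still blocked here */
    pthread_mutex_unlock(&lock);    /* only now does the slave get its turn */
    pthread_join(t, NULL);
    return 0;
}

On most systems the master's "relocked" line prints first, but nothing in POSIX forbids the opposite ordering, which is exactly the point.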

– David Schwartz

Maybe you can model the problem with semaphores; I find this is usually easier to understand and implement. Something like the following pseudo-code example:

//before creating the slave thread, initialize the semaphores first
sem_t data_ready, slave_done;
sem_init(&data_ready, 0, 0);  //pshared = 0, initial value = 0
sem_init(&slave_done, 0, 0);

In master thread:

//master does something AAA
sem_post(&data_ready);
//master does something BBB
sem_wait(&slave_done);  //master may sleep here and may be interrupted by a signal; handle that
//master does something CCC

In slave thread:

//slave does something DDD
sem_wait(&data_ready); //slave may sleep here and may be interrupted by a signal; handle that
//slave does something EEE
sem_post(&slave_done);
//slave does something FFF

You should find the execution order is AAA [BBB/DDD] EEE [FFF/CCC]; this ensures the master's AAA happens before the slave's EEE, which in turn happens before the master's CCC.
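
For anyone who wants to try it, a minimal compilable version of that sketch might look like this (assuming unnamed POSIX semaphores are available, e.g. on Linux; the AAA/BBB/... steps are just printf stand-ins, and the error/EINTR handling mentioned above is omitted for brevity):

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

sem_t data_ready, slave_done;

void *slave(void *arg)
{
    (void)arg;
    printf("slave: DDD\n");
    sem_wait(&data_ready);      /* wait until the master posts data_ready */
    printf("slave: EEE\n");
    sem_post(&slave_done);      /* tell the master that EEE is finished */
    printf("slave: FFF\n");
    return NULL;
}

int main(void)
{
    pthread_t t;
    sem_init(&data_ready, 0, 0);    /* pshared = 0, initial value = 0 */
    sem_init(&slave_done, 0, 0);
    pthread_create(&t, NULL, slave, NULL);

    printf("master: AAA\n");
    sem_post(&data_ready);          /* release the slave past its sem_wait() */
    printf("master: BBB\n");
    sem_wait(&slave_done);          /* wait for the slave to finish EEE */
    printf("master: CCC\n");

    pthread_join(t, NULL);
    sem_destroy(&data_ready);
    sem_destroy(&slave_done);
    return 0;
}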

– KL-Yang