The following code runs in N threads that share the variable count, which is initially 0. Every variable is initialised before the threads start. I want at most MAX threads to execute the critical section at any one time.

void *tmain(){
    while(1){
        pthread_mutex_lock(&count_mutex);
        count++;
        if(count>MAX){
            pthread_cond_wait(&count_threshold_cv, &count_mutex);
        }
        pthread_mutex_unlock(&count_mutex);
        /*
         some code not associated with count_mutex or count_threshold_cv
        */
        pthread_mutex_lock(&count_mutex);
        count--;
        pthread_cond_signal(&count_threshold_cv);
        pthread_mutex_unlock(&count_mutex);
    }
}

After running for some time, however, the threads block at pthread_cond_signal(). I am unable to understand why this is occurring. Any help is appreciated.

alk
arpp
  • Calls to `pthread_cond_signal` cannot deadlock in a healthy program. You should verify that the condition variable you are operating on is still alive and did not get corrupted. Also, use a debugger to verify where exactly the different threads are blocked at the time of the deadlock and what locks they are holding onto at that point. – ComicSansMS Apr 11 '14 at 11:09
  • The code shown looks ok. If it doesn't work the issue lies elsewhere. Btw: If `tmain()` is passed to `pthread_create()` as thread-function it shall be declared: `void * tmain(void *)`. – alk Apr 11 '14 at 12:11
  • I added the [C] tag. If you are doing [C++] please correct this. – alk Apr 11 '14 at 12:14

2 Answers

This code has one weak point that may lead to blocking: it is not protected against so-called spurious wakeups, meaning that pthread_cond_wait() may return even though no thread called pthread_cond_signal() or pthread_cond_broadcast().
Therefore, the following lines do not guarantee that count is less than or equal to MAX when the thread wakes up:

if(count>MAX){
    pthread_cond_wait(&count_threshold_cv, &count_mutex);
}

Consider what happens when a thread wakes up while count is still greater than MAX: it unlocks the mutex immediately afterwards. Another thread can then enter the critical section and increment count beyond the expected limit:

pthread_mutex_lock(&count_mutex);
count++;

How do you protect the code against spurious wakeups? A return from pthread_cond_wait() is only a hint to re-check the predicate (count>MAX). If the predicate still holds, the thread must go back to waiting on the condition variable. Fix the code by changing the if statement to a while statement (and, as remarked by @alk, change the tmain() signature):

while(count>MAX)
{
    pthread_cond_wait(&count_threshold_cv, &count_mutex);
}

Now, if a spurious wakeup occurs while count is still greater than MAX, the thread waits on the condition variable again. It leaves the waiting loop only when a wakeup coincides with a change in the predicate.

MichaelGoren

Your code blocks because you increment count before the wait:

count++;
if(count>MAX){
        pthread_cond_wait(&count_threshold_cv, &count_mutex);
}

Instead you should write

while (count >= MAX) {
        pthread_cond_wait(&count_threshold_cv, &count_mutex);
}
count++;

The reason is that count should be the number of working threads. A thread must only increment count when it is done waiting.

Your count variable, on the other hand, counts the number of working threads plus the number of waiting threads. This count is too large, so the condition count > MAX can be true even when fewer than MAX threads are working, and the threads block.

You should also replace "if" with "while" as MichaelGoren writes. Using "if" instead of "while" does not lead to blocking, but rather to too many threads running simultaneously; the woken thread starts working even if count > MAX.

You need "while" because pthread_cond_signal only unblocks one of the waiting threads. The unblocked thread still has to reacquire the mutex, and it is not necessarily scheduled to run right away. Only when the awoken thread finally acquires the mutex does its call to pthread_cond_wait return. In the meantime, between pthread_cond_signal and the return of pthread_cond_wait, other threads may have owned the mutex and changed count. So you must check the condition again, which is what "while" does.

Also, because count++ now comes after the wait, the condition becomes count >= MAX instead of count > MAX: a thread must wait even when the number of workers is exactly MAX.

Alternatively, you could have used semaphores for this problem.

Morten Krogh