pthread_cond_timedwait() usage for cancelling lengthy task

Question

I have a situation where I would like to cancel a thread if it takes too long to complete. For this, I am using a second thread that waits for the first thread to finish, but for no more than a given number of seconds. The pthread_cond_timedwait() function seems to fit my use case perfectly, but it doesn't behave as I expected. More specifically, although pthread_cond_timedwait() does return ETIMEDOUT, it does so only after the thread that it was supposed to cancel has finished, which defeats the whole purpose.

This is my test code:

    #include <pthread.h>
    #include <time.h>
    #include <unistd.h>
    #include <stdlib.h>
    #include <errno.h>
    #include <iostream>
    #include <cstring>

    #define WAIT_INTERVAL 5
    #define THREAD_SLEEP 10

    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t condition = PTHREAD_COND_INITIALIZER;

    pthread_t t1;
    pthread_t t2;

    void* f1(void*);
    void* f2(void*);

    int main()
    {
        pthread_create(&t1, NULL, &f1, NULL);
        pthread_create(&t2, NULL, &f2, NULL);

        pthread_join(t1, NULL);
        pthread_join(t2, NULL);

        std::cout << "Thread(s) successfully finished" << std::endl << std::flush;

        exit(EXIT_SUCCESS);
    }

    void* f1(void*)
    {
        pthread_mutex_lock(&mutex);
        timespec ts = {0};
        clock_gettime(CLOCK_REALTIME, &ts);
        ts.tv_sec += WAIT_INTERVAL;
        std::cout << __FUNCTION__ << ": Waiting for at most " << WAIT_INTERVAL << " seconds starting now" << std::endl << std::flush;
        int waitResult = pthread_cond_timedwait(&condition, &mutex, &ts);
        if (waitResult == ETIMEDOUT)
        {
            std::cout << __FUNCTION__ << ": Timed out" << std::endl << std::flush;
            int cancelResult = pthread_cancel(t2);
            if (cancelResult)
            {
                std::cout << __FUNCTION__ << ": Could not cancel T2 : " << strerror(cancelResult) << std::endl << std::flush;
            }
            else
            {
                std::cout << __FUNCTION__ << ": Cancelled T2" << std::endl << std::flush;
            }
        }
        std::cout << __FUNCTION__ << ": Finished waiting with code " << waitResult << std::endl << std::flush;
        pthread_mutex_unlock(&mutex);
        return NULL;
    }

    void* f2(void*)
    {
        pthread_mutex_lock(&mutex);
        std::cout << __FUNCTION__ << ": Started simulating lengthy operation for " << THREAD_SLEEP << " seconds" << std::endl << std::flush;
        sleep(THREAD_SLEEP);
        std::cout << __FUNCTION__ << ": Finished simulation, signaling the condition variable" << std::endl << std::flush;
        pthread_cond_signal(&condition);
        pthread_mutex_unlock(&mutex);
        return NULL;
    }

The output I get from the above code is:

    f1: Waiting for at most 5 seconds starting now
    f2: Started simulating lengthy operation for 10 seconds
    f2: Finished simulation, signaling the condition variable
    f1: Timed out
    f1: Could not cancel T2 : No such process
    f1: Finished waiting with code 110
    Thread(s) successfully finished

Given that this is my first time with POSIX threads, I think I'm missing something which may be pretty obvious.

I have read numerous tutorials, articles and answers about this, but none covers my use case and none offered any hint.

Please note that, for brevity, I have removed some of the code that handled the predicate mentioned in the pthread_cond_timedwait manual, as that doesn't change anything in the behaviour.

I am using POSIX threads on a CentOS 6.5 machine. My development and test environment: kernel 2.6.32-431.5.1.el6.centos.plus.x86_64 #1 SMP x86_64 x86_64 x86_64 GNU/Linux; compiler g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4)

Compilation command: g++ -o executable_binary -pthread -lrt source_code.cpp

Try locking the mutex in the second thread just before signalling the condition. The first thread cannot wake up while the mutex is locked. — n. m. could be an AI, Mar 26 '14 at 16:32

stefaanv · Accepted Answer · 2014-03-26T16:40:54.350

4

Edit: I first advised against using pthread_cond_timedwait(), but I think it is fine in this situation, since it means the first thread doesn't wait longer than needed. However, instead of checking only the return value, I would check a 'finished' flag that is set by the second thread and protected by the mutex.

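The problem in your example is that the first thread takes the mutex, and the pthread_cond_timedwait() call then releases it while waiting. The second thread takes the mutex next and holds it for the whole lengthy operation. Because pthread_cond_timedwait() must reacquire the mutex before it can return, even when it times out, the first thread stays blocked until the second thread releases the mutex at the end.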
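A minimal sketch of the arrangement described above, under some assumptions: the worker runs its lengthy task without holding the mutex and only locks it briefly to set a 'finished' flag, while the monitor waits on that flag with a deadline. The names `task_mutex`, `task_cond`, `finished`, `lengthy_task` and `run_with_timeout` are hypothetical, `nanosleep()` stands in for the real work, and durations are in milliseconds only to keep the example quick to run.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>
#include <time.h>

// Shared state (hypothetical names). `finished` is the predicate,
// and it is only read or written while `task_mutex` is held.
static pthread_mutex_t task_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t task_cond = PTHREAD_COND_INITIALIZER;
static bool finished = false;

// Worker: does the lengthy work WITHOUT holding the mutex,
// then takes it briefly to publish completion.
static void* lengthy_task(void* arg)
{
    const unsigned task_ms = *static_cast<unsigned*>(arg);
    timespec nap;
    nap.tv_sec = static_cast<time_t>(task_ms / 1000);
    nap.tv_nsec = static_cast<long>(task_ms % 1000) * 1000000L;
    nanosleep(&nap, NULL);  // simulated work; nanosleep() is a cancellation point

    pthread_mutex_lock(&task_mutex);
    finished = true;
    pthread_cond_signal(&task_cond);
    pthread_mutex_unlock(&task_mutex);
    return NULL;
}

// Monitor: returns true if the task finished before the timeout,
// false if the timeout expired and the task was asked to cancel.
static bool run_with_timeout(unsigned task_ms, unsigned timeout_ms)
{
    finished = false;  // no worker is running yet, so no lock is needed here
    pthread_t worker;
    int rc = pthread_create(&worker, NULL, &lengthy_task, &task_ms);
    assert(rc == 0);
    (void)rc;

    // pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline by default.
    timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += static_cast<time_t>(timeout_ms / 1000);
    deadline.tv_nsec += static_cast<long>(timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&task_mutex);
    int wait_rc = 0;
    // Loop on the predicate so that a spurious wakeup is not mistaken for completion.
    while (!finished && wait_rc != ETIMEDOUT) {
        wait_rc = pthread_cond_timedwait(&task_cond, &task_mutex, &deadline);
    }
    const bool done = finished;
    pthread_mutex_unlock(&task_mutex);

    if (!done) {
        pthread_cancel(worker);  // takes effect at the worker's next cancellation point
    }
    pthread_join(worker, NULL);
    return done;
}
```

With these assumptions, a call such as `run_with_timeout(50, 2000)` should report completion, and `run_with_timeout(3000, 100)` should report a timeout after roughly 100 ms instead of after the full task duration. The decision is based on the flag, not on the return code alone, so a task that completes right at the deadline is still reported as finished.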

answered Mar 26 '14 at 16:25 · edited Mar 26 '14 at 16:40 · stefaanv

1

One more thing: `pthread_cond_timedwait()` can receive [spurious wakeups](http://en.wikipedia.org/wiki/Spurious_wakeup), so if the result _isn't_ `ETIMEDOUT`, the second thread may not have finished at all! (I think this is what @stefaanv was talking about when he mentioned a 'finished' flag) – T045T Mar 26 '14 at 18:24
@T045T: yes, this may be an opportunity to plug an answer by me about how to use condition variables: http://stackoverflow.com/a/5538447/104774 – stefaanv Mar 26 '14 at 20:05
The obvious point I was missing was that the mutex shouldn't have been locked before the lengthy task, as @stefaanv and n.m. correctly pointed out above. There are actually 2 approaches that work for my use case: either use pthread_cond_timedwait() and ensure the mutex is unlocked before running the lengthy task, or use GMassucci's suggestion below, which is as efficient, but not as elegant as using a condition variable. For both approaches, it's essential that the lengthy task is completely independent of the monitoring thread. – Fara Importanta Apr 01 '14 at 15:21

score 1 · Answer 2 · answered Mar 26 '14 at 16:21

You are setting up the two threads with:

    pthread_create(&t1, NULL, &f1, NULL);
    pthread_create(&t2, NULL, &f2, NULL);

Instead of only joining them, I would use thread t1 to cancel t2: in thread t1, add a line that calls pthread_cancel(t2) after your timer has elapsed.

This sends a cancellation request to t2, asking it to terminate. You can leave the two join statements in place; the joining thread will then patiently wait for t2 to complete its death-throes before carrying on :)

Let me know if you need more info :)
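One possible reading of this suggestion, sketched under assumptions: the monitoring code simply sleeps for the timeout, requests cancellation, and then joins. The names `sleep_ms`, `lengthy_task` and `cancel_after` are hypothetical, and `nanosleep()` again stands in for the real work. With the default deferred cancellation type, the request is acted on only when the target reaches a cancellation point, which `nanosleep()` is.

```cpp
#include <cassert>
#include <pthread.h>
#include <time.h>

// Hypothetical helper: sleep for the given number of milliseconds.
static void sleep_ms(unsigned ms)
{
    timespec nap;
    nap.tv_sec = static_cast<time_t>(ms / 1000);
    nap.tv_nsec = static_cast<long>(ms % 1000) * 1000000L;
    nanosleep(&nap, NULL);
}

// Simulated lengthy task; it holds no locks, so it is safe to cancel.
static void* lengthy_task(void* arg)
{
    sleep_ms(*static_cast<unsigned*>(arg));
    return NULL;
}

// Starts the task, waits for the timeout, then requests cancellation.
// Returns true if the task was actually cancelled, false if it had
// already returned on its own.
static bool cancel_after(unsigned task_ms, unsigned timeout_ms)
{
    pthread_t worker;
    int rc = pthread_create(&worker, NULL, &lengthy_task, &task_ms);
    assert(rc == 0);
    (void)rc;

    sleep_ms(timeout_ms);    // the "timer" from the answer above
    pthread_cancel(worker);  // a request only; may report an error if the task already ended

    void* result = NULL;
    pthread_join(worker, &result);
    // A cancelled thread's exit value is the special marker PTHREAD_CANCELED.
    return result == PTHREAD_CANCELED;
}
```

Unlike a timed wait on a condition variable, this sketch always waits for the full timeout, even when the task finishes early. Real work that never reaches a cancellation point would need to call `pthread_testcancel()` periodically for the request to take effect.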
