I sometimes get random failures when joining a pthread. What I can say is that the thread is not deadlocked on a mutex when the join fails. Most of the time the thread is idle (in a sleep syscall) when the join times out.
My need is basic: a way to start and stop a thread from the main thread. Because of that, I don't protect the thread state variables with a mutex in the start/stop functions. The thread runs an infinite loop most of the time.
All my threads are designed with the same skeleton: a start function, a stop function, and the thread function itself. A global variable g_event_ctx stores the current status of each thread: running tells me whether I need to cancel it, and is_joinable tells me whether I need to join it. In addition, every thread function calls sleep/read/write, which are cancellation points.
typedef struct pthread_context
{
    pthread_t id;    /*!< thread id, kept so the thread can be stopped later */
    int running;     /*!< non-zero while the thread is currently running */
    int is_joinable; /*!< non-zero if the thread still has to be joined */
} str_pthread_context;
The start/stop skeleton:
int start_x_manager (void)
{
    pthread_t t_x;
    if (g_event_ctx.x_thread.is_joinable) return 0;
    PRINT_INFO ("Start x manager");
    // start push x thread
    if (pthread_create (&t_x, NULL, x_loop_thread, NULL))
        PRINT_ERR_GOTO ("error on pthread_create for x thread");
    pthread_setname_np(t_x, "x");
    g_event_ctx.x_thread.id = t_x;
    g_event_ctx.x_thread.is_joinable = 1;
    g_event_ctx.x_thread.running = 1;
    return 0;
error:
    g_event_ctx.x_thread.running = 0;
    g_event_ctx.x_thread.is_joinable = 0;
    return 1;
}

int stop_x_manager (void)
{
    struct timespec ts;
    if (!g_event_ctx.x_thread.is_joinable) return 0;
    PRINT_INFO ("Stop x manager");
    if (g_event_ctx.x_thread.running)
    {
        CHECK_ERR_GOTO (pthread_cancel(g_event_ctx.x_thread.id) != 0, "Cannot cancel x thread");
        g_event_ctx.x_thread.running = 0;
    }
    CHECK_ERR_GOTO (clock_gettime(CLOCK_REALTIME, &ts) == -1, "Cannot get clock time");
    ts.tv_sec += 5;
    CHECK_ERR_GOTO (pthread_timedjoin_np (g_event_ctx.x_thread.id, NULL, &ts) != 0, "Cannot join x_thread");
    g_event_ctx.x_thread.is_joinable = 0;
    return 0;
error:
    g_event_ctx.x_thread.running = 0;
    g_event_ctx.x_thread.is_joinable = 0;
    return 1;
}
The skeleton of the thread function:
void *x_loop_thread (void *arg __attribute__((__unused__)))
{
    CHECK_ERR_GOTO (pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL) != 0, "Cannot set cancel state");
    CHECK_ERR_GOTO (pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL) != 0, "Cannot set cancel type");
    PRINT_INFO ("Start x manager loop thread ...");
    pthread_cleanup_push(x_manager_cleanup, some_stuff);
    while (1)
    {
        // Do some stuff here
    }
    g_event_ctx.x_thread.running = 0;
    pthread_exit (NULL);
error:
    g_event_ctx.x_thread.running = 0;
    pthread_cleanup_pop(1);
    pthread_exit (NULL);
}
CHECK_ERR_GOTO is a macro that checks a condition and jumps to the error label when the condition is true.
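For readers without the macro definitions, here is a minimal sketch of what such a macro might look like. The real definition is not shown in this post and may differ; the PRINT_ERR helper and the check_positive demo function are hypothetical and exist only to make the sketch self-contained.

```c
#include <stdio.h>

/* Hypothetical stand-in for the logging helper; not the real one. */
#define PRINT_ERR(msg) fprintf(stderr, "ERROR: %s\n", (msg))

/* Assumed shape: if cond is true, log msg and jump to the local
 * `error` label of the enclosing function. */
#define CHECK_ERR_GOTO(cond, msg)   \
    do {                            \
        if (cond) {                 \
            PRINT_ERR(msg);         \
            goto error;             \
        }                           \
    } while (0)

/* Hypothetical demo: returns 0 on success, 1 if the check fired. */
static int check_positive(int value)
{
    CHECK_ERR_GOTO(value <= 0, "value must be positive");
    return 0;
error:
    return 1;
}
```

The do/while(0) wrapper makes the macro behave like a single statement, so it can be used safely after an if without braces.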
What could explain a timeout on pthread_timedjoin_np? Could another piece of code be corrupting my thread id? Is there a design problem in my skeleton?
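One way to tell these cases apart might be to keep the return code of pthread_timedjoin_np instead of only testing it against 0. The sketch below maps the documented error codes to a short hint; join_error_hint is a hypothetical helper name, not part of the code above, and the hints are interpretations rather than guarantees (an invalid thread id may also crash instead of returning ESRCH).

```c
#include <errno.h>

/* Hypothetical helper: translate the return code of
 * pthread_timedjoin_np() into a hint about what it suggests. */
static const char *join_error_hint(int rc)
{
    switch (rc) {
    case 0:
        return "joined: thread terminated in time";
    case ETIMEDOUT:
        return "timeout: thread had not terminated before the deadline";
    case ESRCH:
        return "no such thread: id may be stale or corrupted";
    case EINVAL:
        return "invalid: not joinable, already being joined, or bad timespec";
    case EDEADLK:
        return "deadlock: thread joining itself or a circular join";
    default:
        return "unexpected error code";
    }
}
```

In stop_x_manager, this would mean storing the result in an int rc, logging join_error_hint(rc) when rc is non-zero, and only then jumping to the error label. An ETIMEDOUT result would point at the thread not reaching a cancellation point, while ESRCH or EINVAL would point at a problem with the stored thread id or its joinable state.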