Can I use pthread_join() to check for terminated thread?

Question

I need to know whether a thread has already terminated (if it has not, I must wait for it).
If I call pthread_join() on a terminated thread, it always returns success in my version of glibc. But the documentation for pthread_join() says that it must return the error code ESRCH if the thread has already terminated.
If I call pthread_kill(thread_id, 0), it returns the error code ESRCH (as expected).
In the glibc sources I see that pthread_join() only does a simple validity check on thread_id, not a real check that the thread exists, whereas pthread_kill() does a real check (against some kernel list). Here is my test program:

#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

void * thread_func(void *arg)
{
    printf("Hello! I`m thread_func!\nGood-bye!\n");
    return NULL;
}

int main(void)
{
    int res;
    pthread_t thread_id;

    printf("Hello from main()!\n");
    pthread_create(&thread_id, NULL, thread_func, NULL);
    printf("Waiting...\n");
    sleep(3);

    res = pthread_join(thread_id, NULL);
    printf("pthread_join() returned %d (%s)\n", res, strerror(res));

    res = pthread_kill(thread_id, 0);
    printf("pthread_kill() returned %d (%s)\n", res, strerror(res));

    return 0;
}

Its output:

    Hello!
    Waiting...
    Hello! I`m thread_func!
    Good-bye!
    pthread_join() returned 0 (Success)
    pthread_kill() returned 3 (No such process)

My question: is it safe to use pthread_join() to check for terminated threads, or must I always use pthread_kill()?

paxdiablo · Answer 1 · 2018-10-20T01:20:20.687

When a thread exits, the code for it stops running but its "corpse" is left lying around, for the return code to be collected by the parent.⁽¹⁾

So, even though you think the thread has totally disappeared, that's not actually the case.

A call to pthread_join will examine said corpse for a return code so that the parent is notified as to how things turned out. After that's been collected, the thread can be truly laid to rest.⁽²⁾

That's why pthread_join() is returning a success code and pthread_kill is not - you're not allowed to kill a thread that's already dead, but you are allowed to join to one that's dead but still warm :-)

It may be more instructive to try the following code, which tries to join the thread twice:

#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

void * thread_func(void *arg) {
    printf("Hello! I`m thread_func!\nGood-bye!\n");
    return NULL;
}

int main(void) {
    int res;
    pthread_t thread_id;

    printf("Hello from main()!\n");
    pthread_create(&thread_id, NULL, thread_func, NULL);
    printf("Waiting...\n");
    sleep(3);

    res = pthread_join(thread_id, NULL);
    /* pthread functions return the error number; they do not set errno. */
    printf("pthread_join() returned %d (%s)\n", res, strerror(res));

    res = pthread_join(thread_id, NULL);
    printf("pthread_join() returned %d (%s)\n", res, strerror(res));

    return 0;
}

On my system, I see:

Hello from main()!
Waiting...
Hello! I`m thread_func!
Good-bye!
pthread_join() returned 0 (No error)
pthread_join() returned 3 (No such process)

In other words, though the thread is dead, the first pthread_join() works; only the second one fails, with ESRCH (3).

⁽¹⁾ You can pthread_detach a thread so that its resources are immediately released on termination, if you so wish. That would be along the lines of:

pthread_create(&thread_id, NULL, thread_func, NULL);
pthread_detach(thread_id);

but I'm pretty certain a join will then fail even if the thread is still alive (joining a detached thread is not valid, and implementations that detect it typically return EINVAL).

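To see if a thread is still running regardless of whether it's detached or not, you can use the check below. It relies on the implementation detecting a thread ID that is no longer valid, which POSIX does not guarantee, so treat it as a best-effort probe:

if (pthread_kill(thread_id, 0) != 0) {
    // Thread is gone.
}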

⁽²⁾ Apologies for the morbid tones of this answer, I'm feeling a little dark today :-)

Thanks for your answer, but my `man 3 pthread_join` says: "Joining with a thread that has previously been joined results in undefined behavior." — Zhenya4880, Mar 30 '15 at 07:18
@Zhenya4880, I'm not saying you should do that in production code, it was an illustration to indicate that the thread does not disappear completely until you harvest the return code. If you want to get an error from the first join, detach the thread when you create it - I'll add some more code. — paxdiablo, Mar 30 '15 at 07:48
This `if (pthread_kill(thread_id, 0) != 0) // Thread is gone.` is not always true. `pthread_kill()` may not return an error and the signal may be sent, but it might be that it doesn't kill the thread. See http://stackoverflow.com/questions/223644/what-is-an-uninterruptable-process for more information. — Piotr Chojnacki, Nov 02 '15 at 13:38
@Piotr, it *will* actually work. A signal of `0` is special, indicating that error checking should be done (including checking the thread exists), but no signal is actually *sent.* It's not meant to kill the thread, just indicate that the thread is there. As per the Open Group docs (http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_kill.html), `pthread_kill` will return `EINVAL` if the signal number is bad (it isn't in this case since `0` is valid) or `ESRCH` if the thread doesn't exist. No other possibilities can happen. Ergo, if the return code is non zero, the thread is gone. — paxdiablo, Nov 02 '15 at 13:52

score 1 · Answer 2 · answered Mar 30 '15 at 01:57

pthread_join ends the thread's use of resources. It returns 0 when the thread has come to an end point and is ready to be cleaned up. Threads do not 'go away' all by themselves by default.

Returning zero means:

1. the thread got cleaned up
2. the thread WAS still there waiting

So no, do not use pthread_kill. You have a major assumption that is wrong: threads, unless set to be non-joinable (detached), do not release their stack and memory resources when the thread function returns. In other words, return NULL in your example did NOT finish off the thread completely; pthread_join did.

So, yes, use pthread_join to wait for a thread to complete.
