Thread "crashes"
How to detect if a linux thread is crashed
if (0) //...
That is, the only way that a pthreads thread can terminate abnormally while other threads in the process continue to run is via thread cancellation,* which is not well described as a "crash". In particular, if a signal is received whose effect is abnormal termination then the whole process terminates, not just the thread that handled the signal. Other kinds of errors do not cause threads to terminate.
On the other hand, if by "crash" you mean normal termination in response to the thread detecting an error condition, then you have no limitation on what the thread can do prior to terminating to communicate about its state. For example,
- it could update a shared object that tracks information about your threads
- it could write to a pipe designated for the purpose
- it could raise a signal
If you like, you can use pthread_cleanup_push() to register thread cleanup handlers to help with that.
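As a rough sketch of the "shared object" option combined with a cleanup handler: the worker below records an error code in a status record before terminating itself, and the handler marks the record as finished on every exit path. The names worker_status, mark_finished, worker and run_worker, and the simulated_error field, are all made up for illustration.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical status record shared between a worker and whoever monitors it. */
struct worker_status {
    pthread_mutex_t lock;
    int finished;         /* set to 1 by the cleanup handler */
    int error_code;       /* 0 unless the worker reported an error */
    int simulated_error;  /* demo input: the error the worker pretends to detect */
};

/* Cleanup handler: runs on pthread_exit(), on cancellation,
 * and when popped with pthread_cleanup_pop(1). */
static void mark_finished(void *arg)
{
    struct worker_status *st = arg;
    pthread_mutex_lock(&st->lock);
    st->finished = 1;
    pthread_mutex_unlock(&st->lock);
}

static void *worker(void *arg)
{
    struct worker_status *st = arg;

    pthread_cleanup_push(mark_finished, st);

    int err = st->simulated_error;   /* stand-in for real work that may fail */
    if (err != 0) {
        pthread_mutex_lock(&st->lock);
        st->error_code = err;        /* say why we are giving up */
        pthread_mutex_unlock(&st->lock);
        pthread_exit(NULL);          /* terminates the thread; the handler runs */
    }

    pthread_cleanup_pop(1);          /* normal path: pop the handler and run it */
    return NULL;
}

/* Run one worker to completion and return the error code it recorded. */
int run_worker(int simulated_error, int *finished_out)
{
    struct worker_status st = { .finished = 0, .error_code = 0,
                                .simulated_error = simulated_error };
    pthread_t tid;

    pthread_mutex_init(&st.lock, NULL);
    int rc = pthread_create(&tid, NULL, worker, &st);
    assert(rc == 0);
    (void)rc;

    /* The demo simply joins; a real monitor could inspect st (under the
     * lock) at any time while the worker is still running. */
    pthread_join(tid, NULL);

    pthread_mutex_lock(&st.lock);
    int err = st.error_code;
    *finished_out = st.finished;
    pthread_mutex_unlock(&st.lock);
    pthread_mutex_destroy(&st.lock);
    return err;
}
```

Note that a plain return from the thread's start function does not run cleanup handlers by itself, which is why the normal path ends with pthread_cleanup_pop(1).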
On the third hand, if you're asking about detecting live threads that are failing to make progress -- because they are deadlocked, for example -- then your best bet is probably to implement some form of heartbeat monitor. That would involve each thread you want to monitor periodically updating a shared object that tracks the time of each thread's last update. If a thread goes too long between beats then you can guess that it may be stalled. This requires you to instrument all the threads you want to monitor.
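A minimal sketch of such a heartbeat, assuming one record per monitored thread and a millisecond timestamp taken from CLOCK_MONOTONIC. The struct heartbeat type, the helper names, and the 100 ms limit are illustrative choices, not an established API.

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() and nanosleep() */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical heartbeat record: one per monitored thread. */
struct heartbeat {
    atomic_llong last_beat_ms;   /* CLOCK_MONOTONIC time of the latest beat */
};

static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

static void sleep_ms(long ms)
{
    struct timespec d = { .tv_sec = ms / 1000,
                          .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&d, NULL);
}

/* Called by the monitored thread each time it makes progress. */
void heartbeat_beat(struct heartbeat *hb)
{
    atomic_store(&hb->last_beat_ms, now_ms());
}

/* Called by the monitor: nonzero if the last beat is older than limit_ms. */
int heartbeat_is_stale(struct heartbeat *hb, long long limit_ms)
{
    return now_ms() - atomic_load(&hb->last_beat_ms) > limit_ms;
}

/* Demo worker: beats three times, then stops making progress without exiting. */
static void *stalling_worker(void *arg)
{
    struct heartbeat *hb = arg;
    for (int i = 0; i < 3; i++) {
        heartbeat_beat(hb);
        sleep_ms(10);
    }
    sleep_ms(500);   /* simulated stall: alive, but no more beats */
    return NULL;
}

/* Returns nonzero if the monitor flagged the stalled worker. */
int demo_detects_stall(void)
{
    struct heartbeat hb;
    pthread_t tid;

    atomic_init(&hb.last_beat_ms, now_ms());
    int rc = pthread_create(&tid, NULL, stalling_worker, &hb);
    assert(rc == 0);
    (void)rc;

    sleep_ms(300);                             /* monitor's polling interval */
    int stale = heartbeat_is_stale(&hb, 100);  /* no beat for over 100 ms? */

    pthread_join(tid, NULL);
    return stale;
}
```

A stale heartbeat is only a hint: a thread that is legitimately blocked on slow I/O looks the same as a deadlocked one, so choose the limit with the thread's normal workload in mind.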
Thread cancellation
You should not use thread cancellation. But if you did, and if you include termination because of cancellation in your definition of "crash", then you still have all the options above available to you, but you must engage them by registering one or more cleanup handlers.
GNU-specific options
The main issues with using pthread_join() to check thread state are
- it doesn't work for detached ("daemon") threads, and
- pthread_join() blocks until the specified thread terminates.
For detached threads, you need one of the approaches already discussed, but for ordinary (joinable) threads on GNU/Linux, glibc provides the non-standard pthread_tryjoin_np(), which performs a non-blocking attempt to join a thread, and also pthread_timedjoin_np(), which performs a join attempt with a timeout. If you are willing to rely on glibc-specific functions then one of these might serve your purpose.
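Here is a small sketch of how those two calls might be wrapped. The wrappers poll_thread and wait_thread, the demo worker, and their return convention (1 = terminated and joined, 0 = still running, -1 = error) are my own illustration. Both glibc functions require _GNU_SOURCE, and pthread_timedjoin_np() takes an absolute deadline measured against CLOCK_REALTIME.

```c
#define _GNU_SOURCE              /* exposes the glibc-specific *_np functions */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical worker: sleeps for the number of milliseconds pointed to by arg. */
static void *sleepy_worker(void *arg)
{
    long ms = *(long *)arg;
    struct timespec d = { .tv_sec = ms / 1000,
                          .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&d, NULL);
    return NULL;
}

/* Non-blocking check: 1 = terminated (and now joined), 0 = still running, -1 = error. */
int poll_thread(pthread_t tid)
{
    int rc = pthread_tryjoin_np(tid, NULL);
    if (rc == 0)
        return 1;
    if (rc == EBUSY)
        return 0;
    return -1;
}

/* Wait up to timeout_ms for the thread; same return convention as poll_thread(). */
int wait_thread(pthread_t tid, long timeout_ms)
{
    struct timespec deadline;

    /* The deadline is absolute and measured against CLOCK_REALTIME. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    int rc = pthread_timedjoin_np(tid, NULL, &deadline);
    if (rc == 0)
        return 1;
    if (rc == ETIMEDOUT)
        return 0;
    return -1;
}

/* Demo: poll a thread that is still sleeping, then wait for it to finish. */
void demo_poll_then_wait(int *first, int *second)
{
    long ms = 300;
    pthread_t tid;

    int rc = pthread_create(&tid, NULL, sleepy_worker, &ms);
    assert(rc == 0);
    (void)rc;

    *first = poll_thread(tid);         /* worker is still sleeping */
    *second = wait_thread(tid, 5000);  /* worker finishes well within 5 s */
}
```

Remember that a successful try-join or timed join really does join the thread, so you must not join that thread again afterwards.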
Linux-specific options
The Linux kernel makes per-process thread status information available via the /proc filesystem. See "How to check the state of Linux threads?", for example. Do be aware, however, that the details vary a bit from one kernel version to another. And if you're planning to do this a lot, then also be aware that even though /proc is a virtual filesystem (so no physical disk is involved), you still access it via slow-ish I/O interfaces.
Any of the other alternatives is probably better than reading files in /proc. I mention it only for completeness.
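For completeness, then, a hedged sketch of what that might look like. It assumes the layout /proc/self/task/<tid>/status with a "State:" line, which is what current kernels provide but could differ on yours. The function names thread_state and print_thread_states are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* Return the state letter ('R', 'S', 'D', 'Z', ...) of thread `tid` of the
 * calling process, or 0 if it cannot be read (e.g. the thread has exited). */
char thread_state(long tid)
{
    char path[64];
    int n = snprintf(path, sizeof path, "/proc/self/task/%ld/status", tid);
    assert(n > 0 && (size_t)n < sizeof path);
    (void)n;

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;

    char line[256];
    char state = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Assumed line format: "State:\tS (sleeping)" */
        if (sscanf(line, "State: %c", &state) == 1)
            break;
    }
    fclose(f);
    return state;
}

/* Print "<tid> <state>" for every thread of the calling process.
 * Returns the number of threads listed, or -1 if /proc is unavailable. */
int print_thread_states(void)
{
    DIR *dir = opendir("/proc/self/task");
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (!isdigit((unsigned char)ent->d_name[0]))
            continue;                      /* skip "." and ".." */
        long tid = strtol(ent->d_name, NULL, 10);
        char st = thread_state(tid);
        if (st != 0) {                     /* thread may vanish between calls */
            printf("%ld %c\n", tid, st);
            count++;
        }
    }
    closedir(dir);
    return count;
}
```

Bear in mind that the state letter describes scheduling state only: a thread sleeping in a deadlock and one sleeping on a legitimate wait both show up as S.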
Overall
I'm looking for some system call or thread attribute to understand the state
The pthreads API does not provide a "have you terminated?" function or any other such state-inquiry function, unless you count pthread_join(). If you want that then you need to roll your own, which you can do by means of some of the facilities already discussed.
*Do not use thread cancellation.