I have this problem: I need to determine whether a Linux thread has stopped running because of a crash rather than a normal exit. The goal is to restart the thread without resetting or restarting the whole system. pthread_join() does not seem like a good option because I have several threads to monitor and the function returns only for one specific thread; it doesn't work "in parallel". At the moment I have a keep-alive signal from each thread to main, but I'm looking for some system call or thread attribute to understand the state. Any suggestions?

H2O
  • It's always better to fix the reason for the crash than to restart the program that crashes. Always. – stark Mar 17 '22 at 11:36
  • Yes, you're right, but my question is different! In any case, how can I detect a thread that stopped abnormally, so that I can log the event? – H2O Mar 17 '22 at 12:02
  • What manner of "crash" are you experiencing that affects only a single thread? – John Bollinger Mar 17 '22 at 13:06
  • At the moment I don't have any crash; my task is to implement a thread recovery system. For example, if my threads are all infinite loops, the main application needs to know whether one or more threads are still running. With a single thread I could use pthread_join() and, if the join returns, assume a problem in the thread. But what about several threads? – H2O Mar 17 '22 at 14:34

1 Answer

Thread "crashes"

How to detect if a linux thread is crashed

if (0) //...

That is, the only way that a pthreads thread can terminate abnormally while other threads in the process continue to run is via thread cancellation,* which is not well described as a "crash". In particular, if a signal is received whose effect is abnormal termination then the whole process terminates, not just the thread that handled the signal. Other kinds of errors do not cause threads to terminate.

On the other hand, if by "crash" you mean normal termination in response to the thread detecting an error condition, then you have no limitation on what the thread can do prior to terminating to communicate about its state. For example,

  • it could update a shared object that tracks information about your threads
  • it could write to a pipe designated for the purpose
  • it could raise a signal

If you like, you can use pthread_cleanup_push() to register thread cleanup handlers to help with that.
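As an illustration of the shared-object option combined with a cleanup handler, here is a minimal sketch. The struct worker record, the worker_state values, and the simulate_error flag are all hypothetical names invented for this example; only the pthread_* and atomic_* calls are real API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NUM_WORKERS 3

enum worker_state { WORKER_RUNNING, WORKER_FINISHED, WORKER_EXITED_EARLY };

/* Hypothetical per-thread record shared between a worker and its monitor. */
struct worker {
    pthread_t tid;
    int simulate_error;   /* demo only: forces the error path */
    atomic_int state;
};

/* Cleanup handler: runs on pthread_exit(), on cancellation, and on
 * pthread_cleanup_pop(1). It flags the exit only if the worker never
 * reached its normal "finished" point. */
static void mark_exit(void *arg)
{
    struct worker *w = arg;
    int expected = WORKER_RUNNING;
    atomic_compare_exchange_strong(&w->state, &expected, WORKER_EXITED_EARLY);
}

static void *worker_main(void *arg)
{
    struct worker *w = arg;

    pthread_cleanup_push(mark_exit, w);
    if (w->simulate_error)
        pthread_exit(NULL);                 /* error detected: bail out */
    atomic_store(&w->state, WORKER_FINISHED);
    pthread_cleanup_pop(1);                 /* pop the handler and run it */
    return NULL;
}

/* Starts the workers and returns how many of them exited early. */
int count_early_exits(void)
{
    struct worker workers[NUM_WORKERS];
    int early = 0;

    for (int i = 0; i < NUM_WORKERS; i++) {
        workers[i].simulate_error = (i == 1);
        atomic_init(&workers[i].state, WORKER_RUNNING);
        int rc = pthread_create(&workers[i].tid, NULL, worker_main, &workers[i]);
        assert(rc == 0);
        (void)rc;
    }
    /* Joining keeps the demo deterministic; a real monitor could instead
     * poll the state fields periodically without blocking. */
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_join(workers[i].tid, NULL);
        if (atomic_load(&workers[i].state) == WORKER_EXITED_EARLY)
            early++;
    }
    return early;
}
```

Calling count_early_exits() from main() (built with -pthread) should report one early exit, the worker whose simulate_error flag is set. The compare-and-swap in the handler is one possible way to keep a normal finish from being overwritten.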

On the third hand, if you're asking about detecting live threads that are failing to make progress -- because they are deadlocked, for example -- then your best bet is probably to implement some form of heartbeat monitor. That would involve each thread you want to monitor periodically updating a shared object that tracks the time of each thread's last update. If a thread goes too long between beats then you can guess that it may be stalled. This requires you to instrument all the threads you want to monitor.
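A heartbeat monitor along those lines might look roughly like the following sketch. Everything except the POSIX and C11 calls is an assumption made for illustration: the last_beat_ms table, the helper names, the 10 ms beat interval, and the 150 ms timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

#define NUM_WORKERS 2

/* Shared object: monotonic time (ms) of each worker's last beat. */
static _Atomic long long last_beat_ms[NUM_WORKERS];
static atomic_int stop_flag;

static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

static void heartbeat(int id)
{
    atomic_store(&last_beat_ms[id], now_ms());
}

/* Nonzero if worker `id` has not beaten within the last timeout_ms. */
static int is_stalled(int id, long long timeout_ms)
{
    return now_ms() - atomic_load(&last_beat_ms[id]) > timeout_ms;
}

static void *healthy_worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_flag)) {
        heartbeat(0);            /* instrumented loop: beat, then work */
        sleep_ms(10);
    }
    return NULL;
}

static void *stuck_worker(void *arg)
{
    (void)arg;
    heartbeat(1);
    while (!atomic_load(&stop_flag))
        sleep_ms(10);            /* simulated stall: alive, but no beats */
    return NULL;
}

/* Returns a bitmask with bit i set if worker i looks stalled. */
int find_stalled_workers(void)
{
    pthread_t t[NUM_WORKERS];
    int mask = 0;

    atomic_store(&stop_flag, 0);
    for (int i = 0; i < NUM_WORKERS; i++)
        heartbeat(i);            /* avoid a false alarm before startup */
    int rc = pthread_create(&t[0], NULL, healthy_worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&t[1], NULL, stuck_worker, NULL);
    assert(rc == 0);
    (void)rc;

    sleep_ms(300);               /* the monitor's check interval */
    for (int i = 0; i < NUM_WORKERS; i++)
        if (is_stalled(i, 150))
            mask |= 1 << i;

    atomic_store(&stop_flag, 1);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(t[i], NULL);
    return mask;
}
```

In this demo only the second worker should be reported. A stale heartbeat is only a hint that a thread may be stuck, so the timeout needs to be generous relative to the longest legitimate gap between beats.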

Thread cancellation

You should not use thread cancellation. But if you did, and if you include termination because of cancellation in your definition of "crash", then you still have all the options above available to you, but you must engage them by registering one or more cleanup handlers.

GNU-specific options

The main issues with using pthread_join() to check thread state are

  • it doesn't work for detached threads, and
  • pthread_join() blocks until the specified thread terminates.

For detached threads, you need one of the approaches already discussed, but for ordinary (joinable) threads on GNU/Linux, Glibc provides the non-standard pthread_tryjoin_np(), which performs a non-blocking attempt to join a thread, and also pthread_timedjoin_np(), which performs a join attempt with a timeout. If you are willing to rely on Glibc-specific functions then one of these might serve your purpose.
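For instance, a monitor could poll several threads in turn with pthread_tryjoin_np(), which returns 0 once the thread has terminated and EBUSY while it is still running. The sketch below assumes Glibc; the worker function, the 5 ms polling interval, and the poll_workers() helper are made up for the demonstration.

```c
#define _GNU_SOURCE          /* exposes pthread_tryjoin_np() in glibc */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

#define NUM_WORKERS 3

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Demo worker: worker i runs for (NUM_WORKERS - i) * 50 ms, so the
 * workers with higher ids terminate first. */
static void *worker_main(void *arg)
{
    int id = (int)(intptr_t)arg;
    sleep_ms((NUM_WORKERS - id) * 50L);
    return NULL;
}

/* Polls all workers without blocking on any single one. Stores worker
 * ids in the order their termination was observed and returns the
 * number of threads joined. */
int poll_workers(int order[NUM_WORKERS])
{
    pthread_t tid[NUM_WORKERS];
    int joined[NUM_WORKERS] = {0};
    int done = 0;

    for (int i = 0; i < NUM_WORKERS; i++) {
        int rc = pthread_create(&tid[i], NULL, worker_main,
                                (void *)(intptr_t)i);
        assert(rc == 0);
        (void)rc;
    }

    while (done < NUM_WORKERS) {
        for (int i = 0; i < NUM_WORKERS; i++) {
            if (joined[i])
                continue;        /* never join the same thread twice */
            int rc = pthread_tryjoin_np(tid[i], NULL);
            if (rc == 0) {       /* terminated and now joined */
                joined[i] = 1;
                order[done++] = i;
            } else {
                assert(rc == EBUSY);   /* still running */
            }
        }
        sleep_ms(5);             /* the monitor could do other work here */
    }
    return done;
}
```

In a recovery system, the branch that handles rc == 0 is where a replacement thread could be created. Note that a successful join tells you only that the thread terminated, not why; the return value passed through the second argument can carry that information if the thread sets it.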

Linux-specific options

The Linux kernel makes per-process thread status information available via the /proc filesystem. See "How to check the state of Linux threads?", for example. Do be aware, however, that the details vary a bit from one kernel version to another. And if you're planning to do this a lot, then also be aware that even though /proc is a virtual filesystem (so no physical disk is involved), you still access it via slow-ish I/O interfaces.

Any of the other alternatives is probably better than reading files in /proc. I mention it only for completeness.
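For completeness, a rough sketch of the /proc approach follows. It reads the state letter from /proc/self/task/<tid>/stat, where <tid> is the kernel thread ID rather than the pthread_t. The helper names are hypothetical, and the parsing assumes the usual "tid (comm) state ..." layout of that file.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread id of the caller (not the same thing as pthread_t). */
pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Returns the state letter of a thread in this process ('R', 'S', 'D',
 * 'Z', ...) or 0 if the stat file cannot be read, for example because
 * no such thread exists. */
char thread_state(pid_t tid)
{
    char path[64], line[512];

    int n = snprintf(path, sizeof path, "/proc/self/task/%d/stat", (int)tid);
    assert(n > 0 && (size_t)n < sizeof path);
    (void)n;

    FILE *f = fopen(path, "r");
    if (!f)
        return 0;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    if (!ok)
        return 0;

    /* The comm field is parenthesized and may itself contain spaces or
     * parentheses, so locate its end by searching from the right. */
    char *p = strrchr(line, ')');
    if (!p || p[1] != ' ' || p[2] == '\0')
        return 0;
    return p[2];
}
```

To monitor other threads this way, each thread would have to publish its own kernel TID (for example by storing current_tid() in a shared table at startup). Keep in mind that TIDs can be reused after a thread exits, which is one more reason to prefer the other approaches.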

Overall

I'm looking for some system call or thread attribute to understand the state

The pthreads API does not provide a "have you terminated?" function or any other such state-inquiry function, unless you count pthread_join(). If you want that then you need to roll your own, which you can do by means of some of the facilities already discussed.


*Do not use thread cancellation.

John Bollinger