I have several heavy Python functions running on my cluster that handle application-critical tasks and various management jobs.
The problem: I'm seeing Python threads get stuck unpredictably.
One service, for example, runs 10+ threads, and a specific thread gets stuck while the others keep running their jobs.
Most of these threads run functions built around while True loops.
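For reference, a bare while True loop gives a thread no way to be asked to stop, and an unhandled exception ends the thread, which is easy to miss in a long-running service. Below is a minimal sketch of a more controllable version of such a loop. It assumes each pass of the loop can be expressed as one call to a job function; job, Worker and interval are hypothetical names standing in for the real code. It uses a threading.Event for shutdown, logs and survives ordinary exceptions, and records a heartbeat timestamp after every pass.

```python
import logging
import threading
import time

log = logging.getLogger(__name__)


class Worker(threading.Thread):
    """Runs job() in a stoppable loop and records a heartbeat.

    Hypothetical sketch: job stands in for one pass of the real while True body.
    """

    def __init__(self, job, interval=1.0):
        super().__init__(daemon=True)  # daemon: does not block interpreter exit
        self.job = job
        self.interval = interval
        self.stop_event = threading.Event()
        self.last_beat = time.monotonic()  # refreshed after every pass
        self.iterations = 0

    def run(self):
        # Replaces a bare `while True:` with a loop that can be asked to stop.
        while not self.stop_event.is_set():
            try:
                self.job()
            except Exception:
                # Log and keep looping instead of letting the thread die.
                log.exception("%s: job failed", self.name)
            self.iterations += 1
            self.last_beat = time.monotonic()
            # Interruptible sleep: returns early as soon as stop() is called.
            self.stop_event.wait(self.interval)

    def stop(self):
        self.stop_event.set()
```

Usage would be Worker(job=some_function, interval=5.0).start(), with stop() followed by join() for a clean shutdown. Note that this only handles exceptions; a job() call that blocks forever still freezes the loop, which is why the heartbeat is recorded.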
What is a good way to write reliable threads in Python, with a mechanism for a thread to recover on its own if it gets stuck?
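One common pattern is a heartbeat watchdog. CPython has no supported way to forcibly kill a thread, so "self-recovery" can only mean detecting a stale heartbeat, asking the stuck thread to stop, and starting a replacement. The abandoned thread keeps running as a daemon until, if ever, it unblocks. When hard termination is required, running the work in separate processes (multiprocessing.Process has a terminate() method) is the usual alternative. The sketch below assumes the work can be split into repeated calls to a job function; HeartbeatWorker, Watchdog, timeout and check_every are hypothetical names, not an existing library API.

```python
import threading
import time


class HeartbeatWorker(threading.Thread):
    """Calls job() in a loop and records a heartbeat after each call (hypothetical)."""

    def __init__(self, job, interval):
        super().__init__(daemon=True)  # an abandoned, stuck worker won't block exit
        self.job = job
        self.interval = interval
        self.stop_event = threading.Event()
        self.last_beat = time.monotonic()

    def run(self):
        while not self.stop_event.is_set():
            self.job()  # an uncaught exception ends the thread; the watchdog notices
            self.last_beat = time.monotonic()
            self.stop_event.wait(self.interval)


class Watchdog(threading.Thread):
    """Replaces the worker when its heartbeat is older than `timeout` seconds
    or when it has died. The stuck thread is only signalled, never killed."""

    def __init__(self, job, interval=0.1, timeout=1.0, check_every=0.1):
        super().__init__(daemon=True)
        self.job = job
        self.interval = interval
        self.timeout = timeout
        self.check_every = check_every
        self.stop_event = threading.Event()
        self.restarts = 0
        self.worker = self._spawn()

    def _spawn(self):
        worker = HeartbeatWorker(self.job, self.interval)
        worker.start()
        return worker

    def run(self):
        # wait() returns True once stop() sets the event, ending the loop.
        while not self.stop_event.wait(self.check_every):
            stale = time.monotonic() - self.worker.last_beat > self.timeout
            if stale or not self.worker.is_alive():
                self.worker.stop_event.set()  # honoured only if it ever unblocks
                self.worker = self._spawn()
                self.restarts += 1

    def stop(self):
        self.stop_event.set()
        if self.is_alive():
            self.join()  # make sure the monitor can't spawn another worker
        self.worker.stop_event.set()
```

The timeout has to exceed the longest time a single healthy job() call can take; otherwise slow but working calls get replaced. Also, an abandoned thread still holds whatever it holds: if it is stuck while owning a lock, the replacement may block on that same lock. In that case, finding the blocking call and giving it a timeout, or moving the work into a process, is more reliable than restarting threads.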