I have an event-driven application written in Python. After a while (usually more than a week) it simply stops responding to events. When this happens, I hit Ctrl-C and re-run it, and all is well again. However, it's annoying that this keeps happening, and I have no idea what's causing it. Is there a way to run my application so that, when it stops accepting connections, I can drop into a debugger and see what it's doing and why it's not taking connections?

I've used pdb before, but the way I've used it (`if condition: pdb.set_trace()`) doesn't really apply here, because I have no idea where in the code it is when it fails. Ideally, instead of Ctrl-C, I would hit Ctrl-somethingelse to make it stop and drop into the debugger. Is such a thing easily done?

Mala
  • You could have a signal handler set a flag, then do `if flag: pdb.set_trace()`. Whether this would work depends on where the freeze is. – Tom Hunt Feb 20 '15 at 20:46
  • How big is the application? How many lines of code we talking about? – Vor Feb 20 '15 at 21:04
  • Pretty small -- about 800 lines of my code, sitting on top of a 600-line websockets library. @TomHunt: thanks, I will try that! – Mala Feb 20 '15 at 23:31

1 Answer

Triggering pdb in your case is probably not simple. However, whenever I need to debug such hangs, I inspect a "snapshot" of the tracebacks of all the threads in the process, using the `dumpstacks()` function.
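
For readers without that function at hand, here is a minimal sketch of what a `dumpstacks()` helper might look like. It is an assumption about the general shape, not necessarily the exact function referred to above, and it relies on `sys._current_frames()`, which is CPython-specific.

```python
import sys
import threading
import traceback


def dumpstacks():
    """Return one string holding the current traceback of every thread."""
    # Map thread ids to readable names where the threading module knows them.
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        chunks.append("# Thread %s (%s)" % (names.get(thread_id, "?"), thread_id))
        chunks.append("".join(traceback.format_stack(frame)))
    return "\n".join(chunks)
```

Writing the returned string to a log file (or printing it) shows where each thread is sitting at that moment, which is usually enough to spot a blocked call.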

You can either call it periodically from a timer, print the output to a log file, and consult the log when you notice the hang, or use some RPC mechanism (e.g. signals) to trigger the function call in your process on demand. I usually do the latter, because the processes in my system already listen for such RPC requests (using rpyc).
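
If no RPC layer is available, both options can be approximated with the standard library's `faulthandler` module instead of a hand-written dump function. This is a swapped-in technique, not what the answer itself uses. The sketch assumes a POSIX system (`faulthandler.register` is not available on Windows), and `enable_stack_dumps` with its parameters is a hypothetical helper name.

```python
import faulthandler
import signal


def enable_stack_dumps(log_file, interval_seconds=None):
    """Dump all thread stacks to log_file on SIGUSR1, optionally on a timer too.

    log_file must be an open file with a real file descriptor, and it has
    to stay open for as long as the dumps are enabled.
    """
    # On demand: `kill -USR1 <pid>` writes every thread's traceback.
    faulthandler.register(signal.SIGUSR1, file=log_file, all_threads=True)
    if interval_seconds is not None:
        # Periodic: repeat the dump every interval_seconds until cancelled
        # with faulthandler.cancel_dump_traceback_later().
        faulthandler.dump_traceback_later(
            interval_seconds, repeat=True, file=log_file
        )
```

Because the handler runs at the C level, the dump should still be written when the Python main loop is blocked, which is the situation described in the question. A typical call would be `enable_stack_dumps(open("stacks.log", "a"))` at startup.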

shx2