Debugging techniques for shut-down problems in Python daemons

Question

I am doing some gnarly stuff with Python threads including daemons.

I am getting an intermittent error on some tests:

Exception in thread myconsumerthread (most likely raised during interpreter shutdown):

Note that there are no stack trace/exception details provided.

Scrutinising my own code hasn't helped, and I am at a bit of a loss about the next step in debugging. What debugging techniques can I use to find out more about what exception might be bringing down the runtime during shutdown?

Fine print:

- Windows, CPython 2.7.2. Not reproducible on Ubuntu.
- The problem occurs about 3% of the time, so it is reproducible, just not reliably.
- The code in myconsumerthread has a catch-all exception handler, which tries to write the name of the exception to sys.stderr. (Could sys already be shut down?)
- I suspect the problem is related to shutting down daemon threads very quickly, before they have completely initialised. Something in this area, but I have little evidence; certainly insufficient to be pointing at a Python bug.
- Ha, I have discovered a new symptom that marks a turning point in my descent into insanity!
  - If I `import time` in my test harness (not the live code), and never use it, the frequency drops to about 0.5%.
  - If I `import turtle` in my test harness (I swear on my life, there are no turtle graphics in my code; I chose this as the most irrelevant library I could quickly find), the exception starts to be caught in a different thread, and it occurs in about a third of the runs.

Oooh: [This may be relevant to my interests.](http://stackoverflow.com/questions/8456395/threaded-importing-while-interpreter-shuts-down) — Oddthinking, Apr 19 '13 at 02:28
Scratch that last idea. My imports are completed prior to the thread being launched. — Oddthinking, Apr 19 '13 at 02:32
possibly related: http://bugs.python.org/issue4106 https://github.com/paramiko/paramiko/issues/17 http://stackoverflow.com/questions/1745232/solving-thread-cleanup-on-paramiko — Patashu, Apr 19 '13 at 05:05
@Patashu: #1: Points to multiprocessing; I am using threading. #2: Points to http://bugs.python.org/issue1856 which *might* be relevant, but isn't fixed until 3.2 :-( No way of checking if it is the same bug. #3: OP was using __del__ inappropriately. I don't ever use __del__. - these are all good leads, thanks. Unfortunately, I am still stuck with no way of debugging to find if I have the same problem. Migrating to 3.x just to lose this shut-down bug wasn't quite on my list of things to do this week. Sigh. — Oddthinking, Apr 19 '13 at 09:50
No. Only one. That thread spawns a lot of short-term threads *serially*, so there are, at most three threads at one time, but one of them is continually terminating and being respawned. — Oddthinking, Apr 20 '13 at 23:31

score 1 · Answer 1 · edited May 23 '17 at 12:04

I have encountered the same error on a few occasions. I'm trying to locate / generate an example that displays the exact message.

Until then, if my memory serves me well, these were the areas that I focused on.

- Look for ports, files, queues, etc. that are removed or closed outside the daemon threads.
- Scrutinize blocking calls in the daemon threads, e.g. a Queue.get(block=True) or a pyserial.read() with timeout=None.

After digging a little more, I see the same types of errors popping up in relation to Queues (see the comments here).

I find it odd that it doesn't display the traceback. You might try commenting out the catch-all except and letting Python send the error to stderr. Hopefully you'll then be able to see what's dying on you.
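If the catch-all handler itself fails during shutdown, one possible cause is that module globals such as `sys` have already been cleared (CPython 2 sets module globals to None during teardown). A minimal sketch of a wrapper that captures everything it needs up front follows; `report_exceptions` and its `stream` parameter are hypothetical names, not part of the standard library or of the question's code.

```python
import sys
import threading
import traceback


def report_exceptions(target, stream=None):
    """Wrap a thread target so failures are written out with a traceback.

    Hypothetical helper: everything the handler needs is bound here, as
    closure variables, so it does not look up module globals later.
    """
    stream = stream if stream is not None else sys.stderr
    format_exc = traceback.format_exc
    exc_info = sys.exc_info
    current_thread = threading.current_thread

    def wrapper(*args, **kwargs):
        try:
            return target(*args, **kwargs)
        except Exception:
            exc_type, exc_value = exc_info()[:2]
            try:
                text = format_exc()
            except Exception:
                # Assumption: formatting can itself fail late in shutdown,
                # so fall back to the bare exception type and value.
                text = "%s: %s\n" % (getattr(exc_type, "__name__", exc_type), exc_value)
            stream.write("Exception in %s:\n%s" % (current_thread().name, text))
            stream.flush()

    return wrapper
```

It would be used as `Thread(target=report_exceptions(my_target))`. Passing an already opened log file as `stream` is one way to check whether stderr itself is the problem.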

Update
I knew I had seen this issue before. The ThreadHungry example at the end of this answer generates that error (many of them, actually). Note that no other traceback message is printed either. For the sake of completeness, after you have seen the error messages, uncomment the rx_queue.get lines and comment out the time.sleep calls; the errors should go away. The errors are also intermittent: on some runs of the unmodified example they do not appear at all, which is in line with the sporadic failure rate you have been seeing, so you may need to run it a few times to see them.

I normally use time.sleep(x) to throttle threads if blocking I/O calls such as get() and read() do not provide a timeout, or if there is no blocking call to use at all (user interface refreshes, for example).

That being said, I believe there is a problem with a thread being shut down while it is waiting in a time.sleep() call. I believe this call is what has gotten me every time, but I do not know what actually causes the error inside sleep. For all I know, other blocking calls display the same behavior.
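One way to find out which call each thread is parked in is to dump every thread's stack just before the main thread exits. The sketch below uses `sys._current_frames()` (a CPython-specific function) together with `traceback.format_stack()`; the helper name `dump_thread_stacks` is hypothetical.

```python
import sys
import threading
import traceback


def dump_thread_stacks(stream=None):
    """Write the current stack of every live thread to `stream`."""
    stream = stream if stream is not None else sys.stderr
    # Map thread identifiers to names so the output is readable.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    for ident, frame in sys._current_frames().items():
        stream.write("--- thread %s (%s) ---\n" % (names.get(ident, "?"), ident))
        stream.write("".join(traceback.format_stack(frame)))
    stream.flush()
```

Calling `dump_thread_stacks()` as the last statement of the main script shows, for each daemon thread, the line it was executing when shutdown was about to begin.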

import time
import Queue
from threading import Thread

SLAVE_CNT = 50
OWNER_CNT = 10
MASTER_CNT = 2

class ThreadHungry(object):
    def __init__(self):
        self.rx_queue = Queue.Queue()

    def start(self):
        print "Adding Masters..."
        for x in range(MASTER_CNT):
            self.owners = []
            print "Starting slave owners..."
            for y in range(OWNER_CNT):
                owner = Thread(target=self.__owner_action)
                owner.daemon = True
                owner.start()
                self.owners.append(owner)

    def __owner_action(self):
        self.slaves = []
        print "\tStarting slaves..."
        for x in range(SLAVE_CNT):
            slave = Thread(target=self.__slave_action)
            slave.daemon = True
            slave.start()
            self.slaves.append(slave)

        while(1):
            time.sleep(1)
            #self.rx_queue.get(block=True)

    def __slave_action(self):
        while(1):
            time.sleep(1)
            #self.rx_queue.get(block=True)


if __name__ == "__main__":
    c = ThreadHungry()
    c.start()

    # Stop the threads abruptly after 5 seconds
    time.sleep(5)
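For comparison, here is a minimal sketch of a worker loop that can be asked to stop and then joined, so that no thread is still running when the interpreter shuts down. It throttles with `Event.wait(timeout)` rather than `time.sleep()`. `StoppableWorker` is a hypothetical class, not part of the example above, and I have not verified that it removes the error on Windows with 2.7.2.

```python
import threading


class StoppableWorker(object):
    """Hypothetical worker that throttles with Event.wait() and can be stopped."""

    def __init__(self, interval=1.0):
        self._stop_event = threading.Event()
        self._interval = interval
        self.ticks = 0
        self._thread = threading.Thread(target=self._run)
        self._thread.daemon = True

    def start(self):
        self._thread.start()

    def _run(self):
        while not self._stop_event.is_set():
            self.ticks += 1  # placeholder for the real work
            # Returns early as soon as stop() sets the event.
            self._stop_event.wait(self._interval)

    def stop(self, timeout=5.0):
        """Ask the loop to finish, wait for it, and report whether it ended."""
        self._stop_event.set()
        self._thread.join(timeout)
        return not self._thread.is_alive()
```

The main script would call `stop()` on each worker before returning, instead of letting the daemon threads be cut off mid-loop.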

I'm trying to follow up on your suggestions. Commented out the catch-all, and it made no difference. Not using ports. Not using files (although stderr might count). There is a Queue right in the centre of my code, so that might be helpful. — Oddthinking, Apr 19 '13 at 05:01
@Oddthinking You might consider using the Python debugger in your exception handlers ( `import pdb; pdb.set_trace()` ) to poke around. Tho I have my doubts it will work since it's dying on a shutdown event. Still might be worth a shot. — Adam Lewis, Apr 19 '13 at 05:20
ran it 100 times manually in the debugger. Didn't see the problem. Got bored, and moved on. — Oddthinking, Apr 19 '13 at 09:42
I should mention that I followed up the Queue issue - it is a problem with multiprocessing.Queue that was fixed in the multithreading module 2.7.4. However, I am using (multithreading) Queue.Queue, so it appears unrelated. — Oddthinking, Apr 20 '13 at 01:17
