I am working on a Python program that runs as a daemon and spawns several long-running threads, each with its own sleep timer.

The threads die after an unpredictable amount of time, and I am not sure why or how to diagnose it. As a temporary diagnostic (not a final solution), I added a `__del__` method to the class whose method runs as the thread target, but I am not sure what information is available there to determine what is causing the exit.

I am no closer to determining the cause and am hoping to find some assistance.

A snippet of my main running program which is the top level daemon process is:

threads = []
sensorFolders = glob.glob(config._baseDir + '28*')
for folder in sensorFolders:
    sensorID = os.path.split(folder)[1]
    sensor = Sensor().getSensor(sensorID)
    threads.append(threading.Thread(target=sensor.startCheckin))
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

And the piece of the Sensor Class:

def startCheckin(self):
    while True:
        self.checkSensor()
        self.checkinSensor()
        self.postTemp()
        time.sleep(self._checkinInterval)

I can certainly add more code as needed, but the implementation is fairly basic. I am not sure what to try, as nothing here looks (to a Python noob) like an obvious cause of the threads closing abruptly.

Any help would be greatly appreciated!

Edit: The issue appears to be that if the network drops for a moment while a thread is making a URL request, the hostname cannot be resolved and an exception is raised, which kills the thread. Even knowing this, I am still unsure how best to handle these exceptions.
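A minimal sketch of one way to keep a thread alive through such failures: catch the exception inside the loop, log the traceback, and retry after a growing delay. The names `run_forever`, `step`, `stop_event`, and `max_backoff` are hypothetical and not part of the original code; `step` stands in for one pass of `checkSensor()`, `checkinSensor()`, and `postTemp()`.

```python
import logging
import threading

log = logging.getLogger("sensor")


def run_forever(step, interval, stop_event, max_backoff=300.0):
    """Call step() repeatedly; log failures instead of letting the thread die."""
    failures = 0
    while not stop_event.is_set():
        try:
            step()
            failures = 0
            delay = interval  # success: wait the normal interval
        except Exception:
            failures += 1
            # Double the wait after each consecutive failure, up to max_backoff.
            delay = min(interval * 2 ** failures, max_backoff)
            # log.exception records the full traceback of the current exception.
            log.exception("check-in failed (%d in a row); retrying in %.1f s",
                          failures, delay)
        # Sleeps for `delay` seconds, but returns early if stop_event is set.
        stop_event.wait(delay)
```

A thread could then be created with something like `threading.Thread(target=run_forever, args=(one_pass, interval, stop_event))`, where `one_pass` is a hypothetical function wrapping the three sensor calls. If the requests come from the `requests` library, as the error text in the comments suggests, catching `requests.exceptions.RequestException` instead of `Exception` would be narrower; that is an assumption about the code, which is not shown.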

Brian
  • Probably an unhandled exception. Add some logging. – roippi Aug 12 '14 at 03:11
  • Within the `__del__` function can I print a stack trace? – Brian Aug 12 '14 at 03:15
  • Yep. `import traceback`. https://docs.python.org/2/library/traceback.html. You could also wrap your startCheckin code in a try/catch (rather than __del__) – fileoffset Aug 12 '14 at 03:21
  • Alright I will try that and see what happens, takes awhile for it to die off... – Brian Aug 12 '14 at 03:23
  • You're not getting any stacktraces printed by anything? I'd expect an unhandled exception to make some noise. Are there `try/except` blocks in this code anywhere? Inside `checkSensor()`, `checkinSensor()`, etc. – Patrick Collins Aug 12 '14 at 03:25
  • How are you running it as a daemon? Are you using the `daemon` library? If not, which of the standard things described in [PEP 3143](http://legacy.python.org/dev/peps/pep-3143/) are you doing? If you daemonize properly, but do it even when run directly, then you wouldn't get any tracebacks anywhere when run directly. – abarnert Aug 12 '14 at 03:29
  • Also, if you don't have any other way to debug this, why not write to a log file between each of the four lines of the `startCheckin` loop? Then you'll at least know which function is failing. – abarnert Aug 12 '14 at 03:30
  • @abarnert I don't know if I got the daemon class from here, but it looks about the same at a quick glance: https://github.com/mitotic/graphterm/blob/master/graphterm/daemon.py. Logging after each individual function should at least help narrow the search if a full stacktrace doesn't come out of the try/catch I put around the startCheckin() method. – Brian Aug 12 '14 at 03:37
  • @PatrickCollins I do have other try catch blocks within those functions just not all of them yet. – Brian Aug 12 '14 at 03:39
  • @Brian No, I mean the try/except block is bad because it might be suppressing an exception that would let you diagnose the problem. Please post any try/except blocks. – Patrick Collins Aug 12 '14 at 03:40
  • @PatrickCollins Let me see if I can get some debugging output tonight otherwise I probably just have to post the entire code up. – Brian Aug 12 '14 at 03:45
  • @Brian Sure. My bet is that somewhere there's an `except` that's silently taking down threads. – Patrick Collins Aug 12 '14 at 03:47
  • @PatrickCollins well you were right for some reason I am getting a `HTTPConnectionPool(host='temperatures.localhost', port=80): Max retries exceeded with url: /api/sensors/2 (Caused by : [Errno -2] Name or service not known) ` exception being thrown. – Brian Aug 12 '14 at 23:12
  • Would it be advantageous to detect whether the thread has died or needs to die, and recreate it? – Brian Aug 12 '14 at 23:25
  • I'm not entirely sure about this, but do the last 4 lines (with for loops that start and join the threads) need to be inside the folders for loop? maybe these 4 lines have an excessive indent? Because I tried your example, and it creates only one thread and presumably waits for it to finish before creating the next one. After I removed those indents, it created several threads that run separately. I'm just not sure if it is what you require. – Highstaker Aug 13 '14 at 12:22
  • @Highstaker Yes the indentations were incorrect, the issue is not that it is not threading properly, the issue is the exception listed above killing off the thread. I am working on narrowing down the issue exactly and how to best recover from it. Any thoughts would be appreciated! – Brian Aug 13 '14 at 16:41
  • please post runnable code. And, it's unlikely the hostname is 'temperatures.localhost', it's probably 'localhost' -- maybe the URL has too many or too few slashes? – johntellsall Aug 18 '14 at 20:22
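On the question raised in the comments about detecting a dead thread and recreating it, here is a rough sketch of a supervisor loop, assuming the threads can safely be restarted from scratch. The function `supervise` and its parameters are hypothetical names, and `targets` is assumed to map a label (for example a sensor ID) to the callable each thread should run.

```python
import threading


def supervise(targets, poll_interval=5.0, stop_event=None):
    """Start one thread per target and replace any thread that has exited."""
    stop_event = stop_event or threading.Event()
    threads = {}
    while not stop_event.is_set():
        for name, target in targets.items():
            thread = threads.get(name)
            # A thread that was never started, or that has finished, is replaced.
            if thread is None or not thread.is_alive():
                thread = threading.Thread(target=target, name=name)
                thread.daemon = True  # do not keep the process alive on shutdown
                thread.start()
                threads[name] = thread
        # Sleep between checks, returning early if a stop was requested.
        stop_event.wait(poll_interval)
    return threads
```

Restarting hides the underlying failure rather than fixing it, so this is probably best combined with logging of whatever exception ended the thread.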

1 Answer

I see only three possibilities here:

  1. The thread is raising an exception, and you are not capturing or not noticing stderr.
  2. Something called by the thread is calling sys.exit, which raises SystemExit and stops only that thread.
  3. If any blocking operations or locks are used, it's possible that the thread is deadlocking itself or blocking indefinitely on some I/O operation.

In any of these cases, adding some thread-dump logging like this:

https://stackoverflow.com/a/2569696/3957645

should show you what that thread is doing (or whether it's gone).
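As a rough illustration of a thread dump using only the standard library (not necessarily what the linked answer does), the sketch below collects the current stack of every live thread. The function name `dump_threads` is hypothetical; `sys._current_frames()` is a real but underscore-prefixed function intended for debugging use.

```python
import sys
import threading
import traceback


def dump_threads():
    """Return a text dump of the current stack of every live thread."""
    # Map thread identifiers to names so the dump is readable.
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append("Thread %s (%s):" % (names.get(ident, "?"), ident))
        lines.extend(entry.rstrip("\n") for entry in traceback.format_stack(frame))
    return "\n".join(lines)
```

Writing the result of `dump_threads()` to the daemon's log periodically would show which threads still exist and where each one is currently blocked; a thread missing from the dump has already exited.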

Community