14

I have a simple threaded Python program following the standard paradigm:

class SearchThread(threading.Thread):
    def __init__(self, search_queue):
        threading.Thread.__init__(self)
        self.search_queue = search_queue

    def run(self):
        while True:
            try:
                search_url = self.search_queue.get(timeout=15)
                # <do Internet search and print output/>
            except Queue.Empty:
                self.search_queue.task_done()
                break
            except Exception, e:
                print e

if __name__ == '__main__':
    search_queue = Queue.Queue()    
    for i in range(200):
        t = SearchThread(search_queue)
        t.setDaemon(True)
        t.start()
    search_queue.join()

The queue is filled with about 1000 URLs, and a simple HTTP GET is performed in <do Internet search and print output/>. The problem is that after processing some 500-700 entries (which takes only seconds), the program consistently hangs forever with no output, no exception, nothing.

I've tried requests, urllib2, urllib3, and httplib2 for the HTTP GET, but nothing changes.

How do you debug a hanging threaded Python program?

BTW, I'm using Python 2.7 under Ubuntu 11.10 (64bit).

edit

I'm as clueless as before when staring at the gdb trace of the hung process:

sudo gdb python 9602
GNU gdb (Ubuntu/Linaro 7.3-0ubuntu2) 7.3-2011.08
...
(gdb) where
#0  0x00007fc09ea91300 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00000000004ed001 in PyThread_acquire_lock ()
#2  0x00000000004f02de in ?? ()
#3  0x00000000004b6569 in PyEval_EvalFrameEx ()
#4  0x00000000004bcd2d in PyEval_EvalCodeEx ()
#5  0x00000000004b6a5b in PyEval_EvalFrameEx ()
#6  0x00000000004b6d77 in PyEval_EvalFrameEx ()
#7  0x00000000004bcd2d in PyEval_EvalCodeEx ()
#8  0x00000000004bd802 in PyEval_EvalCode ()
#9  0x00000000004dcc22 in ?? ()
#10 0x00000000004dd7e4 in PyRun_FileExFlags ()
#11 0x00000000004de2ee in PyRun_SimpleFileExFlags ()
#12 0x00000000004ee6dd in Py_Main ()
#13 0x00007fc09d86030d in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#14 0x000000000041cb69 in _start ()
Jerry
  • You could insert logging statements to determine on which line of your code the program hangs. – Björn Pollex Apr 04 '12 at 15:51
  • try following this [tutorial](http://wiki.python.org/moin/DebuggingWithGdb) – soulcheck Apr 04 '12 at 15:59
  • I had a similar issue here: https://stackoverflow.com/questions/28223414/gevent-requests-hangs-while-making-lots-of-head-requests – vgoklani Mar 01 '15 at 05:22
  • In the general case refer to [debugging - Showing the stack trace from a running Python application - Stack Overflow](https://stackoverflow.com/questions/132058/showing-the-stack-trace-from-a-running-python-application). If the reader of this comment followed the guide and still cannot debug the issue, post a specific question with a [example]. – user202729 Aug 14 '21 at 11:42

6 Answers

17

I wrote a module that prints out threads that hang for longer than 10 seconds in one place: hanging_threads.py (package)

Here is an example output:

--------------------    Thread 5588     --------------------
  File "C:\python33\lib\threading.py", line 844, in _exitfunc
        t.join()
  File "C:\python33\lib\threading.py", line 743, in join
        self._block.wait()
  File "C:\python33\lib\threading.py", line 184, in wait
        waiter.acquire()

This occurs at the exit of the main thread when you forget to set another thread as daemon.
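The general idea behind such a watchdog can be sketched with the standard library alone. This is not the hanging_threads implementation, only a minimal illustration that assumes CPython's sys._current_frames(); the names format_thread_stacks and start_stack_dumper are made up for this example. Unlike the module above, it dumps every thread periodically instead of detecting which ones are stuck in the same place.

```python
import sys
import threading
import time
import traceback


def format_thread_stacks():
    """Return the current Python stack of every live thread as one string."""
    names = dict((t.ident, t.name) for t in threading.enumerate())
    parts = []
    for thread_id, frame in sys._current_frames().items():
        parts.append("--- Thread %s (%s) ---\n"
                     % (thread_id, names.get(thread_id, "unknown")))
        parts.extend(traceback.format_stack(frame))
    return "".join(parts)


def start_stack_dumper(interval=10.0, out=None):
    """Start a daemon thread that writes all thread stacks every `interval` seconds."""
    def loop():
        while True:
            time.sleep(interval)
            stream = out or sys.stderr
            stream.write(format_thread_stacks())
            stream.flush()

    dumper = threading.Thread(target=loop, name="stack-dumper")
    dumper.daemon = True  # do not keep the process alive just for the dumper
    dumper.start()
    return dumper
```

Calling start_stack_dumper() once at program start should be enough; when the program hangs, the stacks printed to stderr show where each worker is blocked.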

User
  • That's nice! Note however that it will not work if the Python GIL is involved in the deadlock. In that case, once it hangs, no more Python code will be executed. To solve this, you need to implement the same in C and without the Python GIL. It gets tricky then: To do it safely, you need to halt all threads. One relatively easy way to do this is to issue a signal and do it in the signal handler. I did a lot of these things in [my MusicPlayer project](https://github.com/albertz/music-player), if you are interested in some code. – Albert Feb 22 '14 at 13:55
  • Remark: In the current version of the `hanging_threads` package, if the [thread name](https://docs.python.org/3/library/threading.html#threading.Thread.name) is set, it is also displayed in the header. – user202729 Aug 14 '21 at 09:34
2

It seems you are facing the same issue as the one described in this thread:

python multiprocessing: some functions do not return when they are complete (queue material too big)

The crux is that this may be an unresolved (or closed) bug: http://bugs.python.org/issue8237

maneet
2

I'm not sure if you still have the problem (the question is somewhat old...).

It looks like a classic deadlock, because it seems to hang while acquiring a lock.

For GDB, there exist a few nice Python scripts that make the C backtrace more informative by showing the actual Python calls behind frames such as these:

#3  0x00000000004b6569 in PyEval_EvalFrameEx ()
#4  0x00000000004bcd2d in PyEval_EvalCodeEx ()
#5  0x00000000004b6a5b in PyEval_EvalFrameEx ()
#6  0x00000000004b6d77 in PyEval_EvalFrameEx ()
#7  0x00000000004bcd2d in PyEval_EvalCodeEx ()
#8  0x00000000004bd802 in PyEval_EvalCode ()

I think these GDB Python scripts are even included in the original Python distribution. Check them out.

Then there is the great faulthandler module, which offers functions to print the Python backtrace (e.g. from a signal handler). In my MusicPlayer project, I have extended it a bit and use it heavily for debugging.

For example, I have added this function:

// This is expected to be called only from signal handlers (or in an environment where all threads are stopped).
__attribute__((visibility("default")))
void _Py_DumpTracebackAllThreads(void) {
    PyInterpreterState* interp = NULL;
    PyThreadState* tstate = NULL;

    // The current active Python thread (that might not be us).
    tstate = _PyThreadState_Current;

    // No Python state is currently active. Try to get our own, if we have one assigned.
    if(!tstate)
        tstate = PyGILState_GetThisThreadState();

    // No thread found so far. Try the interpreter head.
    if(!tstate)
        interp = PyInterpreterState_Head();

    if(!interp && tstate)
        interp = tstate->interp;

    if(!interp) {
        printf("_Py_DumpTracebackAllThreads: no Python interpreter found\n");
        return;
    }

    _Py_DumpTracebackThreads(STDOUT_FILENO, interp, tstate);
}

Now, when I'm in GDB or LLDB and want to see what the Python threads are doing, I just type p _Py_DumpTracebackAllThreads() and the tracebacks are printed to stdout.

In addition, you will want the C backtrace of all current threads; in GDB, t apply all bt full (or similar) prints all of them.

If it hangs on the Python GIL, there is probably some other Python thread that holds the GIL and is itself stuck on something else. That is the actual bug: that thread should have released the GIL before blocking.
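The plain faulthandler module can already be tried from Python without any C extension. The following is a minimal sketch, assuming Python 3.3+ where faulthandler is in the standard library (for Python 2.7 a backport exists on PyPI); the helper name install_hang_diagnostics is made up for this example, and signal-based dumping is only available on Unix-like systems.

```python
import faulthandler
import signal
import sys


def install_hang_diagnostics(timeout=60, log_file=sys.stderr):
    """Arrange for the tracebacks of all threads to be dumped to `log_file`.

    `log_file` must be a real file with a file descriptor, because
    faulthandler writes to it at a low level.
    """
    # Dump on demand: run `kill -USR1 <pid>` from another terminal.
    if hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, file=log_file, all_threads=True)

    # Dump automatically if the program is still running after `timeout` seconds.
    faulthandler.dump_traceback_later(timeout, repeat=False, file=log_file)
```

Call install_hang_diagnostics() at startup, and faulthandler.cancel_dump_traceback_later() once the work has finished normally. Because the dump is produced by a C-level watchdog rather than by Python code, it can still fire in some situations where a pure-Python watchdog thread would not get to run.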

Albert
1

This debugger can debug multithreaded python programs: http://winpdb.org/

katzenversteher
0

Your while loop is infinite. The thread will never finish executing, even when the queue is empty. You should either check the queue for new tasks or notify the thread (using an Event, for example) that no more tasks are expected.
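One way to tell workers that no more tasks are expected is sketched below. It swaps in a sentinel object placed on the queue rather than an Event, and it is written for Python 3 (on Python 2.7 the module is named Queue). STOP, worker and run are hypothetical names, and the doubling stands in for the real work.

```python
import queue      # named Queue on Python 2
import threading

STOP = object()   # hypothetical sentinel meaning "no more tasks"


def worker(tasks, results):
    while True:
        item = tasks.get()          # blocks until an item or a sentinel arrives
        try:
            if item is STOP:
                return              # leave the loop instead of spinning forever
            results.append(item * 2)  # placeholder for the real work
        finally:
            tasks.task_done()       # exactly one task_done() per get()


def run(items, num_workers=4):
    tasks = queue.Queue()
    results = []
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item in items:
        tasks.put(item)
    for _ in threads:
        tasks.put(STOP)             # one sentinel per worker
    tasks.join()                    # every item and sentinel was processed
    for t in threads:
        t.join()                    # every worker has actually exited
    return results
```

With one sentinel per worker, each thread terminates deterministically, so no timeout is needed to decide that the queue is finished.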

Maksym Polshcha
0

Another issue is the misuse of Queue.get. Its first positional argument is the boolean 'block', not the timeout, so you should pass the timeout by keyword:

self.search_queue.get(timeout=15)

And, as I wrote above, avoid infinite loops. When the timeout expires, Queue.get raises the 'Empty' exception, which is caught by "except Exception" (another construct best avoided), so your loop really is infinite. You can replace 'except Exception' with

except Queue.Empty:
    self.search_queue.task_done()
    break

Edit

The original question's code was as follows:

self.search_queue.get(15)
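A related point that may be worth checking, assuming the standard queue semantics: join() returns only after task_done() has been called once for every item that was put on the queue. If task_done() is called only in the Empty branch and never after a successful get(), the count of unfinished tasks may never reach zero and join() can wait forever. A minimal Python 3 sketch of a timeout-based loop with that accounting (drain and handle are hypothetical names):

```python
import queue      # named Queue on Python 2
import threading


def drain(tasks, handle, idle_timeout=15.0):
    """Process items until the queue has been empty for `idle_timeout` seconds."""
    while True:
        try:
            item = tasks.get(timeout=idle_timeout)
        except queue.Empty:
            break                   # nothing was fetched, so no task_done() here
        try:
            handle(item)
        except Exception as exc:    # report the failure but keep the worker alive
            print("failed on %r: %s" % (item, exc))
        finally:
            tasks.task_done()       # once per successful get(), even on failure
```

Each worker thread would run drain(search_queue, do_search), where do_search stands for the HTTP request and printing.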
Maksym Polshcha
  • You are right on both points sir. I've made both changes, but the program still "hangs". What else can I check before reverting to C level debugger? – Jerry Apr 05 '12 at 02:47
  • I ran your code on my side with some minor modifications: I fill the queue and print each item after getting it from the queue, like this: `search_url = self.search_queue.get(timeout=15); print search_url` and `search_queue = Queue.Queue(); map(search_queue.put, xrange(0,1000))`. The code works well on my side. Most likely your 'hang' is caused by some network problem (if your code makes HTTP requests): insufficient bandwidth, unreachable hosts, problems in the request-related code, etc. – Maksym Polshcha Apr 05 '12 at 06:09
  • Hi Maksym, thank you for your help. I've done similar tests and got the same result: it only happens when there's network activity involved, although the problem never manifests when run in a single thread. So I'm still clueless as to exactly why or how to resolve it. – Jerry Apr 05 '12 at 09:30
  • @Jerry If you provide me with your network operations I could assist you. – Maksym Polshcha Apr 05 '12 at 09:33
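If unreachable or unresponsive hosts are the suspected cause, one thing to rule out is a request made without a socket timeout, which can block indefinitely. The sketch below is only an illustration under that assumption, not a diagnosis of the code in the question; it uses Python 3's urllib.request (urllib2 on Python 2.7), and fetch is a hypothetical helper name.

```python
import http.client
import urllib.request   # urllib2 on Python 2


def fetch(url, timeout=10.0):
    """Return the response body, or None if the request fails or times out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (OSError, http.client.HTTPException) as exc:
        # URLError and socket timeouts are both OSError subclasses.
        print("request to %s failed: %s" % (url, exc))
        return None
```

Note that the timeout applies to each blocking socket operation, not to the request as a whole, so a server that keeps sending data slowly can still hold a worker for longer than `timeout` seconds.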