So I'm debugging my python program and have encountered a bug that makes the program hang, as if in an infinite loop. Now, I had a problem with an infinite loop before, but when it hung up I could kill the program and python spat out a helpful exception that told me where the program terminated when I sent it the kill command. Now, however, when the program hangs up and I ctrl-c it, it does not abort but continues running. Is there any tool I can use to locate the hang up? I'm new to profiling but from what I know a profiler can only provide you with information about a program that has successfully completed. Or can you use a profiler to debug such hang ups?
-
How do you know it's in a loop? Is one CPU pegged at 100%? If not, it could be in a socket wait (assuming it's doing network I/O). – Jim Garrison Aug 09 '10 at 19:55
-
If it doesn't respond to Ctrl+C, that could be because the `KeyboardInterrupt` exception is getting caught at some point. If you have a `try: ... except:` clause that doesn't name any specific exception classes, that could be responsible. – David Z Aug 09 '10 at 20:32
-
I'm having the exact same problem and Ctrl+C does not work since it hangs inside a C-call somewhere. My CPU is running at 100%. – Jonas Adler Apr 12 '19 at 08:56
-
Does this answer your question? [Showing the stack trace from a running Python application](https://stackoverflow.com/questions/132058/showing-the-stack-trace-from-a-running-python-application) – user202729 Aug 14 '21 at 11:45
14 Answers
Let's assume that you are running your program as:
python YOURSCRIPT.py
Try running your program as:
python -m trace --trace YOURSCRIPT.py
Be patient while a lot of output is printed to the screen. If you have an infinite loop, the output will go on forever (halting problem). If the output stops at one line instead, you are most likely blocked on I/O or stuck in a deadlock.

-
+1. You can use the --ignore-dir or --ignore-module options to reduce the amount of output, e.g. to stop it tracing through all the standard modules. You can also redirect the output to a file for later examination. – Dave Kirby Aug 09 '10 at 20:25
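The same filtering can also be done from inside Python with the standard `trace` module. The following is only a minimal sketch: `countdown` is a made-up stand-in for your own code, and `ignoredirs` is used here to skip everything installed under the interpreter's prefix, which is roughly what `--ignore-dir` does on the command line.

```python
import sys
import trace


def countdown(n):
    """Hypothetical stand-in for the code you suspect of hanging."""
    while n > 0:
        n -= 1
    return n


# trace=True prints each line as it executes; count=False skips coverage counting.
# ignoredirs keeps the standard library and site-packages out of the output.
tracer = trace.Trace(
    count=False,
    trace=True,
    ignoredirs=[sys.prefix, sys.exec_prefix],
)

# runfunc() calls the function under the tracer and returns its result.
result = tracer.runfunc(countdown, 3)
```

The last line printed before the output stops is where to start looking.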
I wrote a module that prints out threads that hang for longer than 10 seconds in one place: hanging_threads.py
Run:
python -m pip install hanging_threads
Add this to your code:
from hanging_threads import start_monitoring
start_monitoring(seconds_frozen=10, test_interval=100)
Here is an example output:
-------------------- Thread 5588 --------------------
File "C:\python33\lib\threading.py", line 844, in _exitfunc
t.join()
File "C:\python33\lib\threading.py", line 743, in join
self._block.wait()
File "C:\python33\lib\threading.py", line 184, in wait
waiter.acquire()
This occurs at the exit of the main thread when you forget to set another thread as daemon.
-
This tool is exactly what I'm looking for to find a bottleneck in my Django app under uWSGI in production. I think it deserves to be a Python package. Please let us know if this happens in the future. – raacer Dec 04 '16 at 15:36
-
@raacer, it is a package now: https://pypi.python.org/pypi/hanging_threads – User Dec 04 '16 at 18:28
-
This worked perfectly for me. I suggest you append to your answer the pip install command and the snippet you add at the top of your code to enable the module. – Joshua Apr 28 '20 at 15:51
-
Works like a charm! Thank you User. I have added the pip command and usage snippet to your answer. – fivef Oct 15 '20 at 10:39
-
From Python 3.3 on there is [faulthandler](https://stackoverflow.com/a/64369870/1242521) to solve this problem. – fivef Oct 15 '20 at 10:48
Wow! 5 answers already and nobody has suggested the most obvious and simple:
1. Try to find a reproducible test case that causes the hanging behavior.
2. Add logging to your code. This can be as basic as print "**010", print "**020", etc. peppered through major areas.
3. Run code. See where it hangs. Can't understand why? Add more logging. (I.e. if between **020 and **030, go and add **023, **025, **027, etc.)
4. Goto 3.
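For anything beyond a throwaway script, the same idea can be sketched with the `logging` module instead of bare prints. The function names and messages below are invented for illustration; the useful part is the format string, which stamps every message with the time, thread, function and line number, so the last line in the log tells you where execution stopped.

```python
import logging

# Timestamp, thread name and source location on every message.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(threadName)s %(funcName)s:%(lineno)d %(message)s",
)
log = logging.getLogger("hang-hunt")


def load_items():
    """Hypothetical first stage of the program."""
    log.debug("loading items")
    items = list(range(5))
    log.debug("loaded %d items", len(items))
    return items


def process(items):
    """Hypothetical second stage; logs once per iteration."""
    total = 0
    for index, item in enumerate(items):
        log.debug("processing item %d", index)
        total += item
    log.debug("processing finished")
    return total


total = process(load_items())
```

If the log ends at "processing item 3", the hang is somewhere in that iteration, and that is where the next round of log statements goes.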
-
+1, this is what I often do. Of course, debuggers and IDEs are useful too, but I find that this is the quickest/easiest way to pin down the location of a bug when I already have a rough idea of which part of the source to look for it in. (Just my opinion, of course) – David Z Aug 09 '10 at 20:30
-
For simple scripts, without complexity, this works. However, for long running, complex programs, this is useless. – xlash Oct 26 '13 at 17:01
-
@xlash For complex programs you change "**010" to "Starting blaster module" and change "**020" to "Connecting to fusion server" etc. And change "print" to use the "logging" module. If you're not doing this, you're doing it wrong. – dkamins Oct 28 '13 at 19:18
-
Nah, the code is too big. Can't go around adding print statements everywhere. There's GOT to be a better answer. – john k Sep 17 '17 at 22:31
-
Sometimes it is not hanging in your code. What if it is a C++ extension that hangs? What if the project depends on huge libs such as Tensorflow or Keras? This technique, although interesting for some cases, is not applicable to others. – Eduardo Nov 07 '18 at 23:19
-
In my experience (in Java) the use of print statements to find problems arises from one of two things: either the person doesn't understand the debugger well enough or the debugger is not good enough. My experience in python tells me one of these is true but I haven't discovered which yet. – mjaggard Oct 09 '19 at 05:48
-
@dkamins, yes, but now your code base has 100k lines. Do you suggest adding a logging statement every few lines? Also, some statements are huge, such as `minimize`, which triggers a whole minimization behind the scenes. Others are small. I suggest losing the "Wow" in the answer; it comes off as quite arrogant and doesn't add anything. The solution may work for small problems, but it seems that you're unfamiliar with anything other than a short script. Maybe change that impression ;) – Mayou36 Feb 15 '21 at 18:23
-
@Mayou36 I regularly work on code bases well over 100KLOC in multiple languages and have been programming for over 30 years. I suspect there's some kind of a Dunning–Kruger effect at play here. The conceptual example of 'print "**010"' was for simplicity. Logging is essential. Just do it! – dkamins Feb 16 '21 at 02:30
-
Yes, logging is essential. But for example, you cannot change an existing library. What do you do if you debug another library? You can step inside, but you cannot clutter it with logging statements. I agree that logging is a useful technique, but its application is limited. The way you write your answer, you seem to consider only the case where you have written the library yourself _and_ where it has been done already throughout the code. As a tip and for this case, yes. But for others, it won't work. So what do you do with a different library? – Mayou36 Feb 16 '21 at 14:36
-
@Mayou36 You don't need 100% log coverage and can in fact start from 0%. You can add it iteratively, drilling down from a main loop toward your problem area. With good logging, if you can narrow the problem down to a static external library, then the next step would be to report a bug to the library creator including a reproducible failure case. If it's open source, you have the option of trying to fix it and submitting a patch. If none of that's possible, then you can (in preference order) a) use a better library, b) code around it, or c) write it yourself. – dkamins Feb 16 '21 at 19:55
-
I suggest also using the `icecream` library to automatically add the position of the print to the output. – Luca Di Liello Sep 17 '21 at 13:09
-
@dkamins is correct, this feels like a Dunning–Kruger effect. I've been programming for over 22 years and I strategically add print "**010" to code I'm debugging using a binary search tactic. I add a **010 to the start of the code, and a **999 to the end... then a **500 exactly in the middle. Rinse and repeat by adding another debug print in the new block that failed (half the code each time). It works for me every time. – Adam Feb 12 '22 at 00:22
-
This guy probably doesn't know what he's doing either, right? ;-) https://twitter.com/vitalikbuterin/status/1532456364617834496 – dkamins Jun 03 '22 at 20:52
From Python 3.3 on there is a built in faulthandler module. To print a stack trace for all the threads when a normally fatal signal occurs:
import faulthandler
faulthandler.enable()
For a process that is hung, it is more useful to set up faulthandler to print stack traces on demand. This can be done with:
import faulthandler
import signal
faulthandler.register(signal.SIGUSR1.value)
Then once the process becomes hung you can send a signal to trigger the printing of the stack trace:
$ python myscript.py &
[1] <pid>
$ kill -s SIGUSR1 <pid>
This signal won't kill the process, and you can send it multiple times to see stack traces at different points in the execution.
Note that signal.SIGUSR1.value requires Python 3.5 or later, where the signal constants became enums. For an older version, you can just hardcode the signal number (10 for most common Linux architectures).
faulthandler.dump_traceback can be used together with threading.enumerate to identify threads having daemon=False, and so narrow down the hanging threads by their hex ID via hex(t.ident).
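As a rough sketch of that last point (the helper names are mine, not part of faulthandler): dump every thread's stack with `faulthandler.dump_traceback`, then list the non-daemon threads with their hex IDs so they can be matched against the `Thread 0x...` headers in the dump. As far as I can tell faulthandler zero-pads the ID, so compare the numeric value rather than the exact string.

```python
import faulthandler
import tempfile
import threading


def dump_all_threads():
    """Return faulthandler's traceback of every thread as text."""
    # faulthandler writes straight to the file descriptor, so a real file is needed.
    with tempfile.TemporaryFile(mode="w+") as out:
        faulthandler.dump_traceback(file=out, all_threads=True)
        out.seek(0)
        return out.read()


def non_daemon_threads():
    """List (hex id, name) for the threads that keep the process alive."""
    return [
        (hex(t.ident), t.name)
        for t in threading.enumerate()
        if not t.daemon
    ]


report = dump_all_threads()
print(report)
print(non_daemon_threads())
```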
-
That's brilliant, it even has a watchdog: call `dump_traceback_later` in your main loop with a timeout that ensures it's called again before the timeout. If you're running in a docker container, the parameter `exit=True` will in effect restart your container after logging the stack. I was about to implement this myself, so i'm chuffed right now! – SvenS Nov 26 '22 at 08:43
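A minimal sketch of the watchdog pattern described in this comment, assuming a main loop that calls some `step()` function (hypothetical here, simulated with `time.sleep`). Each call to `dump_traceback_later` re-arms the timer, so a dump is only written when one iteration takes longer than the timeout.

```python
import faulthandler
import tempfile
import time


def run_with_watchdog(step, iterations, timeout, log_file):
    """Run step() repeatedly; dump all stacks if one call outlasts timeout."""
    try:
        for _ in range(iterations):
            # Re-arming replaces the previous timer, so nothing is dumped
            # as long as every iteration finishes in time.
            faulthandler.dump_traceback_later(timeout, file=log_file)
            step()
    finally:
        faulthandler.cancel_dump_traceback_later()


with tempfile.TemporaryFile(mode="w+") as log_file:
    # Fast iterations: the watchdog never fires.
    run_with_watchdog(lambda: time.sleep(0.01), 3, 5.0, log_file)
    log_file.seek(0)
    quiet_log = log_file.read()

    # One slow iteration: the stacks are dumped while it is still sleeping.
    run_with_watchdog(lambda: time.sleep(0.5), 1, 0.1, log_file)
    log_file.seek(0)
    stalled_log = log_file.read()

print(stalled_log)
```

In a real service you would pass `sys.stderr` (the default) or a long-lived log file, and a timeout comfortably above the normal iteration time.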
If your program is too big and complex to be viable for single stepping with pdb or printing every line with the trace module then you could try a trick from my days of 8-bit games programming. From Python 2.5 onwards pdb has the ability to associate code with a breakpoint by using the commands command. You can use this to print a message and continue running:
(Pdb) commands 1
(com) print("*** Breakpoint 1 ***")
(com) continue
(com) end
(Pdb)
This will print a message and carry on running when breakpoint 1 is hit. Define similar commands for a few other breakpoints.
You can use this to do a kind of binary search of your code. Attach breakpoints at key places in the code and run it until it hangs. You can tell from the last message which was the last breakpoint it hit. You can then move the other breakpoints and re-run to narrow down the place in the code where it hangs. Rinse and repeat.
Incidentally, on the 8-bit micros (Commodore 64, Spectrum etc.) you could poke a value into a register location to change the colour of the border round the screen. I used to set up a few breakpoints to do this with different colours, so when the program ran it would give a psychedelic rainbow display until it hung, then the border would change to a single colour that told you what the last breakpoint was. You could also get a good feel for the relative performance of different sections of code by the amount of each colour in the rainbow. Sometimes I miss that simplicity in these newfangled "Windows" machines.

Multithreaded dæmon; using pyrasite to inspect a running program
I had a multithreaded dæmon that would sometimes get stuck after hours, sometimes after weeks. Running it through a debugger would be not feasible and perhaps not even helpful, as debugging multithreaded or multiprocess programs can be painful. Running it through trace might fill up gigabytes if not terabytes before it would get stuck. The second time the dæmon appeared to hang, I wanted to know right away where it was, without restarting it, adding inspection code, running it through a debugger, and waiting for hours, days, or weeks for it to hang again for circumstances yet to be investigated.
I was rescued by pyrasite, which lets the user connect to a running Python process and interactively inspect frames (example inspired by this gist):
$ pyrasite-shell 1071 # 1071 is the Process ID (PID)
Pyrasite Shell 2.0
Connected to '/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/bin/python3.8 /opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/bin/satpy_launcher.py -n localhost /opt/pytroll/pytroll_inst/config/trollflow2.yaml'
Python 3.8.6 | packaged by conda-forge | (default, Dec 26 2020, 05:05:16)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
(DistantInteractiveConsole)
>>> import sys
>>> sys._current_frames()
{139652793759488: <frame at 0x7f034b2c9040, file '<console>', line 1, code <module>>, 139653520578368: <frame at 0x7f034b232ac0, file '/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pyresample/spherical.py', line 112, code __init__>}
The first frame is not informative; that's our own pyrasite shell. The second frame, however, reveals that currently our script is stuck in the module pyresample.spherical in line 112. We can use the traceback module to get a full traceback:
>>> import traceback
>>> traceback.print_stack(list(sys._current_frames().values())[1])
File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/bin/satpy_launcher.py", line 80, in <module>
main()
File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/bin/satpy_launcher.py", line 75, in main
run(prod_list, topics=topics, test_message=test_message,
File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/trollflow2/launcher.py", line 152, in run
proc.start()
File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/multiprocessing/context.py", line 277, in _Popen
return Popen(process_obj)
File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/multiprocessing/popen_fork.py", line 75, in _launch
code = process_obj._bootstrap(parent_sentinel=child_r)
File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/trollflow2/launcher.py", line 268, in process
cwrk.pop('fun')(job, **cwrk)
File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/trollflow2/plugins/__init__.py", line 403, in covers
cov = get_scene_coverage(platform_name, start_time, end_time,
File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/trollflow2/plugins/__init__.py", line 425, in get_scene_coverage
return 100 * overpass.area_coverage(area_def)
File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/trollsched/satpass.py", line 242, in area_coverage
inter = self.boundary.contour_poly.intersection(area_boundary)
File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pyresample/spherical.py", line 494, in intersection
return self._bool_oper(other, -1)
File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pyresample/spherical.py", line 475, in _bool_oper
inter, edge2 = edge1.get_next_intersection(narcs2, inter)
File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pyresample/spherical.py", line 326, in get_next_intersection
return None, None
File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pyresample/spherical.py", line 298, in intersection
return None
File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pyresample/spherical.py", line 264, in intersections
return (SCoordinate(lon, lat),
File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pyresample/spherical.py", line 62, in cross2cart
return res
File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pyresample/spherical.py", line 112, in __init__
self.cart = np.array(cart)
We can then use all the power of Python's introspection to inspect the stack and reconstruct the circumstances in which this got stuck.
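If you would rather not pick frames out of the dictionary by position, the same two calls can be wrapped in a small helper. This is just a sketch (the helper name is mine) and it runs inside whatever process it is pasted into, for example the pyrasite shell; it labels each stack with the thread name where one is known.

```python
import sys
import threading
import traceback


def format_all_stacks():
    """Return the current stack of every thread in this process as text."""
    names = {t.ident: t.name for t in threading.enumerate()}
    parts = []
    for ident, frame in sys._current_frames().items():
        parts.append(f"--- {names.get(ident, 'unknown')} ({ident}) ---\n")
        parts.extend(traceback.format_stack(frame))
    return "".join(parts)


print(format_all_stacks())
```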

You could also try http://code.activestate.com/recipes/576515-debugging-a-running-python-process-by-interrupting/ . It should work as long as the Python process doesn't have signals masked, which is normally the case even if Ctrl-C doesn't work.

Nothing like the good old pdb
import pdb
pdb.run('my_method()',globals(),locals())
Then just hit (n) to go to the next command, or (s) to step into a call. See the docs for the full reference. Follow your program step by step, and you'll probably figure it out fast enough.

-
"and you'll probably figure it out fast enough".... unless your program isn't small. – iAdjunct May 10 '16 at 18:01
If your program is a bit too complex to simply trace all the functions, you can try running it and manually attaching a tracer program like lptrace to it. It works a bit like strace: it prints every function call your program makes. Here's how to call it:
python lptrace -p $STUCK_PROGRAM_PID
Note that lptrace requires gdb to run.

-
I tried this lptrace. Now I have got two python processes hanging. Cute. – Gergely Bacso Aug 18 '17 at 16:03
It's easier to prevent these hang-ups than it is to debug them.
First: it is very, very hard to get a for loop stuck in a situation where it won't terminate. Very hard.
Second: it is relatively easy to get stuck in a while loop.
The first pass is to check every while loop to see if it must be a while loop. Often you can replace while constructs with for, and you'll correct your problem by rethinking your loop.
If you cannot replace a while loop with for, then you simply have to prove that the expression in the while statement must change every time through the loop. This isn't that hard to prove.
Look at the whole condition of the loop. Call this T.
Look at all the logic branches in the body of the loop. Is there any way to get through the loop without making a change to the condition, T?
Yes? That's your bug. That logic path is wrong.
No? Excellent, that loop must terminate.
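To make that check concrete, here is a small made-up example. In `drain_buggy` the condition T is "the queue is non-empty", and the `continue` branch is a path through the body that leaves T unchanged, so a negative item makes the loop spin forever (the `max_steps` guard exists only so the demo itself terminates). `drain_fixed` is the same logic rewritten as a for loop, which cannot get stuck that way.

```python
def drain_buggy(queue, max_steps=1000):
    """While loop with a branch that never changes the loop condition."""
    steps = 0
    while queue:
        steps += 1
        if steps >= max_steps:  # demo-only guard; real code would hang here
            return "stuck"
        if queue[0] < 0:
            continue  # BUG: the negative item is never removed
        queue.pop(0)
    return "done"


def drain_fixed(queue):
    """Same intent as a for loop: every item is visited exactly once."""
    processed = []
    for item in queue:
        if item < 0:
            continue  # skipping is safe; the for loop still advances
        processed.append(item)
    return processed


print(drain_buggy([1, -1, 2]))
print(drain_fixed([1, -1, 2]))
```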

-
This is only valid for single-threaded applications. I've had deadlocks, races and other issues when developing Qt applications that use multiple threads, e.g. for signals and slots. No, the GIL does not prevent all of that. – Stefan Jun 30 '15 at 22:20
-
I wholeheartedly disagree. Preventing hang-ups like this is essentially impossible. These hang-ups will inevitably happen. On the other hand, debugging is almost always possible, and a fundamentally important part of software development. – danielpops Jun 29 '17 at 00:48
Haven't used it myself but I've heard that the Eric IDE is good and has a good debugger. That's also the only IDE I know of that has a debugger for Python

-
Wing IDE also has a good debugger - thoroughly recommended. Regards. – Alan Harris-Reid Aug 09 '10 at 20:07
If your program has more than one thread, it could be ignoring Ctrl-C because only one thread is wired up to the Ctrl-C handler, while the live (runaway?) thread is deaf to it. The GIL (global interpreter lock) in CPython means that normally only one thread can actually be running at any one time. I think I solved my (perhaps) similar problem using this:

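import threading

i = 0
for t in threading.enumerate():
    # Skip the first thread in the list (normally the main thread).
    if i != 0:  # and t.getName() != 'Thread-1':
        print(t.getName())
        # Private Python 2 API; it does not exist in Python 3.
        t._Thread__stop()
    i += 1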
Once you know the names of the threads, re-run your script and narrow them down by excluding individual threads from being stopped (see the commented-out name check). The i != 0 condition prevents the main thread from being aborted.
I suggest going through and naming all your threads, for example: Thread(target=self.log_sequence_status, name='log status')
This code should be placed at the end of the main program that starts up the runaway process.
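For Python 3, where that private stop method is gone, the naming advice still pays off. A small sketch, with a made-up worker function that borrows the `log_sequence_status` name from the example above: give every thread a name, then list the names and daemon flags of whatever is still alive.

```python
import threading


def log_sequence_status(stop_event):
    """Hypothetical worker: blocks until it is told to stop."""
    stop_event.wait()


stop_event = threading.Event()
worker = threading.Thread(
    target=log_sequence_status,
    args=(stop_event,),
    name="log status",  # a descriptive name instead of "Thread-1"
    daemon=True,        # daemon threads do not keep the process alive at exit
)
worker.start()


def thread_summary():
    """Return (name, daemon) for every thread that is currently alive."""
    return [(t.name, t.daemon) for t in threading.enumerate()]


summary = thread_summary()
print(summary)

# Ask the worker to finish cooperatively rather than killing it.
stop_event.set()
worker.join(timeout=1)
```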

Wow! It seems you added so much code in one go without testing it that you can't say which code was added just before the program started to hang (the most likely cause of the problem).
Seriously, you should code in small steps and test each one individually (ideally doing TDD).
For your exact problem of spotting what Python code is running when Ctrl-C does not work, I will hazard a guess: did you use a bare except: that catches all exceptions indiscriminately? If you did so in a loop (and the loop continues after handling the exception), that is a very likely reason why Ctrl-C does not work: the KeyboardInterrupt is caught by that except clause. Change it to except Exception: and it will not be caught any more (there are other possible reasons for Ctrl-C not working, such as thread management, as another poster suggested, but I believe the above is more likely).
exception KeyboardInterrupt
Raised when the user hits the interrupt key (normally Control-C or Delete).
During execution, a check for interrupts is made regularly. Interrupts typed when a built-in function input() or raw_input() is waiting for input also raise this exception. The exception inherits from BaseException so as to not be accidentally caught by code that catches Exception and thus prevent the interpreter from exiting.
Changed in version 2.5: Changed to inherit from BaseException.
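The difference is easy to see in a few lines. This is only an illustrative sketch with made-up function names; `press_ctrl_c` simulates the interrupt by raising `KeyboardInterrupt` directly.

```python
def swallow_everything(action):
    """Bare except: also catches KeyboardInterrupt, so Ctrl-C is lost."""
    try:
        action()
    except:  # deliberately bare to show the problem
        return "swallowed"
    return "ok"


def swallow_errors_only(action):
    """except Exception: lets KeyboardInterrupt propagate to the caller."""
    try:
        action()
    except Exception:
        return "swallowed"
    return "ok"


def press_ctrl_c():
    """Simulate the user hitting Ctrl-C inside the protected block."""
    raise KeyboardInterrupt


print(swallow_everything(press_ctrl_c))

try:
    swallow_errors_only(press_ctrl_c)
    interrupted = False
except KeyboardInterrupt:
    interrupted = True
print(interrupted)
```

Put the first variant inside a `while True:` loop and you get exactly the symptom from the question: the program keeps running no matter how often you press Ctrl-C.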

-
I just wonder which part of this answer got this -1 two years after writing it. The last code added is the one likely to cause the new bad behavior, you should use TDD, or if Ctrl+C doesn't respond, the problem is likely to be caused by catching keyboard exceptions (and probably in a loop). Looks like there are some very low quality reviewers around... – kriss Nov 30 '12 at 10:05
-
Mute people keep downvoting. I guess they are TDD despisers or believers that large programs should be written in one go and only run (and debugged) when finished. – kriss Jun 20 '13 at 13:14
-
Not a downvoter, but I can see why: if you're googling for help and find this page... for the first part it's too late and not helpful, but the second part is almost certainly correct (and deserves an upvote) - be specific in your exceptions. It could be that even using TDD you get into situations where the code hangs; sure, it's something since the last commit... but what? (FWIW I'm having this happen, porting a code-base to python 3, it's tested - and most tests pass now - but some weird behaviour is causing "hanging" in a test.) – Andy Hayden Dec 11 '14 at 05:45
-
Sometimes problem happens after some user actions, not after commit. Your suggestion is good, but not for all cases. – raacer Dec 04 '16 at 15:29
-
@raacer: what you say is very rare if you are using code coverage tools to check that all code paths are covered. When a problem happens as a consequence of a user action, I usually call that a "Security Issue". And it's my job to write security software. The user shouldn't be able to provide any input that breaks the code (but I agree it happens sometimes). Nevertheless: it doesn't have to be the last commit, even if it's an old commit it can be tracked through bisect. But only if commits are not too large... – kriss Dec 04 '16 at 20:54
-
Unfortunately such rare but painful cases can't be tracked through bisect when the problem happens on a heavily loaded production server. Full code coverage also does not guarantee we covered all possible states that may be generated by unpredictable user behaviour. These are really rare, but really, really painful cases ) – raacer Dec 04 '16 at 21:15
-
@raacer: I agree, I already had such cases to solve. Hopefully very rare and very painful indeed. But using print for traces or pdb() as suggested by other answers isn't really helpful either if the hang can't be reproduced at will. I usually use logging facilities and stack traces for post mortem analysis for that purpose. – kriss Dec 04 '16 at 21:26
-
Sure, prints and pdb() will not be useful in such cases too. I'm going to try the module proposed by @User that dumps the stack for the stuck threads, hope this will help. It seems like it should work for any case, including the painful one. – raacer Dec 04 '16 at 22:13
-
If your code started hanging because of a network issue or some C library is misbehaving, looking at recently added code isn't going to help you. I'm not going to downvote, but this advice helps only in the most trivial cases. – Antimony Jan 02 '18 at 21:05
-
@Antimony: I totally disagree with what you say in your comment. Reading that I wonder if you are a software developer? Maybe more of a pen tester, a cracker or such? When you are writing code the external environment is not changing by itself, your code does. And to spot a bug git bisect is your best friend. Of course things are completely different when spotting a bug from software in the wild (on customer sites). But then you have to be really lucky to be in a position to run a debugger or similar tool. TDD and small commits are helping *much* when it is not too late to do it! – kriss Jan 04 '18 at 00:25
-
@kriss I am a software developer, and in fact came here because of real world problems on a real world codebase. In the real world, the external environment changes by itself all the time. For example, in this case, the issue was that the database deadlocked, causing Django to hang on startup. There were no code changes, it was purely an issue in the db. Just because your code shouldn't randomly stop working when you haven't touched it doesn't mean it won't in practice. – Antimony Jan 04 '18 at 01:12
-
@Antimony: not saying it *never* happens. But it's rare. Last week for instance I fixed an issue in the openssl library (related to an API change between 1.1 and 1.2 and an undocumented behavior change). I spotted it by git bisecting the library code... but I disagree that the environment changes by itself. It's production environment vs dev environment. The latter is (or should be) stable, the former is not. If the lock was related here to a specific database it should have been trivial to spot that everything was ok with the test database and that the code hangs with the production database. – kriss Jan 04 '18 at 08:48