I have an mpi4py program that hangs intermittently. How can I trace what the individual processes are doing?
I can run the program in a separate terminal per process, for example using pdb:
mpiexec -n 4 xterm -e "python -m pdb my_program.py"
But this gets cumbersome if the issue only manifests with a large number of processes (~80 in my case). In addition, it's easy to catch exceptions with pdb, but a hang raises no exception, so I'd need to see the stack trace of each process to figure out where the hang occurs.
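
To make concrete the kind of per-process trace meant here, below is a minimal sketch using the standard-library faulthandler module. It assumes a Unix system (SIGUSR1 is not available on Windows). The helper name enable_stack_dumps and the trace_rank<N>.log file names are hypothetical, chosen only for illustration. It is not a tested solution for 80 processes.

```python
import faulthandler
import os
import signal


def enable_stack_dumps(rank, log_dir="."):
    """Hypothetical helper: write this process's Python stack to a
    per-rank log file whenever the process receives SIGUSR1."""
    path = os.path.join(log_dir, f"trace_rank{rank}.log")
    # The file object must stay open for as long as the handler is
    # registered, so it is returned to the caller to keep alive.
    log_file = open(path, "w")
    faulthandler.register(signal.SIGUSR1, file=log_file, all_threads=True)
    return path, log_file
```

In the real program the rank would presumably come from MPI.COMM_WORLD.Get_rank(), with the helper called once near the top of my_program.py. When the job hangs, sending `kill -USR1 <pid>` to each Python process should append the current stack to that rank's file. Only Python frames are shown, so a hang inside the MPI library itself would appear as the Python call that entered it.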