Looking for some advice on the following problem.
I am running a number of jobs with mpi4py on a SLURM cluster. I have noticed that when a job is too big (i.e., it has too much data to process), it fails with the following error:
mpirun noticed that process rank 0 with PID 62208 on node node1 exited on signal 9 (Killed).
I have tried breaking some jobs into smaller chunks before submitting them, but I was wondering whether there is a way to anticipate the Killed signal and add an except clause that splits the job into chunks when the need arises.
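For context, signal 9 is SIGKILL. It is most likely sent because the job exceeded its memory limit (the kernel OOM killer, or SLURM enforcing the memory requested for the job). A process cannot catch, block or ignore SIGKILL, so no `try`/`except` will ever see it; the first function below just demonstrates that the OS refuses a handler for it. The practical alternative is to decide the chunk size *before* processing, from an estimate of memory per item. This is a minimal sketch under stated assumptions: the helper names `sigkill_is_catchable`, `memory_budget_bytes` and `plan_chunks` are hypothetical, the bytes-per-item figure is something you would have to estimate for your own data, and `SLURM_MEM_PER_NODE` (in megabytes) is only exported when the job is submitted with `--mem`.

```python
import os
import signal


def sigkill_is_catchable() -> bool:
    """Try to install a Python handler for SIGKILL (signal 9).

    On Linux/POSIX the kernel rejects this with EINVAL, so the function
    returns False: SIGKILL never reaches Python code, and no except
    clause can intercept it.
    """
    try:
        signal.signal(signal.SIGKILL, lambda signum, frame: None)
    except (OSError, ValueError):
        return False
    return True


def memory_budget_bytes(default_mb: int = 4096) -> int:
    """Memory available to the job on this node, in bytes.

    Assumption: the job was submitted with --mem, so SLURM exports
    SLURM_MEM_PER_NODE in megabytes. Otherwise fall back to default_mb
    (a hypothetical default, adjust for your cluster).
    """
    mb = os.environ.get("SLURM_MEM_PER_NODE")
    return int(mb if mb else default_mb) * 1024 * 1024


def plan_chunks(n_items: int, bytes_per_item: int, budget_bytes: int,
                safety: float = 0.5) -> list[tuple[int, int]]:
    """Split range(n_items) into (start, stop) slices that fit in memory.

    Only a fraction `safety` of the budget is used per chunk, leaving
    headroom for interpreter/MPI overhead and temporary copies.
    """
    if bytes_per_item <= 0:
        raise ValueError("bytes_per_item must be positive")
    per_chunk = max(1, int(budget_bytes * safety) // bytes_per_item)
    return [(start, min(start + per_chunk, n_items))
            for start in range(0, n_items, per_chunk)]
```

With mpi4py, each rank could then take its share of the slices with `chunks[comm.Get_rank()::comm.Get_size()]`, where `comm = MPI.COMM_WORLD`. Because the per-node memory is shared by all ranks on that node, the budget passed to `plan_chunks` would need to be divided by the number of ranks per node.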