
There's a nice question ("Find out the CPU time and memory usage of a slurm job") about how to retrieve the CPU time and memory usage of a Slurm job, and spinup has a nice answer (https://stackoverflow.com/a/56555505/4570472). However, if I understand correctly, `seff <job id>` reports a single Memory Efficiency figure, which corresponds to MaxRSS (the peak memory) over the entire life of the job, not how usage changed over time.

How do I retrieve the time series of memory (and perhaps CPU) usage?

I'd like this so I can understand why my Slurm jobs run out of memory after running fine for 6+ hours.

Rylan Schaeffer
Whenever I've had to do this, I added a loop in my script that printed memory and CPU usage (using `psutil` for Python) to a file. – jkr Aug 04 '20 at 15:55
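
The approach in the comment above could look roughly like the sketch below. It is an illustration, not jkr's actual code: to stay dependency-free it swaps `psutil` for the standard library, reading current RSS from `/proc/self/statm` (Linux only) and CPU time from `time.process_time()`. With `psutil` installed, `psutil.Process().memory_info().rss` would give the same RSS figure. The function names `sample_usage` and `start_usage_logger`, the CSV layout, and the 60-second default interval are all assumptions made for the example.

```python
import csv
import os
import resource
import threading
import time


def sample_usage():
    """Return (rss_bytes, cpu_seconds) for the current process."""
    try:
        # Linux: the second field of /proc/self/statm is resident pages.
        with open("/proc/self/statm") as f:
            rss = int(f.read().split()[1]) * os.sysconf("SC_PAGE_SIZE")
    except (OSError, IndexError, ValueError):
        # Fallback: peak RSS so far (KiB on Linux), not the current value.
        rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss * 1024
    # process_time() is user + system CPU time consumed by this process.
    return rss, time.process_time()


def start_usage_logger(path, interval_s=60.0):
    """Append one CSV row (unix_time, rss_bytes, cpu_seconds) per interval.

    Runs in a daemon thread; returns an Event that stops the loop when set.
    """
    stop = threading.Event()

    def run():
        with open(path, "a", newline="") as f:
            writer = csv.writer(f)
            while True:
                rss, cpu = sample_usage()
                writer.writerow([f"{time.time():.0f}", rss, f"{cpu:.2f}"])
                f.flush()  # keep rows on disk even if the job is killed
                if stop.wait(interval_s):
                    break

    threading.Thread(target=run, daemon=True).start()
    return stop
```

Calling something like `start_usage_logger(f"usage_{os.environ.get('SLURM_JOB_ID', 'local')}.csv")` at the top of the job script would then leave a per-job time series to inspect after an out-of-memory kill. Note that this only samples the Python process itself, so memory held by child processes would not show up.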

0 Answers