I am running a simple Python script through the SLURM scheduler on an HPC cluster. It reads in a data set (approximately 6 GB), then plots and saves images of parts of the data. There are several of these data files, so I loop over them until every file has been plotted.
For some reason, however, memory usage grows with every iteration. I've checked my variables with getsizeof(), but their sizes don't change between iterations, so I'm not sure where this memory "leak" could be coming from.
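One caveat with that check: getsizeof() only reports the size of the Python object itself, and for a NumPy view or a tuple of arrays it does not include the underlying data buffers. A small sketch like the following (not part of my actual script; the array is just a stand-in) would report the real allocations:

import tracemalloc
from sys import getsizeof
import numpy as np

tracemalloc.start()

a = np.zeros((4000, 4000))                      # stand-in for a large array like den
b = a.T                                         # transposed view, owns no data

print("getsizeof(b):", getsizeof(b))            # ~112 bytes: just the view object
print("b.nbytes: %.1f MB" % (b.nbytes / 1e6))   # size of the buffer the view refers to

current, peak = tracemalloc.get_traced_memory()
print("traced: current %.1f MB, peak %.1f MB" % (current / 1e6, peak / 1e6))
tracemalloc.stop()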
Here's my script:
import os, psutil
import sdf_helper as sh
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as plticker
from sys import getsizeof

plt.rcParams['figure.figsize'] = [6, 4]
plt.rcParams['figure.dpi'] = 120  # 200 e.g. is really fine, but slower

for i in range(5, 372):
    plt.clf()
    fig, ax = plt.subplots()

    # dd gets data using the EPOCH-specific SDF file reader sh.getdata
    dd = sh.getdata(i, '/dfs6/pub/user')

    # extract density data as 2D array
    den = dd.Derived_Number_Density_electron.data.T
    nmin = np.min(dd.Derived_Number_Density_electron.data[np.nonzero(dd.Derived_Number_Density_electron.data)])

    # extract grid points as 2D array
    xy = dd.Derived_Number_Density_electron.grid.data

    # extract single number time
    time = dd.Header.get('time')

    # free up memory from dd
    dd = None

    # plotting
    plt.pcolormesh(xy[0], xy[1], np.log10(den), vmin=20, vmax=30)
    cbar = plt.colorbar()
    cbar.set_label('Density in log10($m^{-3}$)')
    plt.title("time: %1.3e s \n Min e- density: %1.2e $m^{-3}$" % (time, nmin))
    ax.set_facecolor('black')
    plt.savefig('D00%i.png' % i, bbox_inches='tight')

    print("dd: ", getsizeof(dd))
    print("den: ", getsizeof(den))
    print("nmin: ", getsizeof(nmin))
    print("xy: ", getsizeof(xy))
    print("time: ", getsizeof(time))
    print("fig: ", getsizeof(fig))
    print("ax: ", getsizeof(ax))

    process = psutil.Process(os.getpid())
    print(process.memory_info().rss)
Output
Reading file /dfs6/pub/user/0005.sdf
dd: 16
den: 112
nmin: 32
xy: 56
time: 24
fig: 48
ax: 48
8991707136
Reading file /dfs6/pub/user/0006.sdf
dd: 16
den: 112
nmin: 32
xy: 56
time: 24
fig: 48
ax: 48
13814497280
Reading file /dfs6/pub/user/0007.sdf
dd: 16
den: 112
nmin: 32
xy: 56
time: 24
fig: 48
ax: 48
18648313856
SLURM Input
#!/bin/bash
#SBATCH -p free
#SBATCH --job-name=epochpyd1
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=20000
#SBATCH --mail-type=begin,end
#SBATCH --mail-user=**
module purge
module load python/3.8.0
python3 -u /data/homezvol0/user/CNTDensity.py > density.out
SLURM output
/data/homezvol0/user/CNTDensity.py:21: RuntimeWarning: divide by zero encountered in log10
plt.pcolormesh(xy[0], xy[1],np.log10(den), vmin = 20, vmax = 30)
/export/spool/slurm/slurmd.spool/job1910549/slurm_script: line 16: 8004 Killed python3 -u /data/homezvol0/user/CNTDensity.py > density.out
slurmstepd: error: Detected 1 oom-kill event(s) in step 1910549.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
As far as I can tell everything seems to be working, so I'm not sure what could be taking up more than 20 GB of memory. Going by the RSS values printed above, each iteration adds roughly 4.8 GB (about 9.0 GB → 13.8 GB → 18.6 GB), so the 20000 MB allocation is exhausted after only three or four files.
EDIT: I began commenting out sections of the loop from the bottom up, and it's now clear that pcolormesh is the culprit.
I've added the following (closing the pyplot windows):
fig.clear()
plt.clf()
plt.close('all')
fig = None
ax = None
del fig
del ax
to the end of each iteration, but the memory keeps climbing no matter what. I'm at a total loss as to what's happening.
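For reference, here is a stripped-down sketch of the loop with that cleanup in place (the explicit mesh handle, mesh.remove(), and gc.collect() are extra ideas I'm experimenting with, not part of the original script):

import gc
import numpy as np
import matplotlib.pyplot as plt
import sdf_helper as sh

for i in range(5, 372):
    fig, ax = plt.subplots()

    dd = sh.getdata(i, '/dfs6/pub/user')
    den = dd.Derived_Number_Density_electron.data.T
    xy = dd.Derived_Number_Density_electron.grid.data
    dd = None

    # keep a handle on the QuadMesh so it can be removed explicitly
    mesh = ax.pcolormesh(xy[0], xy[1], np.log10(den), vmin=20, vmax=30)
    fig.colorbar(mesh)
    fig.savefig('D00%i.png' % i, bbox_inches='tight')

    # cleanup at the end of every iteration
    mesh.remove()        # extra idea: drop the mesh artist before closing the figure
    fig.clear()
    plt.close(fig)
    plt.close('all')
    del fig, ax, mesh, den, xy
    gc.collect()         # extra idea: force a garbage-collection pass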