I'm using a master-slave structure to implement a parallel computation. A single master process (0) loads data and distributes relevant chunks and instructions to slave processes (1-N), which do the heavy lifting using large objects... blah blah blah. The issue is memory usage, which I'm monitoring with resource.getrusage(resource.RUSAGE_SELF).ru_maxrss on each slave process.
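For reference, here's roughly how I read that number on each slave (just a sketch; the helper name and the MB conversion are mine):

import resource

def peak_rss_mb():
    # ru_maxrss is the peak (high-water-mark) resident set size of this
    # process; on Linux it is reported in kilobytes.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

# called at the end of each task in the slave loop, e.g.:
# print("peak RSS so far: {:.0f} MB".format(peak_rss_mb()))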
The first task uses about 6GB of memory, as expected, but when the slave receives the second task it balloons to just over 10GB, as if the memory from the previous task weren't being collected. My understanding was that as soon as a variable loses its references (in the code below, when the _gwb variable is reassigned), garbage collection should clean house. Why isn't this happening?
Would throwing in a del _gwb at the end of each loop iteration help? What about a manual call to gc.collect()? Or do I need to spawn subprocesses as described in this answer? (I've sketched what I mean by the first two options just below, and the subprocess idea after the slave code.)
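To be concrete, the change I have in mind for the first two options (same names as in the slave loop below; I don't know whether it actually helps, which is the question) would look like:

import gc

if tag == TAGS.START:
    _gwb = Large_Data_Structure(task)
    data = _gwb.do_heavy_lifting(task)
    comm.send(data, dest=0, tag=TAGS.DONE)
    # Drop the only references to the large objects, then force a collection
    del _gwb
    del data
    gc.collect()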
I'm using mpi4py on a SLURM-managed cluster.
The master process looks something like:
stat = MPI.Status()

for jj, tt in enumerate(times):
    for ii, sim in enumerate(sims):
        search = True
        # Find a slave to give this task to
        while search:
            # Repackage HDF5 data into a dictionary to work with MPI
            sim_data = ...  # load some data
            # Look for an available slave process
            data = comm.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=stat)
            src = stat.Get_source()
            tag = stat.Get_tag()
            # Store results
            if tag == TAGS.DONE:
                _store_slave_results(data, ...)
                num_done += 1
            elif tag == TAGS.READY:
                # Distribute tasks
                comm.send(sim_data, dest=src, tag=TAGS.START)
                # Stop searching, move on to the next task
                search = False

            cycles += 1
And the slaves:
stat = MPI.Status()

while True:
    # Tell the master this process is ready
    comm.send(None, dest=0, tag=TAGS.READY)
    # Receive ``task`` ([number, gravPot, ndensStars])
    task = comm.recv(source=0, tag=MPI.ANY_TAG, status=stat)
    tag = stat.Get_tag()
    if tag == TAGS.START:
        _gwb = Large_Data_Structure(task)
        data = _gwb.do_heavy_lifting(task)
        comm.send(data, dest=0, tag=TAGS.DONE)
    elif tag == TAGS.EXIT:
        break

    cycles += 1
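For the last option, the pattern I understand from that answer is to run each task in a short-lived child process, so that everything it allocated is returned to the OS when it exits. A rough sketch (using multiprocessing here just to illustrate the idea; untested in my setup):

from multiprocessing import Process, Queue

def _run_task(task, out_q):
    # Everything allocated here lives in the child process and is
    # freed when the child exits.
    _gwb = Large_Data_Structure(task)
    out_q.put(_gwb.do_heavy_lifting(task))

# inside the TAGS.START branch of the slave loop:
out_q = Queue()
child = Process(target=_run_task, args=(task, out_q))
child.start()
data = out_q.get()   # fetch the result before join() so the child can flush the queue
child.join()
comm.send(data, dest=0, tag=TAGS.DONE)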
Edit: Some other strange subtleties (in case they might be relevant):
1) Only some processes show the memory growing; others stay roughly the same.
2) The specific amount of memory in use differs between the slave processes (by hundreds of MB), even though they should all be running the same code!
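For what it's worth, the comparison across ranks looks something like this (gather is collective, so it only runs after the main loop; this is just a sketch):

import resource

peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
all_peaks = comm.gather(peak_kb, root=0)
if comm.rank == 0:
    for rank, kb in enumerate(all_peaks):
        print("rank {}: peak RSS {:.1f} MB".format(rank, kb / 1024.0))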