I have a dask-jobqueue configuration for Slurm which looks something like this:
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(cores=20,
                       processes=2,
                       memory='62GB',
                       walltime='12:00:00',
                       interface='ipogif0',
                       log_directory='logs',
                       python='srun -n 1 -c 20 python',
                       )
When I increase the number of processes, each worker gets a smaller share of the job's memory. At the start of my workflow the tasks are highly parallelised and light on memory, but the final stage currently runs in serial and needs much more memory. Unless I keep processes small (e.g. 2 or 3), that worker 'runs out' of memory and dask restarts it, which puts the workflow into an infinite loop. There is more than enough memory on a single node to run the job, and I'd like to make efficient use of each node (minimising the total amount requested).
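My understanding is that dask-jobqueue splits the memory argument evenly between the worker processes in each job, so the per-worker limit shrinks as processes grows. A rough illustration of the arithmetic, using the 62GB-per-job figure from the configuration above:

# Illustration (assumption): dask-jobqueue divides a job's memory evenly
# between its worker processes, so more processes means less memory each.
from dask.utils import parse_bytes, format_bytes

job_memory = parse_bytes("62GB")   # total memory requested per Slurm job
for processes in (2, 5, 10):
    per_worker = job_memory // processes
    print(f"processes={processes:2d} -> ~{format_bytes(per_worker)} per worker")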
Is it possible to reconfigure the cluster such that the memory available to workers is larger later on in the workflow?
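The only workaround I can think of is to tear the cluster down between phases and recreate it with processes=1, roughly as sketched below (cluster arguments abbreviated), but that means requesting a fresh allocation and I'm hoping there is a way to do this within a single set of jobs:

# Rough sketch of the workaround I'd like to avoid: two separate clusters,
# one with many small workers and one with a single large worker per node.
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

# Phase 1: many small workers for the highly parallel, memory-light tasks.
small_cluster = SLURMCluster(cores=20, processes=10, memory='62GB',
                             walltime='12:00:00', interface='ipogif0',
                             log_directory='logs')
small_cluster.scale(jobs=4)
client = Client(small_cluster)
# ... run the parallel part of the workflow ...

# Phase 2: one worker per node, so that worker sees the full 62GB.
client.close()
small_cluster.close()
big_cluster = SLURMCluster(cores=20, processes=1, memory='62GB',
                           walltime='12:00:00', interface='ipogif0',
                           log_directory='logs')
big_cluster.scale(jobs=1)
client = Client(big_cluster)
# ... run the memory-heavy serial part ...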