I've run into some memory problems while using dask's LocalCluster. I'm working on a machine with 32 CPUs, but only 64GB of RAM available. I'm instantiating the cluster like this:
import os
from dask.distributed import LocalCluster

cluster = LocalCluster(
    n_workers=os.cpu_count(),  # one single-threaded worker per CPU
    threads_per_worker=1,
)
By default, dask assigns an equal amount of memory to each worker (total RAM divided by the number of workers), which on my machine works out to about 2GB per worker.
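If I understand correctly, this default can be made explicit with the memory_limit parameter (the value here is just my machine's share):

cluster = LocalCluster(
    n_workers=os.cpu_count(),
    threads_per_worker=1,
    memory_limit="2GB",  # per-worker cap; the default is total RAM / n_workers
)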
I'm using dask to compute research batches. Those batches differ in their memory requirements. There aren't any problems when I process 32 smaller batches, as they fit into memory. My problem comes when I move to bigger batches, which can't fit into the 2GB of RAM assigned to each worker; dask then raises memory allocation errors. I've seen that I can increase the workers' timeout, but that isn't a very elegant solution. Is there any way to tell dask to keep a scheduled task in the queue until resources are available? What would be the correct way to handle queueing these tasks while using LocalCluster?
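One thing I've come across is dask's abstract worker resources, which sounds like it might enforce this kind of queueing. A rough sketch of what I have in mind (process_batch and big_batch are placeholders for my own code, "MEMORY" is an arbitrary label I chose, and I'm not certain LocalCluster forwards resources= to its workers like this):

from dask.distributed import Client, LocalCluster

# Fewer, larger workers, each declaring an abstract MEMORY resource
cluster = LocalCluster(
    n_workers=8,
    threads_per_worker=4,
    memory_limit="8GB",
    resources={"MEMORY": 8e9},  # passed to each worker, as far as I can tell
)
client = Client(cluster)

# A task tagged with its estimated memory need should stay queued until a
# worker has that much of the abstract resource free
future = client.submit(process_batch, big_batch, resources={"MEMORY": 6e9})

Is that the intended use, or is there a more idiomatic way to hold tasks back until memory is actually available?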