
I have some calculations calling the pardiso() solver from Python. The solver allocates its own memory in a way that is opaque to Python, but the pointers used to access that memory are stored in Python. If I were to run these calculations using dask.delayed, is there any way to tell dask the expected memory consumption of each calculation so that it can schedule them appropriately?

pavithraes
cbf123

1 Answer


There are at least two solutions when there is a constraint that dask should respect: the resources argument and Semaphore.

For resources, the workflow is to allocate some amount of an abstract resource to each worker (either via the CLI when launching the workers, or using the resources kwarg of LocalCluster or another cluster type). The code then specifies how much of this resource each task consumes at the time of .compute or .map/.submit, and the scheduler will only run tasks on a worker when enough of the resource is free.
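A minimal sketch of this, assuming dask.distributed is installed; the resource name "MEMORY" and the units are arbitrary labels (dask does not interpret them), and solve() is a stand-in for the real pardiso() call:

```python
import dask
from dask.distributed import Client, LocalCluster

# Declare that the worker has 100 units of an abstract "MEMORY" resource.
# With workers launched via the CLI, the equivalent is: --resources "MEMORY=100"
cluster = LocalCluster(
    processes=False, n_workers=1, threads_per_worker=2,
    resources={"MEMORY": 100},
)
client = Client(cluster)

@dask.delayed
def solve(x):
    # stand-in for the pardiso() call that allocates opaque memory
    return x * 2

tasks = [solve(i) for i in range(4)]

# Each task is declared to consume 40 units, so at most two of them
# (2 * 40 <= 100) are scheduled concurrently on this worker.
futures = client.compute(tasks, resources={"MEMORY": 40})
results = client.gather(futures)

client.close()
```

Because resources are just numbers attached to workers and tasks, you can estimate each solve's expected memory in whatever units are convenient and let the scheduler pack tasks accordingly.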

The workflow with Semaphore is to specify the number of possible leases when creating the Semaphore (note that, unlike resources, this is an integer, so it is in some sense less flexible; see the docs). Then, whenever the costly resource is accessed, the access should be wrapped in a with sem context manager.
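A sketch of the Semaphore variant, again assuming dask.distributed and with solve() standing in for the pardiso() call; the semaphore name is arbitrary:

```python
import dask
from dask.distributed import Client, Semaphore

client = Client(processes=False)

# At most 2 tasks may hold a lease (i.e. the solver's memory) at once.
sem = Semaphore(max_leases=2, name="solver-memory")

@dask.delayed
def solve(x, sem):
    with sem:  # blocks until one of the 2 leases is free
        return x * 2  # stand-in for the memory-hungry solver call

# The semaphore is passed to the tasks and coordinated via the scheduler,
# so it limits concurrency across all workers, not just one process.
results = dask.compute(*[solve(i, sem) for i in range(4)])

client.close()
```

Unlike resources, all tasks count equally against the lease limit, so this fits best when every solver call has roughly the same memory footprint.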

SultanOrazbayev