Persisting in memory dask delayed without starting the computation yet

Asked Oct 14 '21 at 09:19

Active Oct 15 '21 at 14:17

Viewed 143 times

I have multiple computation trees in my Python toolkit, but not all of them are required for the current analysis:

```python
# build_* are functions from my toolkit that return dask.delayed objects
a1 = build_a1().persist()
a2 = build_a2(a1).persist()
a3 = build_a3(a2)

b1 = build_b1().persist()
b2 = build_b2(b1).persist()
b3 = build_b3(b2)
```

Then, since I am only interested in the 'a' branch, I run:

```python
a3.compute()  # target result, very long to compute
a1.compute()  # intermediate cached results, quick in-memory access
a2.compute()
```

The issue is that calling persist() immediately starts an asynchronous distributed computation on every persisted object (i.e. a1, a2, b1 and b2), even though in the end I am only interested in one specific branch.
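A small sketch of that behaviour, assuming the default local (threaded) scheduler and hypothetical stand-ins for the toolkit functions: building the delayed graph runs nothing, but each persist() call executes its branch right away. With a distributed Client, persist() should instead return immediately with futures, but the work is still submitted at that point.

```python
import collections

import dask

calls = collections.Counter()  # how many times each step actually ran


@dask.delayed
def build_a1():  # hypothetical stand-in for a toolkit step
    calls["a1"] += 1
    return 1


@dask.delayed
def build_b1():  # hypothetical stand-in for a toolkit step
    calls["b1"] += 1
    return 2


a1 = build_a1()
b1 = build_b1()
print(sum(calls.values()))  # 0: building the graph runs nothing

# Without a distributed Client, persist() computes synchronously.
a1 = a1.persist()
b1 = b1.persist()  # branch 'b' runs too, even if it is never used afterwards
print(dict(calls))  # {'a1': 1, 'b1': 1}
```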

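Ideally, I would like to mark a delayed object to be kept in memory, but only start the computation later, with something like either of these (hypothetical) APIs:

```python
d.persist(compute=False)  # hypothetical: persist() has no compute=False option
```

```python
@dask.delayed(keep_in_memory=True)  # hypothetical: not an existing dask.delayed parameter
def build_a1():
    ...
```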
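Neither option exists as such, but one possible workaround is to leave the toolkit side fully lazy and only call persist() once a branch has been chosen, passing the target together with the intermediates worth keeping. The sketch below assumes the default local scheduler and uses hypothetical stand-ins for the build_* functions; dask.persist() merges the graphs, so the shared tasks run once and branch 'b' never runs.

```python
import collections

import dask

calls = collections.Counter()  # how many times each step actually ran


@dask.delayed
def build_a1():  # hypothetical stand-ins for the toolkit steps
    calls["a1"] += 1
    return 1


@dask.delayed
def build_a2(a1):
    calls["a2"] += 1
    return a1 + 1


@dask.delayed
def build_a3(a2):
    calls["a3"] += 1
    return a2 * 10


@dask.delayed
def build_b1():
    calls["b1"] += 1
    return 100


# Toolkit side: build every branch lazily, with no persist() calls.
a1 = build_a1()
a2 = build_a2(a1)
a3 = build_a3(a2)
b1 = build_b1()

# User side: once branch 'a' is chosen, persist the target together with the
# intermediates to keep. Shared tasks run once; branch 'b' never runs.
a1, a2, a3 = dask.persist(a1, a2, a3)

print(a3.compute(), a2.compute(), a1.compute())  # 20 2 1
print(dict(calls))  # {'a1': 1, 'a2': 1, 'a3': 1}
```

The trade-off is that the choice of what to keep is made at the call site rather than declared on the function, as the hypothetical `keep_in_memory` flag would do.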

edited Oct 15 '21 at 14:17


epizut

I'm not sure I fully understand your question... but AFAIK `persist` is supposed to trigger some computation, right? It's just different from `compute` in the sense that it results in a set of futures. See: [dask docs](https://docs.dask.org/en/stable/custom-collections.html?#persist) and [this SO answer about the functions](https://stackoverflow.com/a/41807160/11390523) – pavithraes Oct 14 '21 at 09:54
My goal is to flag the result of a future computation so that it is kept in memory once it is eventually computed. – epizut Oct 14 '21 at 09:58
I want to pin a few intermediate results in memory, but not start the computation as soon as possible. Imagine you have a toolkit with many ready-to-run delayed objects (e.g. data accessors), and you want to let the end user of the toolkit decide which ones to run, while keeping key intermediate results in memory for a future run. – epizut Oct 14 '21 at 10:04
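The toolkit use case described in the comment above could be approximated with a small registry that persists a flagged step only the first time something asks for it. Everything here (`STEPS`, `KEEP_IN_MEMORY`, `get`, and the step functions) is a hypothetical sketch built on plain `dask.delayed` and `dask.persist`, assuming the default local scheduler; it is not a dask feature.

```python
import collections

import dask

calls = collections.Counter()  # how many times each step actually ran


def load_a1():  # hypothetical toolkit steps (plain functions)
    calls["a1"] += 1
    return 1


def make_a2(a1):
    calls["a2"] += 1
    return a1 + 1


def make_a3(a2):
    calls["a3"] += 1
    return a2 * 10


def load_b1():
    calls["b1"] += 1
    return 100


# Hypothetical registry: step name -> (function, names of its inputs).
STEPS = {
    "a1": (load_a1, []),
    "a2": (make_a2, ["a1"]),
    "a3": (make_a3, ["a2"]),
    "b1": (load_b1, []),
}
KEEP_IN_MEMORY = {"a1", "a2"}  # hypothetical flag: intermediates worth pinning
_pinned = {}  # step name -> persisted Delayed, filled on first request


def get(name):
    """Return a Delayed for `name`; flagged steps are persisted on first use."""
    if name in _pinned:
        return _pinned[name]
    func, deps = STEPS[name]
    node = dask.delayed(func)(*(get(dep) for dep in deps))
    if name in KEEP_IN_MEMORY:
        (node,) = dask.persist(node)  # runs now, and only for this branch
        _pinned[name] = node
    return node


print(get("a3").compute())  # 20 (runs a1, a2 and a3)
print(get("a2").compute())  # 2 (served from the pinned result)
print(dict(calls))  # {'a1': 1, 'a2': 1, 'a3': 1}
```

Persisting flagged steps one at a time serializes them instead of scheduling one larger graph; with a distributed Client the pinned values would presumably live on the workers as futures.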
What programming language does this question refer to? – Fabrício Pereira Oct 15 '21 at 13:00
@FabrícioPereira python 3 – epizut Oct 15 '21 at 14:18
Then I suggest you edit the title to specify this. – Fabrício Pereira Oct 15 '21 at 14:38


0 Answers