I have multiple computation trees in my Python toolkit, but not all of them are required for the current analysis:

a1 = build_a1().persist()
a2 = build_a2(a1).persist()
a3 = build_a3(a2)

b1 = build_b1().persist()
b2 = build_b2(b1).persist()
b3 = build_b3(b2)

I am then only interested in the 'a' branch, so I run:

a3.compute()     # Target result, very long to compute
a1.compute()     # Intermediate cached result, quick in-memory access
a2.compute()     # Same for the second intermediate result

The issue is that when I use persist(), it automatically starts an asynchronous distributed computation on every item (i.e. a*, b*), even if I am ultimately only interested in a specific branch.

Ideally, I want to mark these results to be kept in memory without starting the computation right away, and trigger the computation later, with either:

d.persist(compute=False)

or

@dask.delayed(keep_in_memory=True)
def build_a1():
    [...]
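
Both spellings above are wished-for syntax, not calls I know to exist. A possible workaround, given only as a minimal sketch: leave every builder lazy (no persist() at declaration time) and call dask.persist() on the selected branch only once the user has chosen it. The builder bodies and the `calls` counter below are hypothetical stand-ins for the real toolkit functions; the counter is only there to show what actually ran.

```python
import dask
from dask import delayed

# Hypothetical call counter, used only to show which builders really ran.
calls = {"a1": 0, "a2": 0, "a3": 0, "b1": 0}

@delayed
def build_a1():
    calls["a1"] += 1
    return 1

@delayed
def build_a2(a1):
    calls["a2"] += 1
    return a1 + 1

@delayed
def build_a3(a2):
    calls["a3"] += 1
    return a2 * 10

@delayed
def build_b1():
    calls["b1"] += 1
    return 100

# Declare both trees lazily: nothing is computed at this point.
a1 = build_a1()
a2 = build_a2(a1)
a3 = build_a3(a2)
b1 = build_b1()

# Later, once the user has picked the 'a' branch, persist it in one call
# so that the shared intermediates are computed once and kept in memory.
a1, a2, a3 = dask.persist(a1, a2, a3)

result = a3.compute()  # already in memory, no builder is re-run
```

With the default local scheduler, dask.persist() blocks until the results are ready. With a distributed Client it should instead return immediately and compute in the background, but still only for the collections passed to it, so the 'b' branch stays untouched until someone asks for it.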
epizut
  • I'm not sure I fully understand your question... but AFAIK `persist` is supposed to trigger some computation, right? It's just different from `compute` in the sense that it results in a set of futures. See: [dask docs](https://docs.dask.org/en/stable/custom-collections.html?#persist) and [this SO answer about the functions](https://stackoverflow.com/a/41807160/11390523) – pavithraes Oct 14 '21 at 09:54
  • My goal is to flag the future computation result as in-memory if computed later. – epizut Oct 14 '21 at 09:58
  • I want to pin a few intermediate results in memory, but not start the computation ASAP. Imagine you have a toolkit with a lot of ready-to-run delayed objects (e.g. data accessors), and you want to let the final toolkit user decide which delayed objects to run, while keeping key intermediate results in memory for a future run – epizut Oct 14 '21 at 10:04
  • What programming language does this question refer to? – Fabrício Pereira Oct 15 '21 at 13:00
  • @FabrícioPereira python 3 – epizut Oct 15 '21 at 14:18
  • Then I suggest you edit the title to specify this. – Fabrício Pereira Oct 15 '21 at 14:38

0 Answers