I have multiple computation trees in my Python toolkit, but not all of them are required for the current analysis:
a1 = build_a1().persist()
a2 = build_a2(a1).persist()
a3 = build_a3(a2)
b1 = build_b1().persist()
b2 = build_b2(b1).persist()
b3 = build_b3(b2)
I am only interested in the 'a' branch, so I then run:
a3.compute() # Target result, very long to compute
a1.compute() # Intermediate cached result, quick in-memory access
a2.compute() # Intermediate cached result, quick in-memory access
The issue is that persist() immediately starts an asynchronous distributed computation for every item it is called on (i.e. both the a* and b* branches), even though I am ultimately only interested in one specific branch.
Ideally, I would like to mark a result as "to be persisted" but start the actual computation only later, with either:
d.persist(compute=False)
or
@dask.delayed(keep_in_memory=True)
def build_a1():
    [...]
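To make the behaviour I am after concrete, here is a minimal plain-Python sketch of the semantics. It does not use Dask at all: `LazyNode`, `step`, and the `keep_in_memory` flag are hypothetical names made up for illustration, and the builders are trivial stand-ins for `build_a1`, `build_a2`, and so on. The point is only that declaring a node as "keep in memory" runs nothing, and that the cache is filled the first time a downstream result is actually requested.

```python
# Plain-Python illustration only; LazyNode and keep_in_memory are hypothetical
# names and are NOT part of the Dask API.
calls = []  # records which builders actually ran, in order


class LazyNode:
    """A deferred call whose result is optionally cached once computed."""

    def __init__(self, func, *deps, keep_in_memory=False):
        self.func = func
        self.deps = deps
        self.keep_in_memory = keep_in_memory
        self._cached = False
        self._value = None

    def compute(self):
        if self._cached:
            return self._value
        # Dependencies are evaluated only now, on demand.
        value = self.func(*(dep.compute() for dep in self.deps))
        if self.keep_in_memory:
            self._cached, self._value = True, value
        return value


def step(name):
    """Stand-in builder: logs that it ran and returns upstream + 1."""
    def run(upstream=0):
        calls.append(name)
        return upstream + 1
    return run


# Declaring the graph runs nothing, even for nodes flagged keep_in_memory.
a1 = LazyNode(step("a1"), keep_in_memory=True)
a2 = LazyNode(step("a2"), a1, keep_in_memory=True)
a3 = LazyNode(step("a3"), a2)
b1 = LazyNode(step("b1"), keep_in_memory=True)
b2 = LazyNode(step("b2"), b1, keep_in_memory=True)
b3 = LazyNode(step("b3"), b2)
declared_calls = list(calls)  # snapshot taken before any compute()

a3_result = a3.compute()  # runs the a1, a2, a3 builders only
a2_result = a2.compute()  # served from the cache, no builder runs again
```

In this sketch the 'b' builders never run because nothing requests `b3`, while `a1` and `a2` stay cached after the first `a3.compute()`. That is the combination I would like to get from Dask.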