I have a notebook with large intermediate data frames that are computationally intensive to generate, and I would like to cache them between sessions. However, they should be automatically recomputed if one of the steps/variables used to generate them has changed (e.g. new data or filtered data upstream). Is there some way to link a cached variable to any number of "state" variables, such that the target variable is recomputed when a change in the "state" is detected (and loaded from cache if no change is detected)?

Example of what I'm looking for...

Consider the following to be a notebook environment, with cells separated by the `# ---- #` comments.

# Initial work, loading datasets and whatnot

df = load_dataset()
df = clean_dataset(df)

# Some intermediate variables that are used later on...
s1, s2, s3 = compute_intermediate_variables()

# ---- #

# Intensive computation cell

def compute(df, s1, s2, s3): # Defining the expensive computation
    return some_func(df, s1, s2, s3)

... # Hopefully use `compute` in the caching operation...

I am looking for a caching function or notebook magic that can cache the result of `compute` and recompute it only if a change in `df`, `s1`, `s2`, or `s3` is detected. Rerunning the cell repeatedly should then be a near-instant operation, and repeatedly opening, running, and closing the notebook should likewise not be slowed by this "expensive computation cell" after the first run.
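For reference, here is a minimal sketch of the behavior I have in mind, in case it clarifies the question. It fingerprints the pickled state variables, keys an on-disk pickle file by that digest, and only calls the expensive function on a cache miss. The helper names (`cached_compute`, `_state_hash`) and the `.nb_cache` directory are made up for illustration, and this assumes pickling each state variable yields a stable byte string.

import hashlib
import pickle
from pathlib import Path

CACHE_DIR = Path(".nb_cache")
CACHE_DIR.mkdir(exist_ok=True)

def _state_hash(*state):
    # Combine the pickled bytes of every state variable into one digest.
    h = hashlib.sha256()
    for value in state:
        h.update(pickle.dumps(value))
    return h.hexdigest()

def cached_compute(key, func, *state):
    # Reuse the cached result if the state digest matches a file on disk;
    # otherwise recompute with func(*state) and persist the new result.
    path = CACHE_DIR / f"{key}-{_state_hash(*state)}.pkl"
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = func(*state)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result

# In the expensive cell:
result = cached_compute("big_result", compute, df, s1, s2, s3)

One caveat with this sketch: pickle output is not guaranteed to be byte-stable for every object across sessions, so a robust version would probably fingerprint the data frames with something like pandas.util.hash_pandas_object instead. Stale digest files also accumulate in `.nb_cache` and would need occasional cleanup.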
