I have a notebook with large intermediate data frames that are computationally intensive to generate, and I would like to cache them between sessions. However, I also want them to be recomputed automatically if one of the steps/variables used to generate them has changed (e.g. new data or filtered data upstream). Is there some way to link a cached variable to any number of "state" variables, such that the target variable is recomputed if a change in the "state" has been detected (and loaded from cache if no change is detected)?
Related questions/answers:

- How to cache in IPython Notebook? and ipython notebook save variables after closing. Answer: use `%store`. The cache has to be manually recomputed.
- How to pickle or store Jupyter (IPython) notebook session for later. Answer: use `dill` to dump and load a session. I do not want to save the entire session because that would be too large; I want to save the intermediate values that are difficult to recompute, and recompute the cells that are not so computationally expensive.
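For concreteness, this is roughly how the `%store` approach from those answers looks (`%store` and `%store -r` are real IPython magics); the problem is that nothing here detects that the cached value has gone stale:

```python
# Session 1: persist the expensive result with IPython's storemagic.
df_result = compute(df, s1, s2, s3)
%store df_result

# Session 2 (after restarting the kernel): restore it.
%store -r df_result
# df_result is loaded as-is; if df, s1, s2, or s3 changed upstream,
# I have to notice that myself and rerun the computation manually.
```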
Example of what I'm looking for...

Consider this a notebook environment, with cells separated by the `# ---- #` comments:
```python
# Initial work, loading datasets and whatnot
df = load_dataset()
df = clean_dataset(df)
# Some intermediate variables that are used later on...
s1, s2, s3 = compute_intermediate_variables()
# ---- #
# Intensive computation cell
def compute(df, s1, s2, s3):  # Defining the expensive computation
    return some_func(df, s1, s2, s3)

...  # Hopefully use `compute` in the caching operation...
```
I am looking for a caching function or notebook magic that can cache the result of `compute` and recompute it only if a change in `df`, `s1`, `s2`, or `s3` has been detected. Rerunning the cell repeatedly should then be a near-instant operation, and repeatedly opening, running, and closing the notebook should also not be held up by this expensive computation cell after the first run.
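To make the desired behavior concrete, here is a minimal sketch of the kind of helper I am imagining. The helper name `cached_compute` and the cache layout are made up, and hashing the pickled dependencies is just one possible way to detect upstream changes:

```python
import hashlib
import pickle
from pathlib import Path

def cached_compute(func, *deps, cache_dir="nb_cache"):
    """Hypothetical helper: rerun func(*deps) only when the hash of
    the dependencies changes; otherwise load the result from disk."""
    Path(cache_dir).mkdir(exist_ok=True)
    # Hash the serialized dependencies so any upstream change
    # (new data, different filtering, ...) produces a new cache key.
    state = hashlib.sha256(pickle.dumps(deps)).hexdigest()
    cache_file = Path(cache_dir) / f"{func.__name__}_{state[:16]}.pkl"
    if cache_file.exists():
        # State unchanged since the last run: near-instant load.
        return pickle.loads(cache_file.read_bytes())
    # State changed (or first run): do the expensive computation.
    result = func(*deps)
    cache_file.write_bytes(pickle.dumps(result))
    return result

# Intended usage in the expensive cell:
# result = cached_compute(compute, df, s1, s2, s3)
```

One wrinkle with this sketch is that pickling a large data frame just to hash it can itself be slow, so a real solution would presumably hash something cheaper (shape plus a sample, an upstream version counter, etc.).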