This is a generalization of this question: Way to extract pickles coming in and out of ipython / jupyter notebook
At the highest level, I'm looking for a way to automatically summarize what goes on in an ipython notebook. One way to simplify the problem is to treat all the data manipulation that goes on inside the notebook as a black box and focus only on its inputs and outputs. So, given the filepath to an ipython notebook, is there an easy way to determine all the files/websites it reads into memory, and all the files it subsequently writes/dumps? I'm imagining a function that scans the file, parses it for inputs and outputs, and saves them into a dictionary for easy access:
summary_dict = summarize_file_io(ipynb_filepath)
print(summary_dict["inputs"])
> ["../Resources/Data/company_orders.csv", "http://special_company.com/company_financials.csv"]
print(summary_dict["outputs"])
> ["orders_histogram.jpg", "data_consolidated.pickle"]
I'm wondering how to do this easily for more than just pickle objects, covering formats like txt, csv, jpg, png, etc., and also cases where data is read directly from the web into the notebook itself.
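One possible starting point, sketched below under some loud assumptions: since an .ipynb file is just JSON, you can load it, pull out the code cells, and walk each cell's AST looking for calls whose name suggests reading or writing (the READERS/WRITERS sets here are my own guesses, not an exhaustive list, and `summarize_file_io` is the hypothetical function from above). This only catches paths passed as string literals in the first argument, and it can't see I/O hidden behind variables or helper functions:

```python
import ast
import json

# Assumed sets of call names that read or write data -- extend these for
# whatever libraries your notebooks actually use (pandas, PIL, requests, ...).
READERS = {"read_csv", "read_excel", "read_json", "load", "imread", "urlopen", "get"}
WRITERS = {"to_csv", "to_pickle", "to_excel", "savefig", "imwrite", "imsave"}


def summarize_file_io(ipynb_filepath):
    """Scan a notebook's code cells and return a dict of string-literal
    paths/URLs passed to known reader/writer calls."""
    with open(ipynb_filepath) as f:
        nb = json.load(f)

    inputs, outputs = [], []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        # "source" may be a single string or a list of lines
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        try:
            tree = ast.parse(source)
        except SyntaxError:
            continue  # skip cells containing magics or invalid syntax

        for node in ast.walk(tree):
            if not isinstance(node, ast.Call):
                continue
            # Resolve the call name: open(...), pd.read_csv(...), fig.savefig(...)
            func = node.func
            name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
            if name is None:
                continue
            # Only consider a string literal in the first positional argument
            path_args = [a.value for a in node.args[:1]
                         if isinstance(a, ast.Constant) and isinstance(a.value, str)]
            if name == "open":
                # Classify open() by its mode argument ('w'/'a' means output)
                mode = "r"
                if len(node.args) > 1 and isinstance(node.args[1], ast.Constant):
                    mode = node.args[1].value
                for kw in node.keywords:
                    if kw.arg == "mode" and isinstance(kw.value, ast.Constant):
                        mode = kw.value.value
                (outputs if any(c in mode for c in "wax") else inputs).extend(path_args)
            elif name in READERS:
                inputs.extend(path_args)
            elif name in WRITERS:
                outputs.extend(path_args)
    return {"inputs": inputs, "outputs": outputs}
```

Because URLs passed to things like pd.read_csv or requests.get are also just string literals, they show up in "inputs" with no extra work; the hard cases are paths built dynamically (f-strings, os.path.join, variables), which a static scan like this will miss.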