
I have several Jupyter notebooks in multiple folders and nested folders. I would like to use nbstripout to clear all stored output from these files by calling it from another Jupyter notebook. It looks like I can only run nbstripout from the command line, stripping one file at a time.

Can someone give me a code example that strips multiple notebooks from another Jupyter notebook?

Kay

1 Answer

You can run shell commands inside a Jupyter notebook and combine them with Python to recursively traverse a file hierarchy and run nbstripout on any file whose name ends in .ipynb. Start by running the notebook in the directory that holds the .ipynb files, or in a directory that contains the directories holding them. The easiest way is to copy the notebook there and run the code below in it:

First, add a code cell to your notebook that installs nbstripout, if it isn't already installed:

Put this as markdown above that code cell:

Install with Anaconda/conda if that is your package manager, based on [here](https://anaconda.org/conda-forge/nbstripout).  
Otherwise, if you aren't using Anaconda/conda, fall back to using `pip`. To do that comment out the first line and uncomment the second line.

The first code cell:

%conda install -c conda-forge nbstripout 
#%pip install nbstripout

The second code cell does the actual work after the install:

import os
import fnmatch
pattern_to_match = "*.ipynb"
# `os.walk()` starting in the current working directory as 'root' based on 
# https://stackoverflow.com/a/54673093/8508004
for root, dirs, files in os.walk('.'):
    for filename in files:
        if fnmatch.fnmatch(filename, pattern_to_match):
            !nbstripout "{os.path.join(root, filename)}"

A command prefixed with an exclamation point is run by Jupyter/IPython in a temporary shell. Curly brackets pass the value of a Python expression into the command that follows the exclamation point.

os.walk starts in the directory specified by '.', which is shorthand for the current working directory.

The above code all worked in launches from here. To demonstrate, after pressing 'launch binder', wait until the session comes up and then make a new directory named testf in the file browser pane. Right-click on index.ipynb and use 'Duplicate' to make a copy. Select and drag the copy into the directory testf. Now make a new notebook in the main directory where index.ipynb is and set it up as instructed above. Run each of the two code cells in turn. You'll see index.ipynb and the nested copy get stripped.
Note that because {os.path.join(root, filename)} is wrapped in quotes, this also works if you make a new directory and leave it with the default name 'Untitled Folder': with those quotes in the shell command, spaces in directory and file names are handled fine.
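To see why the quotes matter, here is a small sketch that uses Python's `shlex.split`, which follows POSIX shell word-splitting rules, as a stand-in for what the shell does with the command line. The path is a made-up example, not one from the Binder session:

```python
import shlex

# Hypothetical path containing a space, like the default 'Untitled Folder'
path = "./Untitled Folder/index.ipynb"

# With quotes, the whole path stays one argument
quoted = shlex.split(f'nbstripout "{path}"')

# Without quotes, the path is split at the space into two arguments
unquoted = shlex.split(f"nbstripout {path}")

print(quoted)    # ['nbstripout', './Untitled Folder/index.ipynb']
print(unquoted)  # ['nbstripout', './Untitled', 'Folder/index.ipynb']
```

Without the quotes, nbstripout would be handed two paths that don't exist instead of the one that does.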


Note, though, that if you have a lot of files to process, you'd probably want to make the code pure Python and use subprocess or os.system() to call each nbstripout step. Each !nbstripout ... call from inside a Jupyter cell creates a new temporary shell, runs the task, and cleans up, so it starts to add up to significant computational time very quickly. You don't have this overhead with the pure Python option.
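A minimal sketch of what that pure Python variant might look like, assuming nbstripout is installed and on the PATH and that Python 3.7+ is available (for `capture_output`). The function names `find_notebooks` and `strip_notebooks` are just illustrative, and the `command` parameter is an addition so the loop can be tried out with a different program:

```python
import fnmatch
import os
import subprocess


def find_notebooks(start_dir=".", pattern="*.ipynb"):
    """Return sorted paths of files under start_dir whose names match pattern."""
    matches = []
    for root, dirs, files in os.walk(start_dir):
        for filename in files:
            if fnmatch.fnmatch(filename, pattern):
                matches.append(os.path.join(root, filename))
    return sorted(matches)


def strip_notebooks(paths, command=("nbstripout",)):
    """Run the command once per path and return the paths where it failed."""
    failed = []
    for path in paths:
        # An argument list with no shell involved: spaces in the path
        # need no quoting, and no temporary shell is created per file.
        result = subprocess.run([*command, path], capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(path)
    return failed
```

Running `strip_notebooks(find_notebooks("."))` in a cell should then process everything below the current working directory and return a list of any notebooks for which nbstripout exited with an error.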

Wayne