I want to convert Jupyter notebooks to Python files, but first I want to filter their contents, e.g. remove all Markdown cells. The GUI export and calling nbconvert from the command line therefore don't quite meet my needs.

What I want to do is first load the contents of the notebook into a Python container, e.g. a list of dictionaries, filter that container, save the remaining contents as a Jupyter notebook, and then export that notebook to a Python file.

Questions:

  • What data structure is most appropriate for holding the contents of a Jupyter notebook?
  • Are there any libraries that already enable the manipulation of Jupyter notebooks with Python?
Mark Rotteveel
user2961818
  • Notebooks are dictionaries stored in JSON, so I think you should just be able to load the notebook into a Python object using `json.load()` – Joe Aug 30 '23 at 09:57
  • You may be able to adapt the code for Jupytext to do what you want. [Jupytext](https://jupytext.readthedocs.io/en/latest/using-cli.html) converts Jupyter `.ipynb` files to a number of text forms, one of which is `.py` scripts. The resulting scripts usually need further work if the source wasn't already pure Python. You'll also want to look into the Python module nbformat for removing markdown cells; see [here](https://stackoverflow.com/a/71244733/8508004) and [here](https://stackoverflow.com/a/59776611/8508004) for more about nbformat and examples. nbformat already has the concept of cell types baked in. – Wayne Aug 30 '23 at 12:44

1 Answer

A Jupyter (IPython) notebook is just JSON, so you can parse it with Python's json module.

The format is described in the nbformat documentation.

Briefly:

At the highest level, a Jupyter notebook is a dictionary with a few keys:

  • metadata (dict)
  • nbformat (int)
  • nbformat_minor (int)
  • cells (list)
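
As a minimal sketch of that top-level structure, the snippet below builds a tiny notebook by hand, round-trips it through `json`, and lists what comes back. The cell contents and the version numbers 4 and 5 are illustrative assumptions, not taken from a real notebook.

```python
import json

# A hand-written, minimal notebook; all values are illustrative only.
notebook_text = json.dumps({
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["some *markdown*"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "source": ["print('hi')"], "outputs": []},
    ],
})

# For a real file, json.load() on the opened .ipynb does the same job.
notebook = json.loads(notebook_text)

top_level_keys = sorted(notebook)  # the notebook is a plain dict
cell_types = [cell["cell_type"] for cell in notebook["cells"]]  # cells is a list of dicts
print(top_level_keys)
print(cell_types)
```

So the natural container is simply a dict whose `cells` entry is a list of dicts, one per cell.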

There are markdown cells:

{
  "cell_type" : "markdown",
  "metadata" : {},
  "source" : ["some *markdown*"],
}

And code cells:

{
  "cell_type" : "code",
  "execution_count": 1, # integer or null
  "metadata" : {
      "collapsed" : True, # whether the output of the cell is collapsed
      "autoscroll": False, # any of true, false or "auto"
  },
  "source" : ["some code"],
  "outputs": [{
      # list of output dicts (described below)
      "output_type": "stream",
      ...
  }],
}
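
Given that layout, filtering comes down to rebuilding the `cells` list. The following is a rough sketch using only the standard library; the helper names `drop_cells` and `filter_notebook_file`, the file names, and the demo notebook are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def drop_cells(notebook, cell_type="markdown"):
    """Return a shallow copy of the notebook dict without cells of cell_type."""
    filtered = dict(notebook)
    filtered["cells"] = [
        cell for cell in notebook["cells"] if cell.get("cell_type") != cell_type
    ]
    return filtered


def filter_notebook_file(src, dst, cell_type="markdown"):
    """Load the notebook at src, drop the unwanted cells, and write it to dst."""
    with open(src, encoding="utf-8") as f:
        notebook = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(drop_cells(notebook, cell_type), f, indent=1)


# Demo on a throwaway notebook (contents are illustrative only).
demo = {
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["some *markdown*"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "source": ["some_code = 1"], "outputs": []},
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "demo.ipynb"
    dst = Path(tmp) / "demo_filtered.ipynb"
    src.write_text(json.dumps(demo), encoding="utf-8")
    filter_notebook_file(src, dst)
    result = json.loads(dst.read_text(encoding="utf-8"))

remaining_types = [cell["cell_type"] for cell in result["cells"]]
print(remaining_types)
```

The filtered file is still an ordinary notebook, so it can be handed to `jupyter nbconvert --to script` afterwards.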

That said, I'm not sure why this doesn't work for you:

jupyter nbconvert --to script mynotebook.ipynb
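
If the intermediate notebook isn't needed, the code cells can probably be joined into a script directly once the JSON is loaded. The sketch below assumes a cell's `source` may be either a single string or a list of line strings; the helpers `source_text` and `code_cells_to_script` are hypothetical names. Unlike nbconvert, it makes no attempt to handle IPython magics.

```python
def source_text(cell):
    """Return a cell's source as one string (it may be stored as a list of lines)."""
    source = cell.get("source", "")
    return source if isinstance(source, str) else "".join(source)


def code_cells_to_script(notebook):
    """Join the source of all non-empty code cells, separated by blank lines."""
    chunks = [
        source_text(cell)
        for cell in notebook["cells"]
        if cell.get("cell_type") == "code"
    ]
    return "\n\n".join(chunk for chunk in chunks if chunk.strip()) + "\n"


# Illustrative notebook dict, as it would look after json.load().
demo = {
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "source": ["x = 1\n", "y = x + 1"], "outputs": []},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "source": "print(y)", "outputs": []},
    ],
}

script = code_cells_to_script(demo)
print(script)
```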

juanpa.arrivillaga
  • nbconvert is fine for converting notebooks to python files, but not for filtering the notebook's contents before the conversion. – user2961818 Aug 30 '23 at 10:10