129

When a notebook's output is very long and gets saved into the notebook, the browser crashes and can't display it correctly whenever I try to open that notebook again.

To fix this I have to open it with a text editor and delete all output from the cell causing the problem.

I wonder if there is a way to clean all output from the notebook so one can open it again without problem. I want to delete all output since deleting a specific one seems more troublesome.

mirekphd
Diego Rueda

10 Answers

197

nbconvert 6.0 should fix --clear-output

The option had been broken for a long time before that. Bug report with the merged patch: https://github.com/jupyter/nbconvert/issues/822

For in-place operation, usage should be:

jupyter nbconvert --clear-output --inplace my_notebook.ipynb

Or to save to another file called my_notebook_no_out.ipynb:

jupyter nbconvert --clear-output \
  --to notebook --output=my_notebook_no_out my_notebook.ipynb

This was brought to my attention by Harold in the comments.

Before nbconvert 6.0: --ClearOutputPreprocessor.enabled=True

Same usage as --clear-output:

jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace my_notebook.ipynb
jupyter nbconvert --ClearOutputPreprocessor.enabled=True \
  --to notebook --output=my_notebook_no_out my_notebook.ipynb

Tested in Jupyter 4.4.0, notebook==5.7.6.
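
If you script this across environments with different nbconvert versions, the choice between the two flags can be sketched as below. This is only an illustration: `build_clear_command` is a hypothetical helper name, and the caller is assumed to supply the nbconvert version as a `(major, minor)` tuple. It only builds the argument list and does not run anything.

```python
from typing import List, Optional, Tuple


def build_clear_command(
    notebook: str,
    nbconvert_version: Tuple[int, int],
    output: Optional[str] = None,
) -> List[str]:
    """Return the argv for clearing outputs, following the two variants above.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # nbconvert >= 6.0: --clear-output should work again.
    # Earlier versions: enable the preprocessor explicitly.
    if nbconvert_version >= (6, 0):
        flag = "--clear-output"
    else:
        flag = "--ClearOutputPreprocessor.enabled=True"

    cmd = ["jupyter", "nbconvert", flag]
    if output is None:
        # Overwrite the notebook in place.
        cmd.append("--inplace")
    else:
        # Write to another notebook file; --to notebook avoids HTML output.
        cmd += ["--to", "notebook", f"--output={output}"]
    cmd.append(notebook)
    return cmd


if __name__ == "__main__":
    # Print instead of executing, so the sketch has no side effects.
    print(" ".join(build_clear_command("my_notebook.ipynb", (5, 6))))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once you are happy with it.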

Ciro Santilli OurBigBook.com
  • This will convert the notebook to html, which does not seem to be what the op wants. – Jacquot Apr 23 '18 at 14:52
  • @Jacquot What version of Jupyter are you in? I have just re-tested and it modifies the `.ipynb` inplace without creating HTML. – Ciro Santilli OurBigBook.com Apr 23 '18 at 15:15
  • 2
    I read your comment too quickly and didn't know about the `--inplace` option; I learned something. But it appears that in my version, 5.3.1, the option `--clear-output` is available, which is shorthand for `--ClearOutputPreprocessor.enabled=True --inplace` – Jacquot Apr 23 '18 at 16:29
  • I had to add a `--to notebook` to make the second version (non-inplace) work – olejorgenb Mar 12 '19 at 16:58
  • @olejorgenb thanks for the report. Can you also give your ipython version and explain what happens if you don't give `--to notebook`? I will test this out later. – Ciro Santilli OurBigBook.com Mar 12 '19 at 17:00
  • @olejorgenb OK, it was saving as HTML, updated answer. – Ciro Santilli OurBigBook.com Mar 12 '19 at 17:22
  • 2
    The option `--clear-output` was broken, see issue [#822](https://github.com/jupyter/nbconvert/issues/822). This has been fixed last month (July 2020) so it should work again in the next release. – Harold Aug 08 '20 at 09:35
  • with `jupyter core: 4.7.1, jupyter-notebook : 6.2.0, qtconsole: 5.0.2, ipython : 7.20.0, ipykernel: 5.4.3, jupyter client : 6.1.11, jupyter lab : not installed, nbconvert : 5.6.1, ipywidgets: 6.0.0, nbformat: 5.1.2, traitlets: 5.0.5` only `jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace my_notebook.ipynb` is working for me – ucsky Oct 22 '21 at 12:00
  • If the notebook was not saved before running this, will it overwrite any unsaved changes? – Austin Wolff Nov 22 '22 at 01:36
  • @AustinWolff if it was not saved then presumably it will remove whatever output is on disk which is all it can see? And then when you save if it has output it will overwrite existing file and contain output? Let me know if you test otherwise. – Ciro Santilli OurBigBook.com Nov 22 '22 at 07:07
56

If you create a .gitattributes file, you can run a filter over certain files before they are added to git. This will leave the original file on disk as-is, but commit the "cleaned" version.

For this to work, add this to your local .git/config or global ~/.gitconfig:

[filter "strip-notebook-output"]
    clean = "jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR"

Then create a .gitattributes file in your directory with notebooks, with this content:

*.ipynb filter=strip-notebook-output

How this works:

  • The attribute tells git to run the filter's clean action on each notebook file before adding it to the index (staging).
  • The filter is our friend nbconvert, set up to read from stdin, write to stdout, strip the output, and only speak when it has something important to say.
  • When a file is extracted from the index, the filter's smudge action is run, but this is a no-op as we did not specify it. You could run your notebook here to re-create the output (nbconvert --execute).
  • Note that if the filter somehow fails, the file will be staged unconverted.

My only minor gripe with this process is that I can commit .gitattributes but I have to tell my co-workers to update their .git/config.

If you want a hackier but much faster version, try jq:

  clean = "jq '.cells[].outputs = [] | .cells[].execution_count = null | .'"
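
If jq is not available, the same clean step can be sketched in plain Python. This is a minimal illustration, not part of the setup above: `strip_outputs` and `filter_stream` are hypothetical names, and the sketch only touches code cells, as suggested in the comments on this answer.

```python
import io
import json
from typing import Any, Dict, TextIO


def strip_outputs(nb: Dict[str, Any]) -> Dict[str, Any]:
    """Clear outputs and execution counts of code cells, in place."""
    for cell in nb.get("cells", []):
        # Markdown and raw cells have no outputs, so leave them untouched.
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


def filter_stream(src: TextIO, dst: TextIO) -> None:
    """Read a notebook from src and write the stripped version to dst."""
    nb = strip_outputs(json.load(src))
    json.dump(nb, dst, indent=1, ensure_ascii=False)
    dst.write("\n")


if __name__ == "__main__":
    # Small in-memory demo instead of touching real files.
    demo = {
        "cells": [
            {"cell_type": "code", "source": "1 + 1",
             "outputs": [{"output_type": "stream", "text": "2"}],
             "execution_count": 3},
            {"cell_type": "markdown", "source": "hello"},
        ]
    }
    out = io.StringIO()
    filter_stream(io.StringIO(json.dumps(demo)), out)
    print(out.getvalue())
```

A script that calls `filter_stream(sys.stdin, sys.stdout)` could then serve as the `clean` command, since a git clean filter reads the file on stdin and writes the cleaned version to stdout.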
dirkjot
  • 1
    this is the best of both worlds. Thanks for sharing this – sousben Apr 20 '20 at 06:56
  • 1
    Didn’t know about this. This is super-useful. – Roly May 08 '20 at 10:45
  • 1
    A slightly improved alternative is as follows. It cleans the metadata, and doesn't add outputs and execution_count to non code cells like the proposed JQ solution (which results in a warning): `clean = "jq '.cells |= map(if .\"cell_type\" == \"code\" then .outputs = [] | .execution_count = null else . end | .metadata = {}) | .metadata = {}'"` – sousben Aug 12 '20 at 13:35
11

Use --ClearOutputPreprocessor.enabled=True together with --clear-output

Run the following command:

jupyter nbconvert --ClearOutputPreprocessor.enabled=True --clear-output *.ipynb

8

nbstripout worked well for me.

Open the Jupyter terminal, navigate to the folder containing your notebook, and then run the following line:

nbstripout my_notebook.ipynb

Kenneth Leung
6

To extend the answer from @dirkjot and resolve the issue of sharing the configuration:

Create a .gitconfig file in the repository itself, rather than modifying .git/config. This makes the command that needs to be run on other machines slightly simpler. You can also create a script to run the git config command:

git config --local include.path ../.gitconfig

Note that I have also changed the log level to INFO because I wanted to see confirmation that the clean was running.

repo/.gitconfig

[filter "strip-notebook-output"]
    clean = "jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=INFO"

repo/.gitattributes

*.ipynb filter=strip-notebook-output

repo/git_configure.sh

git config --local include.path ../.gitconfig

Users then just need to run:

$ chmod u+x git_configure.sh
$ ./git_configure.sh
gbro3n
4

Use clean_ipynb, which not only clears notebook output but can also clean the code.

Install by pip install clean_ipynb

Run by clean_ipynb hello.ipynb

  • 1
    [nbclean](https://github.com/choldgraf/nbclean) is a tool that can do that with some handy additional features, such as removing only certain blocks of code/text, which make it useful for teaching. – Wayne Jan 16 '20 at 19:46
4

I must say I find jupyter nbconvert painfully slow for the simple job of clearing some sub-arrays and resetting some execution numbers. It is the superior solution for maintainability, because that tool can be expected to be updated if the notebook file format changes. However, the alternative below is faster and may also be useful if you don't have nbconvert 6.0 (I have an environment running 5.6.1 at the moment…)

A very simple jq (a sort of sed for json) script does the trick very fast:

jq 'reduce path(.cells[]|select(.cell_type == "code")) as $cell (.; setpath($cell + ["outputs"]; []) | setpath($cell + ["execution_count"]; null))' notebook.ipynb > out-notebook.ipynb

Very simply, it identifies code cells, and replaces their outputs and execution_count attributes with [] and null respectively.


Or if you only want to remove the outputs and keep execution numbers, you can do even simpler:

jq 'del(.cells[]|select(.cell_type == "code").outputs[])' notebook.ipynb > out-notebook.ipynb
Cimbali
1

As mentioned in one of the previous answers, you can use the command-line JSON processor jq to perform this task notably quicker than with nbconvert. A complete command for getting rid of metadata, outputs and execution counts can be found in this blog post:

jq --indent 1 \
    '
    (.cells[] | select(has("outputs")) | .outputs) = []
    | (.cells[] | select(has("execution_count")) | .execution_count) = null
    | .metadata = {"language_info": {"name":"python", "pygments_lexer": "ipython3"}}
    | .cells[].metadata = {}
    ' 01-parsing.ipynb

If desired, you could modify to just clean a specific part of the output, such as execution counts (recursively wherever they occur in the json), and then add this as a git filter:

[filter "nbstrip"]
    clean = jq --indent 1 '(.. |."execution_count"? | select(. != null)) = null'
    smudge = cat

And add the following to ~/.config/git/attributes to have the filter applied globally to all your local repos:

*.ipynb filter=nbstrip

There is also nbstripout which is made for this purpose, but it's a bit slower.
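
For readers less familiar with jq, the recursive expression in the filter above (`..` visits every value in the document) can be sketched in Python. This is an illustrative equivalent under the assumption that the notebook has already been loaded with `json.load`; `null_execution_counts` is a hypothetical name.

```python
from typing import Any


def null_execution_counts(node: Any) -> Any:
    """Set every non-null "execution_count" value to None, at any depth.

    Keys are never added: objects without an "execution_count" key are
    left as they are, matching the select(. != null) in the jq filter.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "execution_count" and value is not None:
                # Replacing the value of an existing key is safe while iterating.
                node[key] = None
            else:
                null_execution_counts(value)
    elif isinstance(node, list):
        for item in node:
            null_execution_counts(item)
    return node
```

Unlike a cell-by-cell loop, this also resets the `execution_count` stored inside `execute_result` outputs, which is why the recursive form is used when outputs are kept.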

joelostblom
0

I suggest using a pre-commit approach, with something like:

  - repo: local
    hooks:
      - id: jupyter-nb-clear-output
        name: jupyter-nb-clear-output
        files: \.ipynb$
        stages: [commit]
        language: python
        entry: jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace
        additional_dependencies: ['jupyterlab']

This is also explained in more detail in this blog.

Amir
  • This update reminds me that there's also a GitHub action for cleaning notebooks, too. See [here](https://discourse.jupyter.org/t/notebook-to-github/7657/3?u=fomightez). – Wayne Mar 30 '23 at 16:47
  • when I do it this way on Github Desktop, I get the following error: "jupyter-nb-clear-output..................................................Failed - hook id: jupyter-nb-clear-output - exit code: 1 Executable `jupyter` not found" – Laurynas G Jul 03 '23 at 12:45
  • does [this thread](https://stackoverflow.com/questions/35313876/after-installing-with-pip-jupyter-command-not-found) help? – Amir Jul 03 '23 at 20:33
0

Parse the JSON:

# Large notebook cleanup: make a copy FIRST and run this only on the COPY!

import json

filename = 'COPY_of_Huge_Notebook.ipynb'

# Load the notebook, which is a plain JSON document.
with open(filename, encoding='utf-8') as f:
    large_ntbk = json.load(f)

# Empty the outputs list of every cell that has one (i.e. code cells).
for cell in large_ntbk['cells']:
    if 'outputs' in cell:
        cell['outputs'] = []

# Write the stripped notebook to a new file.
with open('small.ipynb', 'w', encoding='utf-8') as small:
    json.dump(large_ntbk, small, indent=2)
Mihai.Mehe