17

One of my Jupyter notebooks is getting long, which makes it difficult to navigate.

I want to save each chapter (a cell starting with Heading 1) to a different file. How can I do that? Cutting and pasting multiple cells between notebooks does not seem to be possible.

sjdh
  • In JupyterLab, selecting and dragging a range of cells to another notebook is possible, see 'Use JupyterLab to drag by hand a sequence of cells to a new notebook' towards the bottom of [this response to 'Jupyter notebook, move cells from one notebook into a new notebook'](https://stackoverflow.com/a/71244733/8508004). There's also [nbformat](https://nbformat.readthedocs.io/en/latest/api.html#) that allows you to parse notebooks and create new ones programmatically, if you need a level of customization beyond what `nbmanips` (see tturbo's excellent response) can easily handle. – Wayne Feb 22 '23 at 16:02

4 Answers

14

This is the method I use - it is a little awkward, but it works:

  1. Make multiple copies of the master notebook using File->Make Copy from the menu. Make one copy for each chapter you want to extract.
  2. Rename the copy for each chapter: e.g. rename "master-copy0" to "Chapter 1".
  3. Delete every cell that doesn't belong to Chapter 1 - for example using 'dd' in command mode.
  4. Save the abbreviated file.
  5. Repeat steps 3 and 4 for each chapter.

I believe that the developers may be working on a better solution for a future release.

David Smith
  • I only answered your first question - I believe that you are only allowed one question. I suggest posting the second question, regarding hyperlinked content, separately. – David Smith Sep 21 '14 at 17:03
  • Hi David, thank you for your answer. Your method works, but it is a lot of work if you have to do it regularly. My notebook contains 10 chapters with about 100 blocks each. Selecting several blocks is not possible. That boils down to 10 * 9 * 100 = 90000 times selecting a single block and pressing dd. Perhaps it can be automated in some way. – sjdh Sep 22 '14 at 00:06
  • Maybe not so bad as you think. – David Smith Oct 16 '14 at 05:00
  • 1
    Maybe not as bad as you think. **First**, you made an arithmetic mistake. You only need to delete 9000 cells, not 90000. **Second**, you don't need to select each cell; after a delete, the next cell is selected automatically. **Third**, hold down the "d" key so it auto-repeats and deletes cells sequentially (very fast!). You may have to adjust the key repeat rate so as not to overrun and delete blocks you didn't intend to. I estimate you can complete the whole one-time reorganization in well under an hour. I didn't know you wanted to do this regularly - your question said _One of my notebooks_. – David Smith Oct 16 '14 at 06:13
  • While this isn't a programmatic solution, I think that for most developers this solution is best for a _few_ notebooks. – Seabass77 Feb 18 '19 at 13:59
2

The easiest way might be to edit the .ipynb file in a text editor. Below I list the content of a very simple notebook.

The notebook looks like this:

Chapter 1

In [1]: 1+1

Out[1]: 2

Chapter 2

In [2]: 2+2

Out[2]: 4

To take out chapter 1 and place it behind chapter 2, this is what you can do:

  1. Search for "level": 1
  2. You find { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Chapter 1" ] }, and { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Chapter 2" ] },
  3. Move everything from the start of the first search result up to (but not including) the start of the second search result, that is, all of Chapter 1, to just below the last cell of Chapter 2
  4. Pay attention to commas: every cell in the "cells" list except the last one must be followed by a comma

You can manipulate multiple notebooks in a similar fashion.

This is the .ipynb file for the example:

{
 "metadata": {
  "name": "",
  "signature": ""
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Chapter 1"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "1+1"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 1,
       "text": [
        "2"
       ]
      }
     ],
     "prompt_number": 1
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Chapter 2"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "2+2"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 2,
       "text": [
        "4"
       ]
      }
     ],
     "prompt_number": 2
    }
   ],
   "metadata": {}
  }
 ]
}
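
The manual edit described above can also be sketched in a few lines of Python, which avoids the comma bookkeeping. This is only a sketch under assumptions: it targets the old nbformat 3 layout shown above (cells under "worksheets", chapters marked by "heading" cells with "level": 1), the helper name `split_chapters_v3` is made up for this example, and the inline dictionary is a shortened stand-in for the listed file with the outputs left out.

```python
import json

def split_chapters_v3(cells):
    """Group nbformat 3 cells into chapters; each level-1 heading starts a new one."""
    chapters = []
    for cell in cells:
        is_h1 = cell.get("cell_type") == "heading" and cell.get("level") == 1
        if is_h1 or not chapters:
            chapters.append([])
        chapters[-1].append(cell)
    return chapters

# Shortened stand-in for the example notebook (outputs omitted).
notebook = {
    "metadata": {"name": "", "signature": ""},
    "nbformat": 3,
    "nbformat_minor": 0,
    "worksheets": [{
        "cells": [
            {"cell_type": "heading", "level": 1, "metadata": {}, "source": ["Chapter 1"]},
            {"cell_type": "code", "collapsed": False, "input": ["1+1"],
             "language": "python", "metadata": {}, "outputs": []},
            {"cell_type": "heading", "level": 1, "metadata": {}, "source": ["Chapter 2"]},
            {"cell_type": "code", "collapsed": False, "input": ["2+2"],
             "language": "python", "metadata": {}, "outputs": []},
        ],
        "metadata": {},
    }],
}

worksheet = notebook["worksheets"][0]
chapters = split_chapters_v3(worksheet["cells"])

# Place Chapter 1 behind Chapter 2 by swapping the two groups of cells.
chapters[0], chapters[1] = chapters[1], chapters[0]
worksheet["cells"] = [cell for chapter in chapters for cell in chapter]

# json.dumps takes care of the commas; write this string to a new .ipynb file.
reordered_json = json.dumps(notebook, indent=1)
```

To work on a real file, load it with `json.load` instead of using the inline dictionary, and keep a backup of the original notebook before overwriting anything.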
sjdh
2

A notebook file is in JSON format, so I load all of its data as JSON and split it into several files automatically.

This is the code I wrote.

The code looks complex, but it is simple once you read through it. Here is an example of one of the split files, which I also converted to HTML after splitting: http://www.fun-coding.org/DS&AL4-1.html

import json
import re

# A chapter starts at a markdown cell beginning with '## <number>. ', e.g. '## 1. chapter name'
regx = re.compile(r"## ([0-9]+)\. ")

def chapter_of(cell):
    """Return the chapter number if the cell is a chapter heading, otherwise None."""
    if cell['cell_type'] != 'markdown':
        return None
    # 'source' is a string or a list of lines; join handles both (and empty cells)
    regx_result = regx.match(''.join(cell['source']))
    return int(regx_result.group(1)) if regx_result else None

def notebook_spliter(FILENAME, chapter_num):

    with open(FILENAME + '.ipynb', encoding='utf-8') as data_file:
        data = json.load(data_file)

    copy_cell, chapter_in = list(), False

    for cell in data['cells']:
        found = chapter_of(cell)
        if found is not None and found != chapter_num:
            if chapter_in:
                break      # heading of a different chapter: the wanted one has ended
            continue       # heading of an earlier chapter: keep looking
        if found == chapter_num:
            chapter_in = True
        if chapter_in:
            copy_cell.append(cell)

    # Build the output locally instead of relying on a global copy_data
    copy_data = {
        "cells": copy_cell,
        "metadata": data["metadata"],
        "nbformat": data["nbformat"],
        "nbformat_minor": data["nbformat_minor"],
    }
    with open(FILENAME + '-' + str(chapter_num) + '.ipynb', 'w', encoding='utf-8') as fd:
        json.dump(copy_data, fd, ensure_ascii=False)

The function identifies chapters by their number. I marked each chapter in the notebook with a markdown cell that starts with '## 1. chapter name', so a regular expression only has to look for the pattern '## ', digits, '. '. For the requested chapter number, the function copies that chapter's cells and saves them, together with the remaining top-level fields (metadata, nbformat, and nbformat_minor), to a separate file.

The next code collects the chapter numbers found in the notebook and calls the function once for each of them.

copy_data = dict()
FILENAME = 'DS&AL1'
CHAPTERS = list()
# Same heading pattern as in notebook_spliter: '## <number>. ' at the start of a markdown cell
regx = re.compile(r"## ([0-9]+)\. ")
with open(FILENAME + '.ipynb', encoding='utf-8') as data_file:
    data = json.load(data_file)

for cell in data['cells']:
    if cell['cell_type'] == 'markdown':
        # join handles 'source' stored as a list of lines or as a single string
        regx_result = regx.match(''.join(cell['source']))
        if regx_result:
            CHAPTERS.append(int(regx_result.group(1)))
print (CHAPTERS)

for chapternum in CHAPTERS:
    notebook_spliter(FILENAME, chapternum)
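
A variation on the same idea, as a hedged sketch rather than a drop-in replacement: if chapters are marked with a level-1 markdown heading ('# Title'), as in the question, instead of numbered '## 1. ' headings, the notebook can be cut at every such heading in a single pass. The function names `split_on_h1` and `split_file`, the heading rule, and the output naming scheme are assumptions made for this example. It expects the nbformat 4 layout with a top-level "cells" list and uses only the standard library.

```python
import json
import re

# Assumed rule: a markdown cell whose text starts with '# Title' opens a chapter.
H1 = re.compile(r"#\s+\S")

def split_on_h1(notebook):
    """Return one notebook dict per chapter.

    Cells that appear before the first level-1 heading form a part of their own.
    """
    parts = []
    for cell in notebook["cells"]:
        text = "".join(cell["source"])  # 'source' may be a string or a list of lines
        starts_chapter = cell["cell_type"] == "markdown" and H1.match(text)
        if starts_chapter or not parts:
            parts.append([])
        parts[-1].append(cell)
    # Every part keeps the original metadata and format version.
    return [dict(notebook, cells=cells) for cells in parts]

def split_file(path_stem):
    """Write <stem>-1.ipynb, <stem>-2.ipynb, ... and return the file names."""
    with open(path_stem + ".ipynb", encoding="utf-8") as fd:
        notebook = json.load(fd)
    names = []
    for i, part in enumerate(split_on_h1(notebook), start=1):
        names.append(f"{path_stem}-{i}.ipynb")
        with open(names[-1], "w", encoding="utf-8") as fd:
            json.dump(part, fd, ensure_ascii=False, indent=1)
    return names
```

Calling `split_file('DS&AL1')` would read DS&AL1.ipynb and write one numbered file per chapter. Cells keep their outputs and metadata because each cell is copied as-is.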
Levi Moreira
Dave Lee
2

2023 Update

Some years later, luckily there is a library that can do such things for you:

pip install nbmanips
nb select has_html_tag h1 | nb split -s nb.ipynb
  • The first part of the command (nb select has_html_tag h1) will tell nbmanips on which cells to perform the split.

  • The second part (nb split -s nb.ipynb) will split the notebook based on the piped selection. The -s flag tells nbmanips to use the selection instead of a cell index.

My source: https://towardsdatascience.com/split-your-jupyter-notebooks-in-2-lines-of-code-de345d647454

The library: https://pypi.org/project/nbmanips/

tturbo
  • 1
    This looks to be a very useful utility. For those looking for further programmatic customization beyond perhaps what is easily achieved with `nbmanips`, I'd suggest looking into [nbformat](https://nbformat.readthedocs.io/en/latest/api.html#). `nbformat` comes as part of Jupyter, and so it runs wherever you have your notebooks running without you needing any additional package. Maybe a good place to get started is [here](https://stackoverflow.com/a/71244733/8508004), as the question is related. I've posted examples with code [here](https://discourse.jupyter.org/search?q=nbformat%20%40fomightez). – Wayne Feb 22 '23 at 16:08