41

I have converted a Jupyter/IPython notebook to HTML format and subsequently lost the original ipynb file.

Is there a simple way to generate the original notebook file from the converted HTML file?

foglerit
  • Is copying the code from the html file into a new notebook not an option for you? I guess this is a rather unusual problem and I doubt that there is an easy way to do that. – cel Mar 10 '15 at 20:26
  • 6
    @cel, yes, that is an option, just not terribly practical for large notebooks. But since the ipynb JSON file and the converted HTML have more or less the same info, I was wondering if there might be a converter available. – foglerit Mar 10 '15 at 20:41
  • I don't believe there's a pre-canned converter available. – Thomas K Mar 12 '15 at 19:17
  • 1
    Yes, I also want to find a tool to do the conversion from html to ipynb. But no result yet. – Zhifei Mar 14 '17 at 05:38

3 Answers

50

I recently used BeautifulSoup and JSON to convert an HTML notebook to ipynb. The trick is to look at the JSON schema of a notebook and emulate it. The code selects only input code cells and markdown cells.
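For reference, here is a minimal sketch of the notebook structure that has to be emulated, assuming nbformat 4. The helper names `make_code_cell` and `make_markdown_cell` are hypothetical and exist only for this illustration; the keys are the ones the conversion code fills in.

```python
import json


def make_code_cell(source):
    # A code cell needs 'outputs' and 'execution_count' in addition to the common keys.
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }


def make_markdown_cell(source):
    # A markdown cell only needs the common keys.
    return {"cell_type": "markdown", "metadata": {}, "source": source}


# Top-level notebook object: format version, metadata, and the list of cells.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 1,
    "metadata": {},
    "cells": [
        make_markdown_cell("# Title"),
        make_code_cell("print('hello')"),
    ],
}

# This string is what would be written to a .ipynb file.
notebook_json = json.dumps(notebook, indent=1)
```

Anything that produces a dictionary of this shape and dumps it as JSON should give a file that Jupyter can open; the rest of the work is pulling the cell sources out of the HTML.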

Here is my code:

from bs4 import BeautifulSoup
import json
import urllib.request
url = 'http://nbviewer.jupyter.org/url/jakevdp.github.com/downloads/notebooks/XKCD_plots.ipynb'
response = urllib.request.urlopen(url)
#  for local html file
# response = open("/Users/note/jupyter/notebook.html")
text = response.read()

soup = BeautifulSoup(text, 'lxml')
# see some of the html
print(soup.div)
dictionary = {'nbformat': 4, 'nbformat_minor': 1, 'cells': [], 'metadata': {}}
for d in soup.findAll("div"):
    if 'class' in d.attrs.keys():
        for clas in d.attrs["class"]:
            if clas in ["text_cell_render", "input_area"]:
                # code cell
                if clas == "input_area":
                    cell = {}
                    cell['metadata'] = {}
                    cell['outputs'] = []
                    cell['source'] = [d.get_text()]
                    cell['execution_count'] = None
                    cell['cell_type'] = 'code'
                    dictionary['cells'].append(cell)

                else:
                    cell = {}
                    cell['metadata'] = {}

                    cell['source'] = [d.decode_contents()]
                    cell['cell_type'] = 'markdown'
                    dictionary['cells'].append(cell)
# write the notebook JSON; the with block makes sure the file is closed
with open('notebook.ipynb', 'w') as f:
    json.dump(dictionary, f)

Here is part of the print(soup.div) output:

<div class="container">
<div class="navbar-header">
<button class="navbar-toggle collapsed" data-target=".navbar-collapse" data-toggle="collapse" type="button">
<span class="sr-only">Toggle navigation</span>
<i class="fa fa-bars"></i>
</button>
<a class="navbar-brand" href="/">
<img src="/static/img/nav_logo.svg?v=479cefe8d932fb14a67b93911b97d70f" width="159"/>
</a>
</div>
<div class="collapse navbar-collapse">
<ul class="nav navbar-nav navbar-right">
<li>
<a class="active" href="http://jupyter.org">JUPYTER</a>
</li>
<li>
<a href="/faq" title="FAQ">
<span>FAQ</span>

A screenshot of the resulting ipynb file, loaded in my local Jupyter after running all the cells:

[Screenshot of the resulting notebook]

sgDysregulation
  • 5
    That's great. Thanks for sharing. – foglerit Nov 08 '17 at 14:04
  • 4
    Works like a charm! I just had to install `lxml` (`pip install lxml`) and ipynb created! – mdev May 29 '19 at 17:55
  • 3
    ❤️ Extra basic how-to steps: 1. Create a new file `intonotebook.py` and open it in a code editor (not in Word). 2. Copy-paste the first block of code from this answer. 3. Change line 4 to the URL of your file on the web. If the file is on your computer instead, put # in front of lines 4 and 5, and remove the # before line 7; then change line 7 to the path of your html file (# marks a 'comment'). Make sure there are no spaces at the beginning of the lines you edited. Save the file. 4. Open a terminal, go to the folder where you created the file, and type `python intonotebook.py`. 5. To change the name of the output file, change the file name in the `open('notebook.ipynb', 'w')` call at the end. – drpawelo May 26 '20 at 10:35
  • Is it possible to keep the cell's output in the converted .ipynb file? – THN Sep 26 '21 at 14:34
  • removing the line `cell['outputs'] = []` should allow for the output to be kept – sgDysregulation Sep 27 '21 at 20:16
  • Hi @drpawelo does this still work? – nvs0000 Aug 27 '22 at 03:22
  • You can run a demonstration of this code right in your browser in temporary sessions launched from [here](https://github.com/fomightez/back_to_ipynb). Go [there](https://github.com/fomightez/back_to_ipynb) and click on `launch binder` to get started. This way everything there is already set to try it all out without needing to touch your own machine or computational environments. Then you can even drag-and-drop in your own files and convert them back as well. – Wayne Oct 13 '22 at 16:21
  • The notebook representation that nbconvert currently saves as HTML does not match the one displayed at nbviewer, which the code block above works with, so you need to adapt it. The demonstration that I link to in my comment above has an example in the 'local' section of converting to HTML using just nbconvert and converting back with the tags updated. – Wayne Oct 13 '22 at 19:40
  • It seems mostly the new tags are what people are getting and so I have updated [my version of the code](https://nbviewer.org/github/fomightez/back_to_ipynb/blob/master/back_to_the_ipynb_demo.ipynb) to feature the new tags and referenced code to be used if you have the old tags. – Wayne Apr 14 '23 at 17:45
2

Note the best answer may need some modification of the tags for it to work in late 2022 and forward

I'm adding this as an answer to highlight comments I made below the nice upvoted Answer.
Note that the current version of the highly upvoted answer probably won't work, as the HTML tags signaling the various cells have changed. If you happen to have HTML made with a really old version, it may work. However, most of you will have more recently made HTML, and you need the new tags in the code to distinguish the cells.

See my comments below that highly voted post (you'll need to click the 'Show more comments' option at the bottom to reveal all the comments) for a link to a place where you can run the updated version of the code, with the current tags, in an active Jupyter session right in your browser via the MyBinder service, without needing to sign in. (See the first code cell here for a direct source.) The different tags affect only a few lines of the original code.
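To illustrate what "adapting the tags" amounts to, here is a rough sketch in which the two class names are parameters instead of being hard-coded. It is not the code from the linked demo: it uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without extra installs, it keeps only the text of markdown cells rather than their inner HTML, and the class names in the sample (`code-cell`, `md-cell`) are made-up placeholders. Inspect your own HTML file to find the classes it actually uses.

```python
from html.parser import HTMLParser


class CellExtractor(HTMLParser):
    """Collect text from <div> elements whose class list contains a target class."""

    def __init__(self, code_class, markdown_class):
        super().__init__(convert_charrefs=True)
        self.kinds = {code_class: "code", markdown_class: "markdown"}
        self.cells = []
        self.kind = None  # type of the cell currently being captured, if any
        self.depth = 0    # <div> nesting depth inside the captured cell
        self.parts = []   # text fragments of the captured cell

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.kind is not None:
            self.depth += 1  # nested <div> inside a cell we are already capturing
            return
        for name in (dict(attrs).get("class") or "").split():
            if name in self.kinds:
                self.kind, self.depth, self.parts = self.kinds[name], 1, []
                break

    def handle_endtag(self, tag):
        if tag != "div" or self.kind is None:
            return
        self.depth -= 1
        if self.depth == 0:  # the cell's own <div> has closed
            cell = {
                "cell_type": self.kind,
                "metadata": {},
                "source": "".join(self.parts).strip(),
            }
            if self.kind == "code":
                cell["outputs"] = []
                cell["execution_count"] = None
            self.cells.append(cell)
            self.kind = None

    def handle_data(self, data):
        if self.kind is not None:
            self.parts.append(data)


def html_to_notebook(html, code_class, markdown_class):
    # Build an nbformat 4 style dictionary from the cells found in the HTML.
    parser = CellExtractor(code_class, markdown_class)
    parser.feed(html)
    parser.close()
    return {"nbformat": 4, "nbformat_minor": 1, "metadata": {}, "cells": parser.cells}


# Hypothetical sample; real exports use different class names.
sample = (
    '<div class="md-cell"><p>Intro text</p></div>'
    '<div class="wrapper"><div class="code-cell"><pre>print(1 + 1)</pre></div></div>'
)
nb = html_to_notebook(sample, code_class="code-cell", markdown_class="md-cell")
```

With the class names passed in as arguments, switching between old and new HTML exports should only mean changing the two strings in the call, not the parsing logic.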

Wayne
-2

Here's a trick: save the html file as a .txt file and open it in your code editor. Then rename the file extension to .ipynb. That should do the trick.