
I use Jupyter Notebook to run a series of experiments that take some time. Certain cells take far too long to execute, so naturally I'd like to close the browser tab and come back later. But when I do, the kernel interrupts the running code.

I guess there is a workaround for this, but I can't find it.

– Flo

8 Answers


The simplest workaround to this seems to be the built-in cell magic %%capture:

%%capture output
# Time-consuming code here

Save, close tab, come back later. The output is now stored in the output variable:

output.show()

This will show all interim print results as well as the plain or rich output cell.
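If you want to see the same capture-then-inspect pattern outside of IPython cell magics (or test it in a plain script), the standard library offers a similar trick with `contextlib.redirect_stdout`. This is only a sketch of the idea behind `%%capture`, not its actual implementation:

```python
import io
from contextlib import redirect_stdout

buffer = io.StringIO()
with redirect_stdout(buffer):          # everything printed inside is captured
    print("intermediate result:", 42)  # stand-in for time-consuming code

captured = buffer.getvalue()           # retrieve it later, like output.show()
print(captured)
```

The `%%capture` magic goes further (it also captures rich display output, not just stdout), but the retrieval idea is the same.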

– Seb

TL;DR:

Code doesn't stop when the tab closes, but the output can no longer find the current browser session and loses the information about how it is supposed to be displayed, so all new output is thrown out until the code that was running when the tab closed finishes.

Long Version:

Unfortunately, this isn't implemented (as of Nov 24th). If there's a workaround, I can't find it either. (Still looking; will update with news.) There is a workaround that saves the output and then reprints it, but it won't work if code is still running in that notebook. An alternative would be to use a second notebook to receive the output.

I also need this functionality, and for the same reason. The kernel doesn't shut down or get interrupted when a tab closes, and the code doesn't stop running. The warning given is exactly correct: "The kernel is busy, outputs may be lost."

Running

import time

a = 0
while a < 100:
    a += 1
    print(a)
    time.sleep(1)

in one box, then closing the tab, opening it up again, and then running

print(a)

from another box will cause it to hang until the 100 seconds have finished and the code completes, then it will print 100.

When you return after closing a tab, the Python process will be in the same state you left it in (as of the last completed save). That was their intended behavior, and something they should have been clearer about in their documentation. The output from the running code actually gets sent to the browser when you reopen it (I've lost the reference that explains this), so hacks like the one in this comment can work, since they receive those messages and just throw them into some cell.

Output is, in effect, only accessible through the endpoint connection. They've been working on this for a while (since before Jupyter), although I cannot find the current issue in the Jupyter repository (this one references it, but is not it).

The only general workaround seems to be finding a computer you can always leave on, and leaving that on the page while it runs, then remote in or rely on autosave to be able to access it elsewhere. This is a bad way to do it, but unfortunately, the way I have to for now.

– Poik
  • I don't think the code is actually interrupted when the tab is closed. However, the output is, as you said. So the inelegant workaround that I'm using is to just store the result of my computations in such a way that it's easy to re-make the output. In particular, I'm storing the confusion matrix and ROC curve from my classification and just re-plotting them. – ijoseph Nov 24 '16 at 03:48
  • You're right. I finally got around to retesting this. Good catch. I also have more details from Jupyter's and IPython's bug list itself. I'll hunt them down and update the post. – Poik Nov 25 '16 at 02:54
  • How about when running the entire notebook via the menu command "Run all"? Does execution stop after one cell? – IanS May 16 '18 at 16:21
  • @IanS That's a good question. I'll get back to you on that, unless someone beats me to it. – Poik May 16 '18 at 19:54
  • Thanks! I ran some experiments, and I'm pretty sure that the entire notebook executes. The output is lost, of course, but all cells run to the end. – IanS May 17 '18 at 09:16
  • I personally worked around this by storing important output in Google Spreadsheets (via the pygsheets library). This way output is saved regardless of browser activity. – spacediver Jun 20 '19 at 18:35

First, install runipy:

pip install runipy

Now run your notebook in the background with the command below:

nohup runipy YourNotebook.ipynb OutputNotebook.ipynb >> notebook.log &

The output notebook will be saved, and you can also watch the logs while it runs with:

tail -f notebook.log
– Saitejareddy

I have been struggling with this issue for some time as well.

My workaround was to write all my logs to a file, so that when my browser closes (indeed, when a lot of logs come through, the browser hangs too), I can follow the kernel's progress by opening the log file (the log file can be opened in Jupyter too).

#!/usr/bin/python
import time
import datetime
import logging

logger = logging.getLogger()

def setup_file_logger(log_file):
    hdlr = logging.FileHandler(log_file)
    formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
    hdlr.setFormatter(formatter)
    logger.addHandler(hdlr) 
    logger.setLevel(logging.INFO)

def log(message):
    #outputs to Jupyter console
    print('{} {}'.format(datetime.datetime.now(), message))
    #outputs to file
    logger.info(message)

setup_file_logger('out.log')

for i in range(10000):
    log('Doing hard work here i=' + str(i))
    log('Taking a nap now...')
    time.sleep(1000)

With JupyterLab:

This is not a problem if you are using JupyterLab (with current release v3.x.x).

To be more specific, "not a problem" means that after you close the tab/browser, the notebook's kernel keeps running (as long as the Jupyter server/your terminal is not closed). But the printed output of the cell (if there is any) is interrupted.

So, when we reopen the notebook, variables etc. are all kept and updated, except for the interrupted printed output.

If you care about the printed info in this case, you could try logging it to a file, or try using Jupyter's execute API (see below).


With Jupyter Notebook:

If you are still sticking with the legacy Jupyter Notebook (e.g. version 5.x/6.x), well, it was not possible there in the past (i.e. prior to 2022).

BUT, with the planned Notebook v7 release, which reuses the JupyterLab codebase, this problem will also be solved in the new Jupyter Notebook.

So, try using JupyterLab, or wait and update to Notebook v7:

$ jupyter lab --version
3.4.4
$ # OR wait and update the notebook until
$ # the installed version of notebook is v7
$ jupyter notebook --version
6.4.12

With Jupyter's execute API:

Another workaround is to use Jupyter's execute API:

$ jupyter nbconvert --to notebook --execute mynotebook.ipynb

This is like running the notebook as a script, i.e. from the command line rather than the web browser UI.

After execution, a new file named mynotebook.nbconvert.ipynb will be produced, with all printed output kept in it, but all in-memory variables will be lost. What you can do is pickle the variables you care about.
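A minimal sketch of that pickling idea (the variable names and the results.pkl filename here are just examples):

```python
import pickle

# Hypothetical results you want to survive the nbconvert run:
results = {"accuracy": 0.93, "epochs": 10}

# In the executed notebook, dump them to disk...
with open("results.pkl", "wb") as f:
    pickle.dump(results, f)

# ...then, in a later interactive session, load them back:
with open("results.pkl", "rb") as f:
    restored = pickle.load(f)
```

Anything picklable (models, arrays, dataframes) can be saved this way, so losing the kernel's in-memory state after the nbconvert run matters much less.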

And I don't think runipy is still a good choice, since it's deprecated and unmaintained (superseded by Jupyter's execute API).


Ref:

Q: is it possible to make a jupyter notebook run even if the page is closed?

A: This is being solved in JupyterLab and will be solved in the future Notebook v7 release.

– YaOzI
  • This does NOT work in JupyterLab. I've just tested it. The problem is the same: the notebook disconnects and no longer receives the output. The docs referenced show only re-connecting to a JupyterLab Terminal. That may be, but the notebooks don't work. Hopefully v7 solves it for all. – Alex Nov 05 '22 at 00:14
  • @Alex If your focus is on the *printing output*, as in my update, then it does not work with JupyterLab; in that case you could try Jupyter's execute API. BUT it DOES work if you just want to keep the notebook running (e.g. a very long and slow for-loop) after you close the tab/browser. – YaOzI Nov 08 '22 at 11:24

If you've set all cells to run and want to periodically check what's being printed, the following would be a better option than %%capture. You can open the log file at any time while the kernel is busy.

import sys
sys.stdout = open("my_log.txt", "a")
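One caveat with redirecting sys.stdout globally: later cells will also print to the file instead of the notebook. A slightly fuller sketch that keeps a handle to the original stream so you can restore it afterwards (the filename and message are just examples):

```python
import sys

# Redirect print() output to a file (append mode), keeping the
# original stream so later cells can print to the notebook again:
orig_stdout = sys.stdout
log_file = open("my_log.txt", "a")
sys.stdout = log_file

print("long-running progress message")  # lands in my_log.txt

# Restore stdout once the heavy cells are done:
sys.stdout = orig_stdout
log_file.close()
```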
– Jing Xue

I constructed this a while ago using jupyter nbconvert, essentially running a notebook in the background without any UI:

nohup jupyter nbconvert --ExecutePreprocessor.timeout=-1 --CodeFoldingPreprocessor.remove_folded_code=False --ExecutePreprocessor.allow_errors=True --ExecutePreprocessor.kernel_name=python3 --execute --to notebook --inplace ~/mynotebook.ipynb > ~/stdout.log 2> ~/stderr.log &

  • timeout=-1: no timeout
  • remove_folded_code=False: if you have the Codefolding extension enabled
  • allow_errors=True: ignore errored cells and continue running the notebook to the end
  • kernel_name: if you have multiple kernels; check with jupyter kernelspec list
– Sida Zhou

There are ways to convert your notebook to a Python script (e.g. jupyter nbconvert --to script). See here: convert-jupyter-notebook-python.

So you can simply convert the notebook to a script and then run that script. No need for a browser.

– Asher Stern