
I'm currently training a neural network on a remote server using a Jupyter notebook. I set it up as follows:

  • start a tmux session
  • run jupyter-notebook --no-browser --port=5000
  • connect to the notebook in a browser and execute the training cell (the output looked fine while I watched for the first 10 minutes)
  • detach from tmux (Ctrl-b, d) and close the browser tab

Now, when I reconnect to the Jupyter notebook in the browser, I don't see the current output of the training cell, only the output from the first 10 minutes when I was still watching.

I tried to find a solution for this, and I believe there are some GitHub issues for this particular problem, but they are old and I couldn't figure out whether the issue has been resolved.

edit// To make my intentions clearer, since I found some threads on Stack Overflow that address this problem: I don't want to wait for the training to complete, as I might want to kill the training before it finishes if it absolutely doesn't go the way I expect it to. So some sort of 'live' output, or at least regular output, would be nice.

  • I have only found a workaround, described here: http://deeplearning.lipingyang.org/2018/03/29/run-jupyter-notebook-from-terminal-with-tmux/ But it involves running the whole notebook from the terminal. I wish there were a way to do what you're asking for. – cduguet Oct 24 '18 at 12:14
  • To be clear, is the output computed but not displayed or not computed at all? In the former case, can't you simply store your output to a file which you load in another cell for monitoring? – Alexis Nov 07 '18 at 15:44
  • FWIW, there are several open issues about this: https://github.com/jupyter/notebook/issues/641 , https://github.com/jupyter/notebook/issues/1150 , https://github.com/jupyterlab/jupyterlab/issues/2833 – ijoseph Aug 15 '19 at 00:57
  • Hi. I'm currently facing the same problem. Have you found any convenient way to solve this? – shaurov2253 Dec 01 '20 at 20:16
  • And now........? – jtlz2 Jan 29 '21 at 13:06
  • What about writing the results that you are looking for to somewhere? Say writing them to a text file or to a stream (API) or something similar? Would that work? It sounds like you only need a temporary solution, and writing to a file works great for that. Otherwise, I think that Armonia.py has the right idea. – Mike Williamson Feb 22 '21 at 17:52
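
For what it's worth, the write-to-a-file idea from the comments above needs nothing beyond the standard library; a minimal sketch (the file name and the loop are placeholders for the real training code):

import logging

# Send progress lines to a file that can be inspected from any shell on the server.
logger = logging.getLogger("train")
logger.setLevel(logging.INFO)
logger.addHandler(logging.FileHandler("train.log"))

for epoch in range(10):      # stand-in for the real training loop
    logger.info("finished epoch %d", epoch)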

4 Answers


This is a long-standing missing feature in Jupyter notebooks. I use a near-identical setup: my notebook runs inside a tmux session on a remote server, and I use it locally through an SSH tunnel.
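
(For reference, such a tunnel is typically a single command along these lines, with user and host as placeholders and the port matching the question:)

ssh -N -L 5000:localhost:5000 user@remote-server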

Before doing any work, I run the following snippet in the first cell:

import sys
import logging

# Tee the notebook's stdout/stderr into a log file on the server.
nblog = open("nb.log", "a+")
sys.stdout.echo = nblog  # ipykernel's OutStream copies writes to `echo`
sys.stderr.echo = nblog

# Route IPython's own log records to the same file.
get_ipython().log.handlers[0].stream = nblog
get_ipython().log.setLevel(logging.INFO)

# Save the notebook every 5 seconds.
%autosave 5

Now let's say I run a cell that will take a while to complete (like a training run), something like:

import time

def train(num_epochs):
    for epoch in range(num_epochs):
        time.sleep(1)  # stand-in for one epoch of real work
        print(f"Completed epoch {epoch}")

train(1000)

Now, while train(1000) is running, say that after the first 10 seconds I want to do something else, so I close the browser and also disconnect from the remote server.

(Note the short autosave interval in the snippet above; I added that because I often forget to save the notebook before closing the browser tab.)

After 500 seconds have passed, I can reconnect to the remote server and open the notebook in my browser. The cell's output will have stopped at "Completed epoch 9", i.e. the point where I disconnected. However, the kernel is actually still running train in the background, and it will still show as "busy".

We can now simply open the file nb.log and find all the logs, including the ones written after we closed the browser and the connection. We can keep refreshing nb.log at our leisure, and new logs will keep appearing until the kernel finishes running train().
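
If you also have a shell open on the server (e.g. in another tmux window), you can follow the file live instead of refreshing it by hand:

tail -f nb.log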

If we want to stop train() before it's done, we can just press the Interrupt button in Jupyter. The kernel is freed and we can run other things (a KeyboardInterrupt error message will also show up in your nb.log file). All our previously computed notebook variables and imported libraries are still there, since the kernel itself never stopped.

Although this isn't a very sophisticated solution, I find it quite easy to set up.

Mercury
  • This is good but unfortunately it doesn't print `HuggingFace` metric outputs to the log file. Any idea how to print them? – Alaa M. May 26 '22 at 22:15
  • Ok, found a workaround for `HuggingFace` progress table updates. Just configure a callback `log_callback = PrinterCallback()`, `trainer.add_callback(log_callback)`, set `logging_strategy='epoch'` in the `TrainingArguments`, and copy the implementation of `PrinterCallback()` from [this example](https://huggingface.co/docs/transformers/v4.19.2/en/main_classes/callback#transformers.TrainerCallback.example). And thanks to @Mercury's solution, the output will be redirected to the `nb.log` file. – Alaa M. May 26 '22 at 22:35
  • Where is `sys.stdout.echo` documented? – HappyFace Jun 11 '23 at 09:11
  • There isn't any proper documentation for it, but you can look at the code if you want. When you run a jupyter notebook, sys.stdout is an ipykernel.iostream.OutStream instance (as mentioned [here](https://stackoverflow.com/questions/45200375/stdout-redirect-from-jupyter-notebook-is-landing-in-the-terminal)). Then going to the [code](https://github.com/ipython/ipykernel/blob/main/ipykernel/iostream.py#L331), you can see that `echo` needs to be a file-like object. In the `write()` method of `Outstream`, if echo exists, the output is copied to that file. – Mercury Jun 27 '23 at 00:19
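
Since that echo attribute is undocumented, a more portable variant of the same idea is a small tee wrapper around sys.stdout; a minimal sketch (an editorial addition, not part of the answer above):

import sys

class Tee:
    # Copy everything written to the primary stream into a log file too.
    def __init__(self, primary, logfile):
        self.primary = primary
        self.logfile = logfile
    def write(self, data):
        self.primary.write(data)   # still shows up in the notebook
        self.logfile.write(data)
        self.logfile.flush()       # so the file can be followed live
        return len(data)
    def flush(self):
        self.primary.flush()
        self.logfile.flush()

sys.stdout = Tee(sys.stdout, open("nb.log", "a+"))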

This is still an open issue in the official Jupyter repositories; see https://github.com/jupyterlab/jupyterlab/issues/2833 , "Reconnect to running session: keeping output".

ligand

Alternatively, use a .py file instead of a .ipynb (Jupyter notebook) file, and inside this .py file print the results you need in order to monitor your code while it runs.

To convert from a .ipynb to a .py file, you can use this command:

jupyter nbconvert --to script example.ipynb

Now you can work with a Python script instead of a Jupyter notebook file, which will make things easier.

In your script, add print() calls at whatever stages you think necessary so that you can watch the progress in the tmux terminal. That way you can kill your training whenever you want (Ctrl+C), or let it keep running; tmux keeps the session alive, and you can just press Ctrl-b, d to detach from the session.
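
The whole workflow then looks something like this (the session and file names are placeholders):

tmux new -s training
python example.py

Detach with Ctrl-b, d and reattach later with tmux attach -t training.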


I'm currently facing the same problem and found this discussion. Papermill works quite well here. Just use something like:

nohup papermill --request-save-on-cell-execute --no-progress-bar input.ipynb output.ipynb &

input.ipynb is the notebook with your source code.

output.ipynb is the processed notebook where you can see the output.

--request-save-on-cell-execute writes cell output into the output.ipynb notebook as each cell completes.

--no-progress-bar disables the progress bar, which is quite useless if you do all the work in one cell.

nohup keeps papermill running after you log out from the server, and & runs it in the background.

All Papermill options can be found in the Papermill documentation.
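
Papermill can also be driven from Python rather than the CLI; a minimal sketch (the keyword arguments mirror the flags above, assuming a recent Papermill version):

import papermill as pm

# Execute input.ipynb, writing each cell's output into output.ipynb as it completes.
pm.execute_notebook(
    "input.ipynb",
    "output.ipynb",
    request_save_on_cell_execute=True,  # same as --request-save-on-cell-execute
    progress_bar=False,                 # same as --no-progress-bar
)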

Mishak