
Desired behaviour

We have an existing workflow in vanilla Jupyter Notebook/Lab where we use relative paths to store outputs of some notebooks. Example:

  • /home/user/notebooks/notebook1.ipynb
  • /home/user/notebooks/notebook1_output.log
  • /home/user/notebooks/project1/project.ipynb
  • /home/user/notebooks/project1/project_output.log

In both notebooks, we produce the output by simply writing to a relative path such as `./output.log`.

Problem

However, we are now trying Google Dataproc with the Jupyter optional component, and the current directory is always `/` regardless of which notebook the code is run from. This applies to both the Notebook and Lab interfaces.

What I've tried

Disabling `c.FileContentsManager.root_dir='/'` in `/etc/jupyter/jupyter_notebook_config.py` causes the current directory to be set to wherever I started `jupyter notebook` from, but it is always that initial starting folder instead of following the `.ipynb` notebook files.
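For context, this is roughly what that override looks like in the config file (the option name and path are from the question; the comment markers show the "disabled" state):

```python
# /etc/jupyter/jupyter_notebook_config.py
# Dataproc's Jupyter component pins the contents manager to the
# filesystem root. Commenting the line out restores the default
# behaviour of serving from the launch directory:

# c.FileContentsManager.root_dir = '/'
```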

Any idea on how to restore the "dynamic" current directory behaviour?

Even if it's not possible, I'd like to understand how Dataproc even makes Jupyter behave differently.

Details

  • Dataproc Image 2.0-debian10
  • Notebook Server 6.2.0
  • JupyterLab 3.0.18
Leighton Ritchie

2 Answers


No, it is not possible to always get the current directory where your `.ipynb` file is. Jupyter runs on the local filesystem of the master node of your cluster, and it will always use the default system path for its kernel.

It is also not possible to consistently get the path of a Jupyter notebook outside of Dataproc. You can check out this thread regarding the topic.

You have to specify the directory path explicitly so that your log file is saved in the desired location.

Note that the GCS folder in your Lab refers to the Google Cloud Storage bucket of your cluster. You can create a `.ipynb` in GCS, but when you execute the file it will run on the local filesystem. Thus you will not be able to save log files to GCS directly.


EDIT:

It's not only Dataproc that makes Jupyter behave this way. If you use Google Colab notebooks, you will see the same behaviour there.

The reason is that you are always executing code in the kernel, no matter where the file is, and in theory multiple notebooks could connect to that kernel. Thus you can't have multiple working directories for the same kernel.

As I mentioned earlier, by default when you start a notebook the current working directory is set to the path of the notebook.

Link to the main thread: https://github.com/ipython/ipython/issues/10123

Sayan Bhattacharya
  • Yup, I get that the `GCS` folder isn't on the local filesystem. But even when using notebooks within `Local Disk`, `os.getcwd()` still always returns `/` – Leighton Ritchie Jul 06 '22 at 12:37
  • 1
    Yeah , @LeightonRitchie That's true . Its always in the root directory. You can use the `cd your_directory` command to change its directory before you create the logger.Its an additional step for your notebook – Sayan Bhattacharya Jul 06 '22 at 12:41
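The workaround from that comment can be sketched in plain Python (`os.chdir` instead of the `%cd` magic; the directory here is a stand-in for the notebook's real folder):

```python
import os
import tempfile

# Stand-in for the notebook's real folder, e.g. /home/user/notebooks/project1
NOTEBOOK_DIR = tempfile.mkdtemp()

# Change the kernel's working directory before creating the logger,
# so relative paths like ./output.log resolve next to the notebook.
os.chdir(NOTEBOOK_DIR)
print(os.getcwd())
```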

A general solution for most use cases seems to be what is described in this GitHub issue comment: https://github.com/ipython/ipython/issues/10123#issuecomment-354889020

Eben du Toit