
Databricks is smart and all, but how do you identify the path of your current notebook? The guide on the website does not help.

It suggests:

%scala
dbutils.notebook.getContext.notebookPath
res1: Option[String] = Some(/Users/user@org.dk/my_test_notebook)

This does not give me the complete path, but rather the path to some folder structure that is not accessible from the notebook. I need the path so that I can make system calls in the same folder as the .ipynb file.

Any suggestions?

Esben Eickhardt

5 Answers


You can retrieve the information by using the dbutils command:

dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()

For Scala it's:

dbutils.notebook().getContext().notebookPath.get

For Python:

dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()
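The returned value is a workspace path, not a filesystem path; if you only need the enclosing folder, you can split it with ordinary path handling. A minimal plain-Python sketch (the path string here is a hypothetical example of what notebookPath().get() returns inside a Databricks notebook):

```python
import posixpath

# Hypothetical return value of notebookPath().get() in a Databricks notebook
notebook_path = "/Users/user@org.dk/my_test_notebook"

# Workspace paths use forward slashes, so posixpath works on any OS
folder = posixpath.dirname(notebook_path)
print(folder)  # → /Users/user@org.dk
```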
Sergei Tishkov
  • 1
    In Python it gives me an error `AttributeError: module 'dbutils' has no attribute 'notebook'` – Vladimir S. Mar 11 '22 at 02:33
  • Sorry Vladimir, cannot help you with that. The code works for me. – Sergei Tishkov Mar 15 '22 at 11:22
  • 1
    This python simply gives the base notebeook path alone , Lets say there notebook_a and it calls the notebook_b, if you run notebook_a and try to print the dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get() inside notebook_b , then it shows the notebook_a . – Surender Raja Jun 16 '22 at 06:10

The notebook does not live on the driver. Whenever you run a cell, that cell is sent for execution in the current Spark session.

Try this to check.

%sh
pwd
ls

If you want to access some files or code, you can upload them to DBFS and access them from there. If it is code, you can compile it into a .jar (Java, Scala) or an .egg (Python) and attach the library to the cluster on which you are running the notebook.

anil
  • The issue is that Databricks does not have integration with VSTS. A workaround is to download the notebook locally using the CLI and then use git locally. I would, however, prefer to keep everything in Databricks. If I can download the .ipynb to the dbfs, then I can use a system call to push the notebooks to VSTS using git. – Esben Eickhardt Dec 04 '18 at 07:55
  • 1
    @EsbenEickhardt: As of January 2019, Databricks now has VSTS (now called Azure DevOps) integration. – Thomas Jan 24 '19 at 15:41

Access file using Databricks API

I ended up somewhat resolving the problem using the Databricks API to download and upload notebooks and other files to/from Databricks.

1. Read documentation for Databricks Workspace API

Databricks API Documentation

2. Generate API token and Get Notebook path

In the user interface do the following to generate an API Token and copy notebook path:

  1. Choose 'User Settings'
  2. Choose 'Generate New Token'
  3. In Databrick file explorer, "right click" and choose "Copy File Path"

3. Download a Notebook from Databricks

If you want to access a notebook file, you can download it using a curl call. From inside a Databricks notebook, you can make this call either with the %sh cell magic or with a system call, os.system('insert command').

curl --header "Content-Type: application/json" --request GET --data '{"path":"{/Users/myuser@myorg.com/notebook_to_download}","format":"JUPYTER"}' https://{replace_with_your_databricks}/api/2.0/workspace/export -H "Authorization: Bearer {my_token}" | jq -r .content | base64 --decode > my_downloaded_notebook.ipynb
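The export response carries the notebook as a base64-encoded "content" field, which you can also decode in Python instead of piping through jq and base64. A hedged sketch of just the decoding step (the response body below is a stand-in for what the /api/2.0/workspace/export endpoint returns; the real HTTP call with your bearer token is left out):

```python
import base64
import json

# Stand-in for the JSON body returned by /api/2.0/workspace/export;
# the real one would come from an HTTP call with your bearer token.
response_body = json.dumps(
    {"content": base64.b64encode(b'{"cells": []}').decode()}
)

# The "content" field is base64-encoded; decode it to recover the .ipynb bytes
notebook_bytes = base64.b64decode(json.loads(response_body)["content"])
print(notebook_bytes)  # → b'{"cells": []}'
```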

4. Uploading a Notebook to Databricks

You can similarly upload a notebook from a machine using the following curl call:

curl -n -F format=JUPYTER -F path="{/Users/myuser@myorg.com/uploaded_notebook}" -F language=PYTHON -F content=@{/my/local/notebook.ipynb} https://{replace_with_your_databricks}/api/2.0/workspace/import -H "Authorization: Bearer {my_token}"
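The import endpoint also accepts a JSON body with base64-encoded content, which can be easier to script than the multipart form upload above. A sketch of building that body in Python (the notebook bytes and workspace path are hypothetical; the actual POST with your token is left out):

```python
import base64
import json

# Hypothetical notebook contents; in practice, read your local .ipynb file
notebook_bytes = b'{"cells": []}'

# JSON body for POST /api/2.0/workspace/import, the alternative
# to the multipart -F upload shown above
payload = json.dumps({
    "path": "/Users/myuser@myorg.com/uploaded_notebook",
    "format": "JUPYTER",
    "language": "PYTHON",
    "content": base64.b64encode(notebook_bytes).decode(),
})
print(payload)
```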
Esben Eickhardt
  • Hi Esben, I am trying to execute the import curl inside Jenkins pipeline which is getting executed in a Linux machine. Could you please let me know the path sh """ curl -n -F format=SOURCE -F path="/Users/anupam@domain.com/notebook_file.py" -F language=PYTHON -F content=@notebook_file.py https://test-dev.cloud.databricks.com/api/2.0/workspace/import -H "Authorization: Bearer token" """ – Anupam Jan 31 '23 at 19:53
  • It sounds like you are doing a nested call. Usually it is something like: sh -c 'write your command in here'. Example NOT working: sh """curl""", Example working: sh -c """curl""" – Esben Eickhardt Mar 21 '23 at 04:17

You can get the path of the notebook through these steps; the answer also appears in the output you quoted in your question. (Assuming that the notebook you are working on is yours.)

  1. Go to the workspace
  2. If the notebook is in a particular user folder, click on Users
  3. Click on the particular user@org.dk
  4. Then on the notebook name my_test_notebook

so your final path becomes /Users/user@org.dk/my_test_notebook

Shikha
  • 2
    Author is asking about another thing - he was able to get path via `dbutils.notebook.getContext.notebookPath` call... – Alex Ott Feb 14 '21 at 09:26