
I am using databricks-connect and VS Code to develop some python code for Databricks.

I would like to code and run/test everything directly from VS Code using databricks-connect, to avoid dealing with the Databricks web IDE. For basic notebooks it works just fine, but I would like to do the same with multiple notebooks and use imports (e.g. import a `config` notebook into another notebook).

However, while `import another_notebook` works fine in VS Code, it does not work in Databricks. From what I could find, the alternative in Databricks is `%run "another_notebook"`, but that does not work if I want to run it from VS Code (databricks-connect does not include notebook workflows).

Is there any way to import notebooks that works both in Databricks and is supported by databricks-connect?

Thanks a lot for your answers!

Maxime
  • Could you add some concrete example with a code sample? Are you trying to run another notebook, call a function in a lib, ... – Kashyap Oct 19 '21 at 17:29
  • @Kashyap For example, let's say I have a custom `config` notebook that has some definitions in it like `CONSTANT = "banana"`. I want to import `config` in another notebook to reuse the constants defined in it in a way that both works on Databricks and with databricks-connect – Maxime Oct 20 '21 at 10:36

3 Answers


I found a solution that completes the approach mentioned by @Kashyap with `try ... except`.

The Python source file of a notebook that contains a `%run` command should look like this:

# Databricks notebook source
# MAGIC %run "another_notebook"

# COMMAND ----------

try:
    # Works locally in VS Code, where another_notebook.py is a plain module;
    # note the notebook name must be a valid Python identifier to be importable.
    import another_notebook
except ModuleNotFoundError:
    # On Databricks the notebook is not a module; the %run above already ran it.
    print("running on Databricks")

import standard_python_lib  # placeholder for any regular library import

# Some very interesting code

The `# MAGIC %run` line is treated as a plain comment when the file is executed as Python (avoiding a SyntaxError), while Databricks recognizes it as a magic command in a Python notebook. That way, whether the script is executed in Python via databricks-connect or in Databricks, it works.
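For completeness, here is a minimal sketch of what the imported notebook's source file could look like (the file name `another_notebook.py` and its contents are hypothetical, following the `config` example from the comments):

# Databricks notebook source
# another_notebook.py - the exported "source file" form of the notebook.
# Locally (VS Code + databricks-connect) this is just a regular module,
# so `import another_notebook` picks up the definitions below.

CONSTANT = "banana"  # hypothetical shared definition

def get_fruit():
    """Hypothetical shared helper reused by other notebooks."""
    return CONSTANT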

Maxime

As described in "How to import one databricks notebook into another?"

The only way to import notebooks is by using the `%run` command:

%run /Shared/MyNotebook

or relative path:

%run ./MyNotebook

More details: https://docs.azuredatabricks.net/user-guide/notebooks/notebook-workflows.html

The only way I can think of is to write conditional code that uses either `import` or `%run`, depending on where it's running.

Something like:

try:
    import another_notebook
    print("running in VS Code")
except ImportError:
    # %run is not valid Python syntax, so hide it from the compiler
    # inside a string (untested; see the comments below this answer)
    code = """
%run "another_notebook"
print("running in Databricks")
"""
    exec(code)


If you want to be more certain of the environment, perhaps you can use some info from the Spark context. E.g. the following code

for a in spark.sparkContext.__dict__:
  print(a, getattr(spark.sparkContext, a))

run on my cluster prints:

_accumulatorServer <pyspark.accumulators.AccumulatorServer object at 0x7f678d944cd0>
_batchSize 0
_callsite CallSite(function='__init__', file='/databricks/python_shell/scripts/PythonShellImpl.py', linenum=1569)
_conf <pyspark.conf.SparkConf object at 0x7f678d944c40>
_encryption_enabled False
_javaAccumulator PythonAccumulatorV2(id: 0, name: None, value: [])
_jsc org.apache.spark.api.java.JavaSparkContext@838f1fd
_pickled_broadcast_vars <pyspark.broadcast.BroadcastPickleRegistry object at 0x7f678e699c40>
_python_includes []
_repr_html_ <function apply_spark_ui_patch.<locals>.get_patched_repr_html_.<locals>.patched_repr_html_ at 0x7f678e6a54c0>
_temp_dir /local_disk0/spark-fd8657a8-79a1-4fb0-b6fc-c68763f0fcd5/pyspark-3718c30e-c265-4e68-9a23-b003f4532576
_unbatched_serializer PickleSerializer()
appName Databricks Shell
environment {'PYTHONHASHSEED': '0'}
master spark://10.2.2.8:7077
profiler_collector None
pythonExec /databricks/python/bin/python
pythonVer 3.8
serializer AutoBatchedSerializer(PickleSerializer())
sparkHome /databricks/spark

So e.g. your condition could be:

if "Databricks" in spark.sparkContext.appName:
    code = """
%run "another_notebook"
print("running in Databricks")
"""
    exec(code)
else:
    import another_notebook
    print("running in VS Code")

Kashyap
    There is a problem though : the `%run` is causing a SyntaxError – Maxime Oct 20 '21 at 10:38
  • @Maxime, modified the code to work around compile time syntax error. Not tested. If it doesn't work (e.g. may be the notebook code is not interpreted directly by python, but is first post processed by Databricks engine) then you'll have to look for the equivalent API from Databricks to call instead of `%run`. And replace the Databricks specific syntax with equivalent Python'ish syntax e.g. `DatabricksNotebookRunner.run('another-notebook')` – Kashyap Oct 25 '21 at 17:14

Well, you can also package your code as a .whl (wheel) and install it on the cluster; then calling it via `import` from any notebook is a breeze.
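A minimal sketch of what that packaging could look like (the package name `mylib` is hypothetical):

# setup.py - minimal wheel packaging sketch; "mylib" is a hypothetical name
from setuptools import setup, find_packages

setup(
    name="mylib",
    version="0.1.0",
    packages=find_packages(),  # picks up the mylib/ directory with __init__.py
)

Build the wheel with `python -m build` (or `python setup.py bdist_wheel`), upload the resulting .whl to the cluster as a library, and `from mylib import CONSTANT` then works both locally and on Databricks.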