
I'm writing a Spark job that needs to be runnable both locally and on Databricks.

The code has to be slightly different in each environment (file paths), so I'm trying to find a way to detect whether the job is running on Databricks. The best way I have found so far is to look for a "dbfs" directory in the root dir and, if it's there, assume it's running on Databricks. This doesn't feel like the right solution. Does anyone have a better idea?
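For reference, this is roughly what that directory check looks like in Python (a minimal sketch; it assumes the standard /dbfs FUSE mount that Databricks exposes at the filesystem root):

import os

# Heuristic: the /dbfs mount only exists on Databricks clusters
def looks_like_databricks():
    return os.path.isdir("/dbfs")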

steven35
  • Set an environment variable when running on Databricks, and a different one (or none) when running locally? Similar to a dev/production separation. – DTul Apr 08 '19 at 13:42
  • Can you pass a parameter such as a profile? Is it running in cluster mode on Databricks? – howie Apr 11 '19 at 12:55
  • @steven35 Would you build it as a jar and run it via a `spark-submit` job? – Sai Apr 13 '19 at 18:15

4 Answers


You can simply check for the existence of an environment variable e.g.:

// DATABRICKS_RUNTIME_VERSION is only set on Databricks clusters
def isRunningInDatabricks(): Boolean =
  sys.env.contains("DATABRICKS_RUNTIME_VERSION")
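The same environment variable can be checked from PySpark as well; a minimal sketch, where the file paths are just placeholders to illustrate branching:

import os

def is_running_in_databricks():
    # DATABRICKS_RUNTIME_VERSION is only set on Databricks clusters
    return "DATABRICKS_RUNTIME_VERSION" in os.environ

# Hypothetical paths, chosen only to illustrate the branch
base_path = "dbfs:/mnt/data" if is_running_in_databricks() else "/tmp/data"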
pathikrit

How about this:

Python:

def isLocal():
    # spark.master is e.g. "local[*]" when running locally
    setting = spark.conf.get("spark.master")
    return ("local" in setting)

Scala:

def isLocal(): Boolean = {
    val setting = spark.conf.get("spark.master")
    setting.contains("local")
}
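For example, the check can drive the path selection mentioned in the question (a sketch that assumes the isLocal() above; the paths are placeholders):

# Hypothetical usage: pick the input location based on the check above
input_path = "/tmp/input" if isLocal() else "dbfs:/mnt/input"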
simon_dmorias

You can look at a Spark configuration property such as "spark.home", whose value on Databricks is /databricks/spark.

python: sc._conf.get("spark.home")

result: '/databricks/spark'
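Wrapped into a helper, this might look like the following (a sketch; it assumes a live SparkContext named sc and guards against the property being unset locally):

def is_databricks(sc):
    # On Databricks, spark.home points at /databricks/spark;
    # locally the property may be missing entirely
    return sc._conf.get("spark.home", "").startswith("/databricks")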

Arun
  • That's a good approach, but the reason I need to know whether it's running on Databricks in the first place is so that I can create the config and context accordingly. – steven35 Jul 16 '18 at 08:36

This is simple. Databricks notebooks are not files.

# __file__ is only defined when the code runs from a file (e.g. via spark-submit);
# it is not defined inside a Databricks notebook
try:
    __file__
    print("It is a file")
except NameError:
    print("It is a Databricks notebook")

rjurney