
I'm writing a Spark job that needs to be runnable both locally and on Databricks.

The code has to be slightly different in each environment (file paths), so I'm trying to find a way to detect whether the job is running on Databricks. The best way I have found so far is to look for a "dbfs" directory in the root dir and, if it's there, assume it's running on Databricks. This doesn't feel like the right solution. Does anyone have a better idea?
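For reference, this is roughly what that directory check looks like in Python (a minimal sketch; it assumes the standard /dbfs FUSE mount that Databricks exposes at the filesystem root):

import os

# Heuristic: the /dbfs mount only exists on Databricks clusters
def looks_like_databricks():
    return os.path.isdir("/dbfs")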

steven35
  • Set an environment variable when running on Databricks, and a different one (or none) when running locally? Similar to a dev/production separation. – DTul Apr 08 '19 at 13:42
  • Can you pass a parameter such as a profile? Is it running in cluster mode on Databricks? – howie Apr 11 '19 at 12:55
  • @steven35 Would you build it as a jar and run it via a `spark-submit` job? – Sai Apr 13 '19 at 18:15

4 Answers


You can simply check for the existence of an environment variable e.g.:

// DATABRICKS_RUNTIME_VERSION is only set on Databricks clusters
def isRunningInDatabricks(): Boolean =
  sys.env.contains("DATABRICKS_RUNTIME_VERSION")
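The same environment variable can be checked from PySpark as well; a minimal sketch, where the file paths are just placeholders to illustrate branching:

import os

def is_running_in_databricks():
    # DATABRICKS_RUNTIME_VERSION is only set on Databricks clusters
    return "DATABRICKS_RUNTIME_VERSION" in os.environ

# Hypothetical paths, chosen only to illustrate the branch
base_path = "dbfs:/mnt/data" if is_running_in_databricks() else "/tmp/data"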
pathikrit

How about this:

Python:

def isLocal():
    # spark.master is e.g. "local[*]" when running locally
    setting = spark.conf.get("spark.master")
    return ("local" in setting)

Scala:

def isLocal(): Boolean = {
    val setting = spark.conf.get("spark.master")
    setting.contains("local")
}
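For example, the check can drive the path selection mentioned in the question (a sketch that assumes the isLocal() above; the paths are placeholders):

# Hypothetical usage: pick the input location based on the check above
input_path = "/tmp/input" if isLocal() else "dbfs:/mnt/input"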
simon_dmorias

You can look at a Spark configuration property such as "spark.home", whose value on Databricks is /databricks/spark.

python: sc._conf.get("spark.home")

result: '/databricks/spark'
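Wrapped into a helper, this might look like the following (a sketch; it assumes a live SparkContext named sc and guards against the property being unset locally):

def is_databricks(sc):
    # On Databricks, spark.home points at /databricks/spark;
    # locally the property may be missing entirely
    return sc._conf.get("spark.home", "").startswith("/databricks")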

Arun
  • That's a good approach, but the reason I need to know whether it's running on Databricks in the first place is so that I can create the config and context accordingly. – steven35 Jul 16 '18 at 08:36

This is simple. Databricks notebooks are not files.

# __file__ is only defined when the code runs from a file (e.g. via spark-submit);
# it is not defined inside a Databricks notebook
try:
    __file__
    print("It is a file")
except NameError:
    print("It is a Databricks notebook")

rjurney