
I am running a PySpark job in Databricks Cloud. As part of this job I need to write some CSV files to the Databricks File System (DBFS), and I also need to use some of the native dbutils commands, such as:

# mount an Azure Blob storage container to a DBFS location
dbutils.fs.mount(source="...", mount_point="/mnt/...", extra_configs={"key": "value"})

I am also trying to unmount once the files have been written to the mount directory. But when I use dbutils directly in the PySpark job, it fails with:

NameError: name 'dbutils' is not defined

Should I import a package to use dbutils in PySpark code? Thanks in advance.
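
For reference, the unmount step I am attempting is just this (mount point elided as in the mount call above):

# unmount the DBFS location once the files are written
dbutils.fs.unmount("/mnt/...")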

Krishna Reddy

3 Answers


Try this:

def get_dbutils(spark):
    try:
        # On a cluster or via databricks-connect, pyspark.dbutils is importable
        from pyspark.dbutils import DBUtils
        dbutils = DBUtils(spark)
    except ImportError:
        # In a Databricks notebook, dbutils is already injected into the
        # IPython user namespace, so pull it from there instead
        import IPython
        dbutils = IPython.get_ipython().user_ns["dbutils"]
    return dbutils

dbutils = get_dbutils(spark)
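
A minimal usage sketch with the question's mount flow; the source, mount point, configs, and the df DataFrame are placeholders, not part of any fixed API:

# mount, write the CSV output, then unmount (all values are placeholders)
dbutils.fs.mount(source="...", mount_point="/mnt/...", extra_configs={"key": "value"})
df.write.csv("/mnt/...")  # df: whatever DataFrame holds the CSV data
dbutils.fs.unmount("/mnt/...")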
Elisabetta

To access the DBUtils module in a way that works both locally and on Azure Databricks clusters, in Python, use the following get_dbutils():

def get_dbutils(spark):
  try:
    from pyspark.dbutils import DBUtils
    dbutils = DBUtils(spark)
  except ImportError:
    import IPython
    dbutils = IPython.get_ipython().user_ns["dbutils"]
  return dbutils

See: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/databricks-connect

Aman Srivastava
  • While I appreciate the additional explanation and reference, it's worth acknowledging that the code suggestion is an _exact_ duplicate of @Elisabetta's from several months ago. – Jeremy Caney Jun 03 '20 at 05:15

Yes! You could use this:

pip install DBUtils
import DBUtils