
I am running a PySpark job in Databricks Cloud. As part of this job I need to write some CSV files to the Databricks File System (DBFS), and I also need to use some of the native dbutils commands, such as:

# mount an Azure Blob storage container to a DBFS location
dbutils.fs.mount(source="...", mount_point="/mnt/...", extra_configs={"key": "value"})

I am also trying to unmount once the files have been written to the mount directory. But when I use dbutils directly in the PySpark job, it fails with:

NameError: name 'dbutils' is not defined

Should I import a package to use dbutils in PySpark code? Thanks in advance.
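
For reference, the unmount step I am attempting is just this (mount point elided as in the mount call above):

# unmount the DBFS location once the files are written
dbutils.fs.unmount("/mnt/...")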

Krishna Reddy

3 Answers


Try this:

def get_dbutils(spark):
    try:
        # On a cluster or via databricks-connect, pyspark.dbutils is importable
        from pyspark.dbutils import DBUtils
        dbutils = DBUtils(spark)
    except ImportError:
        # In a Databricks notebook, dbutils is already injected into the
        # IPython user namespace, so pull it from there instead
        import IPython
        dbutils = IPython.get_ipython().user_ns["dbutils"]
    return dbutils

dbutils = get_dbutils(spark)
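
A minimal usage sketch with the question's mount flow; the source, mount point, configs, and the df DataFrame are placeholders, not part of any fixed API:

# mount, write the CSV output, then unmount (all values are placeholders)
dbutils.fs.mount(source="...", mount_point="/mnt/...", extra_configs={"key": "value"})
df.write.csv("/mnt/...")  # df: whatever DataFrame holds the CSV data
dbutils.fs.unmount("/mnt/...")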
Elisabetta

To access the DBUtils module in a way that works both locally and on Azure Databricks clusters, in Python, use the following get_dbutils():

def get_dbutils(spark):
  try:
    from pyspark.dbutils import DBUtils
    dbutils = DBUtils(spark)
  except ImportError:
    import IPython
    dbutils = IPython.get_ipython().user_ns["dbutils"]
  return dbutils

See: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/databricks-connect

Aman Srivastava
  • While I appreciate the additional explanation and reference, it's worth acknowledging that the code suggestion is an _exact_ duplicate of @Elisabetta's from several months ago. – Jeremy Caney Jun 03 '20 at 05:15

Yes! You could use this:

pip install DBUtils
import DBUtils