
I was trying to run the code below in PySpark.

dbutils.widgets.text('config', '', 'config')

It throws the following error:

 Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 NameError: name 'dbutils' is not defined

So, is there any way I can run it in PySpark by including the Databricks package, like an import?

Your help is appreciated

Babu
  • In a package/module I have `from pyspark.dbutils import DBUtils` and `def get_secerts(dbutils: DBUtils):`. Then you can use `dbutils.secrets.get()` as you would in a notebook. – Jari Turkia Jun 09 '21 at 07:19
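
A minimal sketch of the pattern described in the comment above (the function, scope, and key names here are hypothetical):

    from pyspark.dbutils import DBUtils

    def get_secret(dbutils: DBUtils, scope: str, key: str) -> str:
        # Library code receives dbutils from the caller (e.g. a notebook),
        # so it never needs to look the instance up itself.
        return dbutils.secrets.get(scope=scope, key=key)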

5 Answers


As of Databricks Runtime v3.0, the answer provided by pprasad009 above no longer works. Now use the following:

def get_db_utils(spark):
    dbutils = None
    if spark.conf.get("spark.databricks.service.client.enabled") == "true":
        # Running remotely via databricks-connect: construct DBUtils explicitly
        from pyspark.dbutils import DBUtils
        dbutils = DBUtils(spark)
    else:
        # Running on the cluster (e.g. in a notebook): reuse the dbutils
        # instance that Databricks injects into the IPython user namespace
        import IPython
        dbutils = IPython.get_ipython().user_ns["dbutils"]
    return dbutils

See: https://learn.microsoft.com/en-gb/azure/databricks/dev-tools/databricks-connect#access-dbutils
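
A minimal usage sketch, assuming you already have a SparkSession:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    dbutils = get_db_utils(spark)
    dbutils.fs.ls("/")  # works both via databricks-connect and on the cluster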

Chris

As explained in https://docs.azuredatabricks.net/user-guide/dev-tools/db-connect.html#access-dbutils:

Depending on whether you are executing your code directly on the Databricks server (e.g. using a Databricks notebook to invoke your project egg file) or from your IDE using databricks-connect, you should initialize dbutils as below (where spark is your SparkSession):

def get_dbutils(spark):
    try:
        # databricks-connect ships a DBUtils class that works remotely
        from pyspark.dbutils import DBUtils
        dbutils = DBUtils(spark)
    except ImportError:
        # On the cluster itself, grab the dbutils injected into IPython
        import IPython
        dbutils = IPython.get_ipython().user_ns["dbutils"]
    return dbutils

dbutils = get_dbutils(spark)
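
Once initialized, the same calls work in both environments, for example (the scope and key names here are hypothetical):

    dbutils.fs.ls("/tmp")
    dbutils.secrets.get(scope="my-scope", key="my-key")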
pprasad009

In Scala you can use:

import com.databricks.dbutils_v1.DBUtilsHolder.dbutils

See the following link for more details:

https://docs.databricks.com/user-guide/dev-tools/dbutils.html


If you look at the source of dbutils.py (from databricks-connect==11.3.11), it is no longer necessary to check

if spark.conf.get("spark.databricks.service.client.enabled") == "true":
   ...

This check is done inside the DBUtils class itself, so the code below does the same as the other answers:

from pyspark.sql import SparkSession
from pyspark.dbutils import DBUtils

spark = SparkSession.builder.getOrCreate()
dbutils = DBUtils(spark)

You can also see there that the remote flavour of dbutils (when spark.databricks.service.client.enabled == true) only supports the fs and secrets modules.
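
So, as a rough sketch, only these two module families are usable remotely:

    print(dbutils.fs.ls("dbfs:/"))       # supported via databricks-connect
    print(dbutils.secrets.listScopes())  # supported via databricks-connect
    # dbutils.widgets / dbutils.notebook are not available in this mode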

HeyMan

I am assuming that you want to run the code on a Databricks cluster. If so, there is no need to import any package, since Databricks includes all the necessary libraries for dbutils by default.

I tried using it in a Databricks (Python/Scala) notebook without importing any libraries, and it works fine.
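
For example, the snippet from the question runs as-is in a notebook, where dbutils is predefined:

    dbutils.widgets.text('config', '', 'config')
    print(dbutils.widgets.get('config'))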


Ritesh
  • Yes Ritesh, but I don't have a Databricks cluster, so I'm just looking for an alternative way to import the packages. – Babu Aug 17 '18 at 13:07
  • As far as I know, you have to run your code on a Databricks cluster if you wish to use dbutils. Please let me know if you find an alternative. – Ritesh Aug 19 '18 at 06:11