
I am running pyspark from an Azure Machine Learning notebook. I am trying to move a file using the dbutils module.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def get_dbutils(spark):
    try:
        from pyspark.dbutils import DBUtils
        dbutils = DBUtils(spark)
    except ImportError:
        import IPython
        dbutils = IPython.get_ipython().user_ns["dbutils"]
    return dbutils

dbutils = get_dbutils(spark)
dbutils.fs.cp("file:source", "dbfs:destination")

I got this error: ModuleNotFoundError: No module named 'pyspark.dbutils'. Is there a workaround for this?

Here is the error in another Azure Machine Learning notebook:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-183f003402ff> in get_dbutils(spark)
      4         try:
----> 5             from pyspark.dbutils import DBUtils
      6             dbutils = DBUtils(spark)

ModuleNotFoundError: No module named 'pyspark.dbutils'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-1-183f003402ff> in <module>
     10         return dbutils
     11 
---> 12 dbutils = get_dbutils(spark)

<ipython-input-1-183f003402ff> in get_dbutils(spark)
      7         except ImportError:
      8             import IPython
----> 9             dbutils = IPython.get_ipython().user_ns["dbutils"]
     10         return dbutils
     11 

KeyError: 'dbutils'
Anders Swanson
Jeanne Lane

  • Have you tried pip install six? – Shubham Jain May 01 '20 at 17:46
  • Requirement already satisfied: six in /anaconda/envs/azureml_py36/lib/python3.6/site-packages (1.12.0) – Jeanne Lane May 01 '20 at 17:50
  • Hi @JeanneLane, where are you executing this notebook? Are you running the notebook using databricks-connect? Could you please share the complete stack trace of the error message? – CHEEKATLAPRADEEP May 04 '20 at 11:47
  • I am running the notebook on a VM created under Compute in Azure Machine Learning. It is STANDARD_DS3_v2 size. – Jeanne Lane May 04 '20 at 15:45
  • Here's the full error: ModuleNotFoundError: No module named 'pyspark.dbutils', raised at the line "from pyspark.dbutils import DBUtils". – Jeanne Lane May 04 '20 at 15:46

1 Answer

Apparently something changed in the Databricks runtime, so code that used to work no longer does.

Two options:

  1. Catch the correct error:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
def get_dbutils(spark):
    try:
        from pyspark.dbutils import DBUtils
        dbutils = DBUtils(spark)
    except ModuleNotFoundError:        # <-- changed from ImportError
        import IPython
        dbutils = IPython.get_ipython().user_ns["dbutils"]
    return dbutils

dbutils = get_dbutils(spark)
dbutils.fs.cp("file:source", "dbfs:destination")
  2. Follow the new instructions:
def get_dbutils(spark):
    # Pass a default of "false" so the lookup doesn't raise when the key
    # is absent (e.g. when not running under databricks-connect).
    if spark.conf.get("spark.databricks.service.client.enabled", "false") == "true":
        from pyspark.dbutils import DBUtils
        return DBUtils(spark)
    else:
        import IPython
        return IPython.get_ipython().user_ns["dbutils"]

If you're trying to do this for unit testing and need to mock it, then check out this: https://stackoverflow.com/a/76018686/496289
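For the unit-testing case, one minimal sketch is to stub `dbutils` with `unittest.mock.MagicMock` instead of importing the real module. Here `move_file` is a hypothetical helper standing in for your code under test; the only assumption is that your code takes the `dbutils` handle as a parameter rather than relying on the Databricks-injected global:

```python
from unittest.mock import MagicMock

def move_file(dbutils, src, dst):
    # Code under test: depends on a dbutils-like object being passed in,
    # not on pyspark.dbutils being importable.
    dbutils.fs.cp(src, dst)

# In the test, a MagicMock stands in for the real DBUtils handle.
fake_dbutils = MagicMock()
move_file(fake_dbutils, "file:source", "dbfs:destination")
fake_dbutils.fs.cp.assert_called_once_with("file:source", "dbfs:destination")
```

This keeps the test runnable on any machine, since no Spark or Databricks environment is needed.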

Kashyap
  • Thank you for your answer. I was trying to use dbutils outside of Databricks, and it can't be used that way as of now. https://community.databricks.com/s/question/0D58Y000095wGDRSA2/install-dbutils-locally – Jeanne Lane Apr 17 '23 at 17:53