
I have a sample project mypackg structured as below:

- mypackg
    * appcode
        * __init__.py
        * file1.py
        * file2.py
    * dbutils
        * __init__.py
        * file3.py
    * start_point.py
    * __init__.py 

The code is packed into mypackg.zip.
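
For from mypackg import start_point to resolve from the archive, the mypackg directory has to sit at the root of the zip. A minimal sketch of building such an archive (the parent path is a placeholder, not from the original post):

    # Zip the package so that mypackg/ is at the top level of the archive.
    import shutil

    shutil.make_archive(
        base_name="mypackg",                    # writes mypackg.zip in the current directory
        format="zip",
        root_dir="/path/to/parent_of_mypackg",  # directory that contains mypackg/
        base_dir="mypackg",                     # include only the package itself
    )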

It works fine when I test it on my local system:

  • added the zip to PySpark via sparkContext.addPyFile('path_to_zip') and ran my job
  • ran it as an application via spark-submit --py-files 'path_to_zip' myjob.py (both sketched below)
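
A minimal sketch of that local setup, assuming a standard PySpark session (paths here are placeholders, not the original ones):

    # Interactive / notebook style: attach the zip to the running SparkContext.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("mypackg-local-test").getOrCreate()
    spark.sparkContext.addPyFile("/local/path/to/mypackg.zip")  # ships the archive to the executors

    from mypackg import start_point  # import only after the zip has been added

    # Application style, equivalent to the second bullet:
    #   spark-submit --py-files /local/path/to/mypackg.zip myjob.py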

But when I try to do the same on Databricks, I am unable to import the module:

    import sys
    import urllib.request

    urllib.request.urlretrieve("https://github.com/nikhilsarma/spark_utilities/blob/master/mydata.zip", "/databricks/driver/mydata.zip")

    sc = spark.sparkContext
    sc.addPyFile('/databricks/driver/mydata.zip')
    sys.path.insert(0, '/databricks/driver/mydata.zip')

    from mypackg import start_point

Error:

ModuleNotFoundError: No module named 'mypackg'

  • I have already gone through https://stackoverflow.com/questions/51450462/pyspark-addpyfile-to-add-zip-of-py-files-but-module-still-not-found, but it was of no help. – nikhilwat Dec 22 '19 at 04:18

1 Answer


It was a mistake in my URI. Instead of downloading from raw/master, I was downloading from blob/master, which gave me a file I could not use.
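
For reference, a minimal sketch of the corrected download on the Databricks driver; the raw/master URL and the is_zipfile check are illustrative, assuming the same repository layout as in the question:

    import urllib.request
    import zipfile

    # Fetch the raw archive instead of the HTML page served at blob/master.
    urllib.request.urlretrieve(
        "https://github.com/nikhilsarma/spark_utilities/raw/master/mydata.zip",
        "/databricks/driver/mydata.zip",
    )

    # Sanity check: make sure we actually got a zip archive, not an HTML page.
    print(zipfile.is_zipfile("/databricks/driver/mydata.zip"))

    sc = spark.sparkContext
    sc.addPyFile("/databricks/driver/mydata.zip")

    from mypackg import start_point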
