I have a sample project, mypackg, structured as below:
- mypackg
  * appcode
    * __init__.py
    * file1.py
    * file2.py
  * dbutils
    * __init__.py
    * file3.py
  * start_point.py
  * __init__.py
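start_point.py is the module I import as the entry point; its real contents don't matter for this question, but roughly it just pulls the sub-packages together, along these lines (simplified, hypothetical illustration):

```python
# mypackg/start_point.py -- simplified sketch, not the real file contents
from mypackg.appcode import file1, file2
from mypackg.dbutils import file3

def run():
    # calls into the appcode and dbutils modules
    ...
```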
The code is packed into mypackg.zip and works fine when tested on my local system:

- added to PySpark via `sparkContext.addPyFile('path_to_zip')`
- ran the job as an application via `spark-submit --py-files 'path_to_zip' myjob.py`
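For reference, the archive is built so that the mypackg folder itself sits at the zip root (the exact command I use may differ, but this is the idea; the parent directory path below is a placeholder):

```python
# build mypackg.zip with the package directory at the archive root,
# equivalent to: cd <parent_of_mypackg> && zip -r mypackg.zip mypackg
import shutil

# root_dir is the directory that *contains* mypackg/ (placeholder path)
shutil.make_archive('mypackg', 'zip',
                    root_dir='path_to_parent_dir', base_dir='mypackg')
```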
But when I try to do the same on Databricks, I am unable to import the module. Here is what I tried:
```python
import sys
import urllib.request

# download the packaged zip to the driver
urllib.request.urlretrieve("https://github.com/nikhilsarma/spark_utilities/blob/master/mydata.zip", "/databricks/driver/mydata.zip")

# register the zip with Spark and put it on the Python path
sc = spark.sparkContext.getOrCreate()
sc.addPyFile('/databricks/driver/mydata.zip')
sys.path.insert(0, r'/databricks/diver/mydata.zip')

sc = spark.sparkContext.getOrCreate()
sc.addPyFile(r'/databricks/driver/mydata.zip')

from mypackg import start_point
```
Error:

```
ModuleNotFoundError: No module named 'mypackg'
```