I am running an EMR notebook (platform: AWS, notebook: Jupyter, kernel: PySpark).
I need to install a .jar dependency (sparkdl) to process some images.
Using spark-submit, I can use:
spark-submit --packages databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11
Using a local notebook, I can use:
from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .config('spark.jars.packages', 'databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11')
    .getOrCreate()
)
But how do I do the same thing in an EMR notebook?
- Either I could use a bootstrap action to install it on every node, but I don't know how to proceed… (see my sketch after this list)
- Or I could configure the SparkSession to use the dependency, but the notebook does not seem to be able to reach the repository… I also don't know the syntax for loading the copy of the file that I put on my S3 bucket… (again, a sketch below)
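For option 1, this is the kind of bootstrap script I have in mind (a rough sketch; the destination directory /usr/lib/spark/jars is only my assumption about where Spark looks for jars on EMR nodes):

#!/usr/bin/env python3
# Hypothetical bootstrap action: copy the sparkdl jar from my S3 bucket
# into the local Spark jars directory on every node at cluster startup.
import subprocess

JAR_S3_PATH = "s3://p8-fruits/libs/spark-deep-learning-1.5.0-spark2.4-s_2.11.jar"
LOCAL_JARS_DIR = "/usr/lib/spark/jars/"  # assumed location of Spark's jars on EMR

subprocess.run(["aws", "s3", "cp", JAR_S3_PATH, LOCAL_JARS_DIR], check=True)

For option 2, I pictured something like the following %%configure cell, assuming the cluster can reach the spark-packages repository (the repository URL is my guess):

%%configure -f
{
    "conf": {
        "spark.jars.packages": "databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11",
        "spark.jars.repositories": "https://repos.spark-packages.org"
    }
}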
EDIT: I tried:
%%configure -f
{
    "conf": {
        "spark.jars": "s3://p8-fruits/libs/spark-deep-learning-1.5.0-spark2.4-s_2.11.jar"
    }
}
This did not throw any error, but I am still not able to use the package. When I try import sparkdl, I get ModuleNotFoundError: No module named 'sparkdl'.
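My suspicion (I may well be wrong) is that spark.jars only adds the jar to the JVM classpath, while import sparkdl needs the Python package bundled inside that jar to be on the driver's sys.path. Since a jar is just a zip archive, I was considering trying something like this from the notebook (assuming addPyFile accepts a jar and can fetch it from S3 on EMR):

# Hypothetical workaround: ship the jar as a py-file so the Python module
# packaged inside it (sparkdl) becomes importable via zipimport.
spark.sparkContext.addPyFile(
    "s3://p8-fruits/libs/spark-deep-learning-1.5.0-spark2.4-s_2.11.jar"
)

import sparkdl  # hoping the module is now on sys.path

Would something like this be the right direction, or is there a cleaner way on EMR?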
Thank you very much for your help!