To get the notebook working, you'll want the notebook setup itself to pick up the right packages. Since the initialization action you linked ensures Jupyter uses the cluster's configured Spark directories, and thus picks up all the necessary YARN/filesystem/lib configurations, the best way to do this is to add the property at cluster-creation time instead of at job-submission time:
gcloud dataproc clusters create <cluster-name> \
    --properties spark:spark.jars.packages=com.databricks:spark-csv_2.11:1.2.0
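If you want to double-check that the property actually made it into the cluster's Spark configuration, something along these lines should show it (the cluster name is a placeholder, and the --format projection is just one way to slice the describe output):

# Inspect the software properties recorded on the cluster (placeholder name):
gcloud dataproc clusters describe <cluster-name> \
    --format='yaml(config.softwareConfig.properties)'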
Per this StackOverflow answer, setting the spark-defaults.conf property spark.jars.packages is the more portable equivalent of specifying the --packages option, since --packages is just syntactic sugar in the spark-shell/spark-submit/pyspark wrappers that sets the spark.jars.packages configuration entry anyway.
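For reference, the two forms below are meant to be interchangeable; the conf-file path is where Dataproc typically keeps Spark defaults, and the package coordinates are just copied from above:

# Persisted in /etc/spark/conf/spark-defaults.conf (what the cluster property above ends up writing):
#   spark.jars.packages  com.databricks:spark-csv_2.11:1.2.0
# Or passed at launch time, where the wrapper flag just fills in that same config entry:
pyspark --packages com.databricks:spark-csv_2.11:1.2.0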