Can someone explain the differences between --packages and --jars in a spark-submit script?
nohup ./bin/spark-submit --jars ./xxx/extrajars/stanford-corenlp-3.8.0.jar,./xxx/extrajars/stanford-parser-3.8.0.jar \
--packages datastax:spark-cassandra-connector_2.11:2.0.7 \
--class xxx.mlserver.Application \
--conf spark.cassandra.connection.host=192.168.0.33 \
--conf spark.cores.max=4 \
--master spark://192.168.0.141:7077 ./xxx/xxxanalysis-mlserver-0.1.0.jar 1000 > ./logs/nohup.out &
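For reference, the same invocation can be rebuilt from two separate arrays, which makes the split between the flags explicit. Per the Spark "Submitting Applications" docs, --jars takes a comma-separated list of jar files to put on the driver and executor classpaths. --packages takes comma-separated Maven coordinates (groupId:artifactId:version), which spark-submit resolves together with their transitive dependencies from the local Maven repository, Maven Central, or any --repositories given. This is only a sketch. The join_by_comma helper and the variable names are hypothetical, the paths and the coordinate are copied verbatim from the command above, and the script prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch only: rebuilds the spark-submit command from the question and prints it.
set -euo pipefail

# Hypothetical helper: join all arguments with commas (spark-submit's list separator).
join_by_comma() {
  local IFS=','
  printf '%s' "$*"
}

# --jars: jar files shipped as-is and added to the driver/executor classpaths.
EXTRA_JARS=(
  ./xxx/extrajars/stanford-corenlp-3.8.0.jar
  ./xxx/extrajars/stanford-parser-3.8.0.jar
)

# --packages: Maven coordinates resolved at submit time, including transitive
# dependencies. Copied verbatim from the question, not checked for validity.
PACKAGES=(
  datastax:spark-cassandra-connector_2.11:2.0.7
)

JARS_ARG=$(join_by_comma "${EXTRA_JARS[@]}")
PACKAGES_ARG=$(join_by_comma "${PACKAGES[@]}")

SUBMIT_ARGS=(
  --jars "$JARS_ARG"
  --packages "$PACKAGES_ARG"
  --class xxx.mlserver.Application
  --conf spark.cassandra.connection.host=192.168.0.33
  --conf spark.cores.max=4
  --master spark://192.168.0.141:7077
  ./xxx/xxxanalysis-mlserver-0.1.0.jar 1000
)

# Print the shell-quoted command instead of executing it, so this runs without Spark.
printf '%q ' ./bin/spark-submit "${SUBMIT_ARGS[@]}"
echo
```

Keeping the two lists apart makes it harder to put a Maven coordinate under --jars or a file path under --packages by mistake.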
Also, do I need the --packages option if the dependency is already in my application's pom.xml? (I ask because I just broke my application by changing the version in --packages and forgetting to change it in the pom.xml.)
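One way to guard against the version mismatch described above is a check in the launch script that refuses to continue when the version in the --packages coordinate differs from the version the build expects. This is a sketch under assumptions. check_connector_version, POM_CONNECTOR_VERSION and PACKAGE_COORD are hypothetical names. How the pom.xml version is obtained (for example from a Maven property) depends on your build, so here it is just a variable you keep in sync by hand.

```shell
#!/usr/bin/env bash
# Sketch: fail fast if the --packages coordinate and the pom.xml version disagree.
set -euo pipefail

# Hypothetical: must match the connector version declared in pom.xml.
POM_CONNECTOR_VERSION=2.0.7
PACKAGE_COORD=datastax:spark-cassandra-connector_2.11:2.0.7

# Hypothetical helper: compare the last ':'-separated field of a coordinate
# (the version) with an expected version; return non-zero on mismatch.
check_connector_version() {
  local coord=$1 expected=$2
  local actual=${coord##*:}
  if [[ "$actual" != "$expected" ]]; then
    echo "version mismatch: --packages has $actual, pom.xml has $expected" >&2
    return 1
  fi
}

check_connector_version "$PACKAGE_COORD" "$POM_CONNECTOR_VERSION"
echo "versions match: $POM_CONNECTOR_VERSION"
```

Running the check just before spark-submit means a forgotten pom.xml bump stops the launch instead of failing inside the running application.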
I am currently using --jars because the jars are massive (over 100GB) and would otherwise slow down building the shaded jar. I admit I am not sure why I am using --packages, other than that I am following the DataStax documentation.