I have a PySpark project with a Python script that runs Spark Streaming. It has some external dependencies, which I currently pull in with the --packages flag.
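For reference, this is roughly how I submit the job today (the package coordinates and file name are just an example):

```
spark-submit \
  --master yarn \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0 \
  my_streaming_job.py
```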
In Scala, however, we can use Maven to download all required packages, build a single jar that contains the main Spark program together with its dependencies, and then simply submit that one jar to the cluster (YARN in my case) with spark-submit.
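That Scala workflow looks roughly like this (class and jar names are just placeholders):

```
spark-submit \
  --master yarn \
  --class com.example.StreamingMain \
  target/my-app-1.0-jar-with-dependencies.jar
```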
Is there anything similar to a jar for PySpark?
I couldn't find anything about this in the official Spark documentation. It only mentions using spark-submit <python-file> or adding --py-files, which doesn't feel as clean as a single jar file.
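The --py-files route the docs suggest would look roughly like this, as far as I understand it (file and directory names are placeholders):

```
# Install pure-Python dependencies into a folder and zip them up
pip install -r requirements.txt -t deps/
(cd deps && zip -r ../deps.zip .)

# Ship the zip alongside the main script
spark-submit \
  --master yarn \
  --py-files deps.zip \
  my_streaming_job.py
```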
Any suggestions would be helpful! Thanks!