I am confused about the behavior of spark-submit, and the online docs are not leading me to the answer. Say I have a Python driver program that depends on some additional modules I have written. To make this work, I have to zip those modules up and include the zipped file with my spark-submit command. Do those modules get distributed to each node in the cluster?
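For concreteness, here is a minimal sketch of the workflow being described. The module and file names (`mymodule/`, `utils.py`, `deps.zip`, `driver.py`) and the `--master` value are illustrative assumptions, not taken from the question:

```shell
# Bundle the helper modules the driver imports (names here are hypothetical).
zip -r deps.zip mymodule/ utils.py

# Submit the driver; --py-files takes a comma-separated list of
# .zip, .egg, or plain .py files to place on the PYTHONPATH of the
# driver and executors.
spark-submit --master yarn --py-files deps.zip driver.py
```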
- My understanding is yes, they are copied to each node. Unfortunately, I cannot provide any documentation to back this up. I will add that you don't _have_ to zip the files; you can also specify `.py` files (I believe comma separated, but it could be space) in the `--py-files` argument. – pault Jan 10 '18 at 20:40
- Possible duplicate of [shipping python modules in pyspark to other nodes?](https://stackoverflow.com/q/24686474/6910411) – zero323 Jan 10 '18 at 22:45