
I just started working with PySpark on a new application. I installed all of my application's dependencies with pip on the server that runs spark-submit. Do I also have to install my application's Python packages on the other Spark gateways?

Thanks.

Cyt0s
  • Possible duplicate of [importing pyspark in python shell](https://stackoverflow.com/questions/23256536/importing-pyspark-in-python-shell) – Radesh Oct 20 '18 at 13:48
  • I could not find answer to my question on this thread – Cyt0s Oct 20 '18 at 14:36

1 Answer


You have to install the packages on all worker nodes. You could use cssh to make your life a bit easier.
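Before (or after) installing, it can help to confirm what each executor actually sees. The following is only a minimal sketch: it runs a small job that reports the hostname, Python executable, and the version of one dependency on every worker. Here numpy is just a stand-in for whatever package your application needs.

    # check_workers.py -- minimal sketch to verify worker environments.
    # numpy is only an example dependency; replace it with your own package.
    from pyspark.sql import SparkSession

    def env_info(_):
        import socket, sys
        try:
            import numpy
            version = numpy.__version__
        except ImportError:
            version = "missing"
        return (socket.gethostname(), sys.executable, version)

    if __name__ == "__main__":
        spark = SparkSession.builder.appName("env-check").getOrCreate()
        # Spread enough tasks over the cluster to touch every worker at least once.
        results = (spark.sparkContext
                   .parallelize(range(100), 20)
                   .map(env_info)
                   .distinct()
                   .collect())
        for host, exe, version in results:
            print(host, exe, "numpy:", version)
        spark.stop()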

An alternative to installing every pip package in advance is to use a requirements.txt (and preferably a virtual environment). To use a requirements.txt, launch spark-submit with the following parameters:

--conf spark.pyspark.virtualenv.enabled=true  
--conf spark.pyspark.virtualenv.type=native 
--conf spark.pyspark.virtualenv.requirements=/Users/jzhang/github/spark/requirements.txt 
--conf spark.pyspark.virtualenv.bin.path=/Users/jzhang/anaconda/bin/virtualenv 
--conf spark.pyspark.python=/usr/local/bin/python3 spark_virtualenv.py
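The spark_virtualenv.py in that command is just your ordinary PySpark driver script; the snippet below is only a sketch of what it could look like, assuming requirements.txt lists numpy (any pip package works the same way).

    # spark_virtualenv.py -- minimal sketch of the driver script used above.
    # Assumes numpy is listed in requirements.txt; it is imported inside the
    # mapped function, so the import happens in the virtualenv on each executor.
    from pyspark.sql import SparkSession

    def double(x):
        import numpy as np
        return float(np.float64(x) * 2)

    if __name__ == "__main__":
        spark = SparkSession.builder.appName("virtualenv-demo").getOrCreate()
        doubled = spark.sparkContext.parallelize(range(10)).map(double).collect()
        print(doubled)
        spark.stop()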

Please find further information at [2].

cronoik