
I just started working with PySpark on a new application. I installed all of my application's dependencies with pip on the server that runs spark-submit. Do I also have to install my application's Python packages on the other Spark gateways?

Thanks.

Cyt0s
  • Possible duplicate of [importing pyspark in python shell](https://stackoverflow.com/questions/23256536/importing-pyspark-in-python-shell) – Radesh Oct 20 '18 at 13:48
  • I could not find answer to my question on this thread – Cyt0s Oct 20 '18 at 14:36

1 Answer


You have to install the packages on all worker nodes. You could use cssh to make your life a bit easier.
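Before (or after) installing, it can help to confirm what each executor actually sees. The following is only a minimal sketch: it runs a small job that reports the hostname, Python executable, and the version of one dependency on every worker. Here numpy is just a stand-in for whatever package your application needs.

    # check_workers.py -- minimal sketch to verify worker environments.
    # numpy is only an example dependency; replace it with your own package.
    from pyspark.sql import SparkSession

    def env_info(_):
        import socket, sys
        try:
            import numpy
            version = numpy.__version__
        except ImportError:
            version = "missing"
        return (socket.gethostname(), sys.executable, version)

    if __name__ == "__main__":
        spark = SparkSession.builder.appName("env-check").getOrCreate()
        # Spread enough tasks over the cluster to touch every worker at least once.
        results = (spark.sparkContext
                   .parallelize(range(100), 20)
                   .map(env_info)
                   .distinct()
                   .collect())
        for host, exe, version in results:
            print(host, exe, "numpy:", version)
        spark.stop()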

An alternative to installing every pip package in advance is to use a requirements.txt (and preferably a virtual environment). To use a requirements.txt, launch spark-submit with the following parameters:

--conf spark.pyspark.virtualenv.enabled=true  
--conf spark.pyspark.virtualenv.type=native 
--conf spark.pyspark.virtualenv.requirements=/Users/jzhang/github/spark/requirements.txt 
--conf spark.pyspark.virtualenv.bin.path=/Users/jzhang/anaconda/bin/virtualenv 
--conf spark.pyspark.python=/usr/local/bin/python3 spark_virtualenv.py
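The spark_virtualenv.py in that command is just your ordinary PySpark driver script; the snippet below is only a sketch of what it could look like, assuming requirements.txt lists numpy (any pip package works the same way).

    # spark_virtualenv.py -- minimal sketch of the driver script used above.
    # Assumes numpy is listed in requirements.txt; it is imported inside the
    # mapped function, so the import happens in the virtualenv on each executor.
    from pyspark.sql import SparkSession

    def double(x):
        import numpy as np
        return float(np.float64(x) * 2)

    if __name__ == "__main__":
        spark = SparkSession.builder.appName("virtualenv-demo").getOrCreate()
        doubled = spark.sparkContext.parallelize(range(10)).map(double).collect()
        print(doubled)
        spark.stop()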

Please find further information at [2].

cronoik