I am submitting a Python script with spark-submit on my standalone EC2 Spark cluster. I use Python 2.7.9 and have verified that no other Python is running on the cluster. I get the following error:
ImportError: No module named numpy
I verified that numpy imports correctly on each of the workers:
root@10:/usr/local/lib/python2.7/site-packages# python
Python 2.7.9 (default, Jun 29 2016, 13:08:31)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>>
I also manually copied the numpy package to /usr/local/lib/python2.7/site-packages, but the problem persists.
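An interactive import on a worker only proves that the shell's Python can see numpy; the Python processes that Spark launches may use a different interpreter or environment. A minimal sketch of a check that runs inside the Spark tasks themselves is shown below. The names `probe` and `report_executors` are hypothetical helpers, not part of PySpark, and `sc` is assumed to be an existing SparkContext.

```python
import importlib
import socket
import sys


def probe(_):
    """Return (hostname, interpreter path, numpy importable?) for this process."""
    try:
        importlib.import_module("numpy")
        has_numpy = True
    except ImportError:
        has_numpy = False
    return (socket.gethostname(), sys.executable, has_numpy)


def report_executors(sc, num_tasks=20):
    """Run probe() in num_tasks Spark tasks and return the distinct results.

    `sc` is assumed to be a live SparkContext. Using several partitions makes
    it likely, though not guaranteed, that every worker runs at least one task.
    """
    rdd = sc.parallelize(range(num_tasks), num_tasks)
    return sorted(rdd.map(probe).distinct().collect())
```

Calling `report_executors(sc)` from the submitted script and printing the result shows which host and interpreter each task ran on. A row whose last field is `False` points to the environment where numpy is missing.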
Update: the issue is solved, though not by the answer here. Jupyter and numpy were installed in one Docker image on the master, while the program ran in another image that had its own Python but no numpy. The solution was to install numpy as part of the program's Docker image and to set the PYSPARK_PYTHON and PYTHONPATH environment variables in it.
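For illustration, here is a rough sketch of setting the same two variables for a single spark-submit launch from Python. This is not the mechanism used above, where the variables were set in the Docker image itself, and it may not reach executors that run in a different container. The helper names `build_submit_env` and `submit` are hypothetical, and any interpreter path passed in is an assumption that has to match the actual image.

```python
import os
import subprocess


def build_submit_env(python_exe, site_packages, base_env=None):
    """Return a copy of the environment with PYSPARK_PYTHON and PYTHONPATH set.

    python_exe and site_packages are assumed paths; adjust them to the image.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PYSPARK_PYTHON"] = python_exe
    existing = env.get("PYTHONPATH")
    # Put the numpy location first, keeping any PYTHONPATH that was already set.
    if existing:
        env["PYTHONPATH"] = site_packages + os.pathsep + existing
    else:
        env["PYTHONPATH"] = site_packages
    return env


def submit(script, master_url, env):
    """Launch spark-submit with the prepared environment; returns its exit code."""
    return subprocess.call(["spark-submit", "--master", master_url, script], env=env)
```

A call might look like `submit("job.py", master_url, build_submit_env(python_exe, "/usr/local/lib/python2.7/site-packages"))`, where `job.py`, `master_url`, and `python_exe` are placeholders for the real script, master URL, and interpreter path.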