When I submit a Python script with spark-submit to my standalone Spark cluster on EC2, it fails. I use Python 2.7.9 and have verified that no other Python is running in the cluster. I get the following error:

ImportError: No module named numpy

I verified that numpy imports correctly on each of the workers:

root@10:/usr/local/lib/python2.7/site-packages# python
Python 2.7.9 (default, Jun 29 2016, 13:08:31)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>>

I also manually copied the numpy package to /usr/local/lib/python2.7/site-packages, but the problem persists.

Update: the issue is solved, though not by way of the answer suggested here. The cause was that jupyter and numpy were installed in one Docker image on the master, while the program ran in another image that had its own Python but no numpy installation. The solution was to install numpy as part of the program's Docker image and to set the PYSPARK_PYTHON and PYTHONPATH environment variables in it.
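For anyone hitting the same symptom, a small probe can show which interpreter actually runs the driver and the tasks, which is how a mismatch like the one above can be spotted. This is only a sketch under assumptions: the function names `probe_interpreter` and `probe_executors` and the `num_tasks` parameter are hypothetical, `sc` is assumed to be an already created SparkContext, and a handful of tasks is not guaranteed to land on every executor.

```python
from __future__ import print_function

import os
import sys


def probe_interpreter(module_name="numpy"):
    """Describe the Python process this runs in and whether module_name imports."""
    try:
        __import__(module_name)
        importable = True
    except ImportError:
        importable = False
    return {
        "executable": sys.executable,
        "version": "%d.%d.%d" % tuple(sys.version_info[:3]),
        "pyspark_python": os.environ.get("PYSPARK_PYTHON"),
        "pythonpath": os.environ.get("PYTHONPATH"),
        "module": module_name,
        "importable": importable,
    }


def probe_executors(sc, module_name="numpy", num_tasks=16):
    """Run the probe inside Spark tasks and return the distinct reports.

    sc is assumed to be an existing SparkContext; num_tasks is a
    hypothetical knob that only raises the chance of reaching every executor.
    """
    reports = (
        sc.parallelize(range(num_tasks), num_tasks)
        .map(lambda _: probe_interpreter(module_name))
        .collect()
    )
    # Deduplicate on the driver; dict items are sorted so equal reports match.
    unique = {tuple(sorted(report.items())) for report in reports}
    return [dict(items) for items in unique]
```

In the submitted script, printing `probe_interpreter()` shows the driver side and printing `probe_executors(sc)` shows the task side. If the two report different `executable` paths, or `importable` is `False` on only one side, the driver and the executors are not using the same Python installation.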

sparkly
  • Can you import `numpy` on the driver as well? – Alex May 14 '18 at 12:42
  • Yes. Running `python` at `root@ip-10-0-5-253:/usr/spark-2.3.0#` starts Python 2.7.9 (default, Jun 29 2016, 13:08:31) [GCC 4.9.2] on linux2, and `>>> import numpy` returns to the prompt without an error. – sparkly May 14 '18 at 12:44
  • Update: it works from a Jupyter notebook that is installed on the master and runs against the cluster. Checked with: `import numpy; from pyspark.mllib.fpm import PrefixSpan; data = [[["a", "b"], ["c"]], [["a"], ["c", "b"], ["a", "b"]], [["a", "b"], ["e"]], [["f"]]]; rdd = sc.parallelize(data); model = PrefixSpan.train(rdd, minSupport=0.1); result = model.freqSequences().filter(lambda x: x.freq >= 2).filter(lambda x: len(x.sequence) >= 2).cache(); result.collect()`. spark-submit still fails :-( – sparkly May 14 '18 at 12:59
  • Possible duplicate of [pyspark import user defined module or .py files](https://stackoverflow.com/questions/43532083/pyspark-import-user-defined-module-or-py-files) – Steven May 14 '18 at 15:16
  • I am running in cluster mode, so I don't see how it is related. – sparkly May 14 '18 at 17:03

0 Answers