
I'm a complete beginner with Spark. I'm trying to run Spark on Amazon EC2, but my system does not recognize "spark-ec2" or "./spark-ec2"; it says "spark-ec2" is not recognized as an internal or external command.

I followed the instructions here to launch a cluster. I would like to use Scala; how do I make this work?

  • It doesn't look like the script is designed to run on Windows, although you can probably run the Python script directly. The problem you are having is likely due to not being in the correct directory on the command line. – datasage Apr 22 '15 at 05:26
  • Do you mean in the ec2 folder? I did actually. – Daolin Apr 22 '15 at 14:53

2 Answers


Add the bundled boto library to the `PYTHONPATH` environment variable, e.g. `PYTHONPATH="${SPARK_EC2_DIR}/third_party/boto-2.4.1.zip/boto-2.4.1:$PYTHONPATH"`, and then execute the Python script directly.
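
For anyone hitting this on Windows, where the `VAR=value command` shell prefix isn't available, here is a minimal sketch of the same workaround done from inside Python instead of the shell. It assumes Spark's `ec2/` folder really contains `third_party/boto-2.4.1.zip` as described above, and it is meant to be pasted near the top of the spark-ec2 Python script (before boto is imported), as the comment below also suggests:

```python
# Sketch of the PYTHONPATH workaround, done inside the script instead of
# the shell. Assumes this file sits in Spark's ec2/ directory and that the
# bundled boto zip is at third_party/boto-2.4.1.zip (as in the answer above).
import os
import sys

ec2_dir = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, os.path.join(ec2_dir, "third_party",
                                "boto-2.4.1.zip", "boto-2.4.1"))
```

After that, running the Python script directly from the ec2 directory (e.g. `python spark_ec2.py --help`, if that is what the script is called in your Spark version) should get past the missing-boto import error.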

  • Thanks for your reply. Would this work if I need to use Scala? – Daolin Apr 22 '15 at 14:52
  • I am not sure if you can start a spark-ec2 cluster with Scala. I always use the spark-ec2 script to launch a cluster. – None Apr 22 '15 at 14:55
  • PYTHONPATH is not recognized as an internal or external command – Daolin Apr 22 '15 at 19:15
  • add these lines to the top of the spark-ec2.py Python script: `import sys; sys.path.append("/your/spark/directory/third_party/boto-2.4.1.zip/boto-2.4.1")` – None Apr 22 '15 at 20:16

In order to run the Spark-EC2 script on Windows you need Cygwin and Python. If you don't want to install these programs, you can use the dockerized version of the script (https://github.com/edrevo/spark-ec2-docker), which only depends on Docker.
