
Is there a way to specify an AWS AMI with a particular OS (say, Ubuntu) when launching Spark on Amazon EC2 with the provided scripts?

What is the default AMI and operating system launched by the EC2 script? Is it eligible for the AWS "Free Tier" program?

Oleg Shirokikh
  • Can you be more specific about the scripts you're referring to? Are they provided by Amazon? Developed by you? Can we have a look at these scripts? – Sébastien Stormacq Feb 15 '15 at 09:11
  • @SébastienStormacq I mean the standard Apache Spark EC2 scripts shipped with the Spark distribution – Oleg Shirokikh Feb 15 '15 at 09:43
  • I set up an EC2 cluster with their default AMI; `uname` reported `x86_64 GNU/Linux` as the default OS. Not sure about the free tier thing though – Harsha Feb 15 '15 at 23:45

1 Answer


The script (`spark_ec2.py`) takes the AMI list from https://github.com/mesos/spark-ec2/tree/branch-1.3/ami-list by default. You can override it by creating a fork of that repository with your preferred AMIs and pointing the script at your fork with `--spark-ec2-git-repo` and `--spark-ec2-git-branch`.
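As a sketch, the launch invocation would then look something like the following. The cluster name, key pair, region, fork URL, and branch name are all placeholders you would substitute with your own values:

```shell
# Hypothetical invocation of the spark-ec2 launcher shipped with Spark,
# pointing it at a fork of the spark-ec2 repo that carries a custom AMI list.
./spark-ec2 \
  --key-pair=my-keypair \
  --identity-file=my-keypair.pem \
  --region=us-east-1 \
  --spark-ec2-git-repo=https://github.com/myuser/spark-ec2 \
  --spark-ec2-git-branch=my-ami-branch \
  launch my-cluster
```

The script clones the given repo/branch on the cluster nodes, so the fork must keep the same layout as the upstream `spark-ec2` repository, with only the `ami-list` entries changed.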

Daniel Darabos
  • But are there any Ubuntu AMIs for Spark available? The existing Amazon Linux has older stuff. Just one example: no availability of PHP 5.4+ – WestCoastProjects Feb 20 '15 at 19:55
  • You could try it with an AMI of your choice. I'm 99% sure the Spark AMIs do not include Spark. I think they just serve as a known basic Linux. (I haven't tried overriding the default AMI.) If that does not work, you can run Spark in standalone or YARN mode on whatever EC2 node you have. I've tried both these options and they work fine. Another option is to use the Spark AMI but upgrade some packages that you need. – Daniel Darabos Feb 20 '15 at 20:20
  • One cannot simply upgrade the packages: Amazon Linux does not have packaging support except for limited sets. I looked into the Spark EC2 scripts: they do NOT include code for building Spark on top of the AMI. That would have been my first choice. Instead it seems the AMIs DO include Spark inside them. – WestCoastProjects Feb 20 '15 at 20:27
  • I'm still almost entirely sure the AMIs do not include Spark. The AMI list has been unchanged for more than a year, yet if you run the brand new Spark 1.2.1 `spark_ec2.py` you get a cluster with Spark 1.2.1. – Daniel Darabos Feb 20 '15 at 20:30
  • The AMI may have a script embedded to get the most recent version, not sure. But in any case I did not see in `spark_ec2.py` how (if at all possible) to customize the AMI after it is launched. Customizations would include "install Spark". In EMR this is called bootstrap actions. Where are they? I'd be happy to use them. – WestCoastProjects Feb 20 '15 at 20:37
  • I looked again at the `spark_ec2.py` script. It appears I missed that there are sections for setting up the Spark master and slaves; they come after the EC2 setup logic. I do have a follow-on question about which vanilla Ubuntu or CentOS AMIs to use, but I understand if that were considered a separate question. In any case I have upvoted your answer. – WestCoastProjects Feb 20 '15 at 20:42
  • If you don't want anything special I'd first try something from the default AMI list offered by Amazon (i.e. https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#LaunchInstanceWizard:). But I'm pretty clueless about this :). Good luck! – Daniel Darabos Feb 20 '15 at 20:46
  • Don't know if this can help, but here is a pull request stating that access to Ubuntu AMIs through `spark-ec2` is still under construction: https://github.com/amplab/spark-ec2/pull/49 – Pierre Cordier Mar 16 '17 at 18:23