
Is it possible to change the Hadoop version when a cluster is created by spark-ec2?

I tried

spark-ec2 -k spark -i ~/.ssh/spark.pem -s 1 launch my-spark-cluster

then I logged in with

spark-ec2 -k spark -i ~/.ssh/spark.pem login my-spark-cluster

and found that the Hadoop version is 1.0.4.

I want to use a 2.x version of Hadoop. What's the best way to configure this?

zero323
user3684014
  • I don't think this is currently supported, though there is an [open PR to add support for launching Hadoop 2 clusters](https://github.com/mesos/spark-ec2/pull/77). – Nick Chammas Feb 11 '15 at 01:10

1 Answer


Hadoop 2.0

The spark-ec2 script doesn't support modifying an existing cluster, but you can create a new Spark cluster with Hadoop 2.

See this excerpt from the script's --help:

  --hadoop-major-version=HADOOP_MAJOR_VERSION
                    Major version of Hadoop (default: 1)

So for example:

spark-ec2 -k spark -i ~/.ssh/spark.pem -s 1 --hadoop-major-version=2 launch my-spark-cluster

...will create a cluster for you using the current version of Spark and Hadoop 2.
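For scripted launches, the same command can be assembled from variables. This is only a minimal sketch: the function name `build_launch_cmd` is hypothetical, and it prints the command instead of running it, so you can review it first. The flags are the ones already shown above.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of spark-ec2): prints a launch command
# using only the flags shown in the answer above.
# Arguments: key pair name, identity file, slave count,
#            Hadoop major version, cluster name.
build_launch_cmd() {
  local key_name="$1" identity_file="$2" slaves="$3" hadoop_major="$4" cluster="$5"
  local cmd=(spark-ec2 -k "$key_name" -i "$identity_file" -s "$slaves"
             "--hadoop-major-version=$hadoop_major" launch "$cluster")
  # Join the arguments with single spaces and print them.
  echo "${cmd[*]}"
}

# Example values taken from the question; adjust to your own key and cluster.
build_launch_cmd spark "$HOME/.ssh/spark.pem" 1 2 my-spark-cluster
```

Copy the printed line into your shell once it looks right.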


If you use Spark 1.3.1 or Spark 1.4.0 and create a standalone cluster, you will get Hadoop 2.0.0 MR1 (from the Cloudera Hadoop Platform (CDH) 4.2.0 distribution) this way.
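To confirm which Hadoop version a cluster actually got, one option is to look at the first line printed by `hadoop version`, which has the form `Hadoop 1.0.4`. A minimal sketch follows; the function name `hadoop_major_version` is hypothetical, and the `~/ephemeral-hdfs` path in the comment is an assumption about the layout spark-ec2 sets up on the master node.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads `hadoop version` output on stdin and prints
# the major version taken from the first line ("Hadoop X.Y.Z...").
hadoop_major_version() {
  sed -n '1s/^Hadoop \([0-9][0-9]*\)\..*/\1/p'
}

# On the master node you might run (path is an assumption):
#   ~/ephemeral-hdfs/bin/hadoop version | hadoop_major_version

# Self-contained demonstration with the version reported in the question:
printf 'Hadoop 1.0.4\n' | hadoop_major_version
```

If the first line does not match the expected form, the function prints nothing, which makes an unexpected output easy to spot.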


The caveats are:

...but I have successfully used a few clusters running Spark 1.2.0 and 1.3.1 created with Hadoop 2.0.0, using some Hadoop 2-specific features. (For Spark 1.2.0 this needed a few tweaks, which I have put in my forks of Spark and spark-ec2, but that's another story.)


Hadoop 2.4, 2.6

If you need Hadoop 2.4 or Hadoop 2.6, then I would currently (as of June 2015) recommend creating a standalone cluster manually. It's easier than you probably think.

Greg Dubicki
  • Hello Greg, are you still not recommending using spark-ec2 scripts to launch hadoop 2.6 ? – dirceusemighini Mar 10 '16 at 14:28
  • Hi @dirceusemighini! I haven't worked on this since June 2015, when Spark was at 1.4.0. As of March 2016 the stable version is 1.6.1, so unfortunately I don't have an up-to-date opinion. – Greg Dubicki Mar 11 '16 at 10:08