How to install a custom Spark branch on EMR?

Asked Sep 12 '16 at 13:05

Active Sep 12 '16 at 13:19


I want my AWS EMR cluster to run a particular branch of Spark from the Git repository instead of the default Spark.

The reason is that I want to use a branch that contains a fix for null values in CSV files.

edited Sep 12 '16 at 13:19 by OneCricketeer

asked Sep 12 '16 at 13:05 by Vipin Yadav

Are you asking about Spark or the spark-csv project? – OneCricketeer Sep 12 '16 at 13:20
The bug is in the latest stable version of Spark (2.0.0). – Vipin Yadav Sep 12 '16 at 13:26
And which bug is that? Can you link to the issue in the Spark issue tracker? – OneCricketeer Sep 12 '16 at 13:26

It's already in the Spark issue tracker, and a developer has already provided the solution: https://issues.apache.org/jira/browse/SPARK-16460 – Vipin Yadav Sep 12 '16 at 13:32
How did you install Spark currently? What is preventing you from checking out the pull request and compiling Spark yourself? – OneCricketeer Sep 12 '16 at 13:36
@cricket_007 On EMR, you can install Spark through configuration, but each EMR release ships exactly one version of Spark (e.g. emr-5.0.0 → Spark 2.0.0). The issue is that EMR has ephemeral file systems, which means that each time you terminate the cluster, you'll lose all of the disks' contents. – eliasah Sep 12 '16 at 14:55
I'm not sure if this still works, but I have presented a way to install a custom Spark on EMR here: http://stackoverflow.com/a/32278192/3415409 – eliasah Sep 12 '16 at 14:58
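Following up on OneCricketeer's suggestion to check out the pull request and compile Spark yourself, here is a minimal sketch that only assembles the shell commands (nothing is executed). It assumes the apache/spark GitHub mirror, GitHub's `pull/<N>/head` refs, and the `dev/make-distribution.sh` script that ships with Spark 2.x. The pull request number is a placeholder because the thread does not name the PR attached to SPARK-16460. The helper name `build_commands`, the distribution name `custom-spark`, and the `-Pyarn -Phadoop-2.7` profiles (meant to match the Hadoop on emr-5.0.0) are all assumptions.

```python
import shlex
from typing import List

# Apache Spark's official mirror on GitHub.
SPARK_REPO = "https://github.com/apache/spark.git"


def build_commands(pr_number: int, dist_name: str = "custom-spark") -> List[str]:
    """Return shell commands that fetch a Spark pull request from GitHub and
    package it as a binary tarball. pr_number is hypothetical: look up the
    real PR on the SPARK-16460 JIRA issue."""
    branch = f"pr-{pr_number}"
    return [
        f"git clone {SPARK_REPO} spark",
        # GitHub exposes every pull request under refs/pull/<N>/head;
        # this fetches it into a local branch.
        f"git -C spark fetch origin pull/{pr_number}/head:{branch}",
        f"git -C spark checkout {branch}",
        # --tgz packages the build; the YARN/Hadoop profiles are assumed to
        # match the cluster's Hadoop version.
        "cd spark && ./dev/make-distribution.sh "
        f"--name {shlex.quote(dist_name)} --tgz -Pyarn -Phadoop-2.7",
    ]


if __name__ == "__main__":
    for cmd in build_commands(12345):  # 12345 is a hypothetical PR number
        print(cmd)
```

The resulting tarball (named after the Spark version and `--name`) can then be copied to S3 so that it outlives any single cluster.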

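Because EMR's local disks are wiped when a cluster terminates (as eliasah points out), one common pattern is to keep a custom Spark build in S3 and reinstall it on every node through a bootstrap action each time a cluster starts. This is a minimal sketch of that pattern, not the method from the linked answer. It renders a bash bootstrap script and builds the keyword arguments for boto3's EMR `run_job_flow` call. The install directory `/opt/spark-custom`, the bucket paths, the cluster name, the instance types, and the helper names are hypothetical. Installing into a separate directory, rather than over the Spark that EMR installs, is a design choice made here.

```python
import shlex
from typing import Any, Dict, Sequence


def bootstrap_script(tarball_s3_uri: str, install_dir: str = "/opt/spark-custom") -> str:
    """Render a bash bootstrap-action script that installs a Spark tarball
    from S3 into install_dir (a hypothetical location)."""
    local_tgz = "/tmp/" + tarball_s3_uri.rsplit("/", 1)[-1]
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",
        # Local disks do not survive termination, so fetch the build from S3
        # (which persists) on every start-up.
        f"aws s3 cp {shlex.quote(tarball_s3_uri)} {shlex.quote(local_tgz)}",
        f"sudo mkdir -p {shlex.quote(install_dir)}",
        # make-distribution.sh tarballs contain a single top-level directory.
        f"sudo tar -xzf {shlex.quote(local_tgz)} -C {shlex.quote(install_dir)} --strip-components=1",
        # Point login shells at the custom build.
        "echo " + shlex.quote(f"export SPARK_HOME={install_dir}")
        + " | sudo tee /etc/profile.d/spark-custom.sh",
    ]
    return "\n".join(lines) + "\n"


def run_job_flow_request(bootstrap_s3_uri: str, script_args: Sequence[str] = ()) -> Dict[str, Any]:
    """Build keyword arguments for boto3's EMR run_job_flow call.
    Name, instance types and count are placeholders."""
    action: Dict[str, Any] = {"Path": bootstrap_s3_uri}
    if script_args:
        action["Args"] = list(script_args)
    return {
        "Name": "custom-spark-cluster",
        "ReleaseLabel": "emr-5.0.0",
        # Only Hadoop (HDFS/YARN) from EMR; Spark comes from the bootstrap action.
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "MasterInstanceType": "m4.large",
            "SlaveInstanceType": "m4.large",
            "InstanceCount": 3,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Bootstrap actions run on every node when the cluster is created.
        "BootstrapActions": [{"Name": "install-custom-spark", "ScriptBootstrapAction": action}],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request: Dict[str, Any]) -> str:
    """Submit the request and return the cluster id (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without boto3

    return boto3.client("emr").run_job_flow(**request)["JobFlowId"]
```

In use, you would upload the output of `bootstrap_script(...)` to S3 yourself and pass that S3 path to `run_job_flow_request`. Only `launch_cluster` talks to AWS.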