
I want my AWS EMR cluster to run a particular branch of Spark, built from the git repo, instead of the default Spark version.

The reason is that I want to use a branch that has a fix for null-value handling in CSV.

  • Are you asking about Spark or the spark-csv project? – OneCricketeer Sep 12 '16 at 13:20
  • The bug is in the latest stable version of Spark (2.0.0) – Vipin Yadav Sep 12 '16 at 13:26
  • And which bug is that? Can you link to the issue in the Spark issue tracker? – OneCricketeer Sep 12 '16 at 13:26
  • It's already in the Spark issue tracker, and a dev has already provided a solution: https://issues.apache.org/jira/browse/SPARK-16460 – Vipin Yadav Sep 12 '16 at 13:32
  • How did you get Spark installed currently? What is preventing you from checking out the pull request and compiling Spark yourself? – OneCricketeer Sep 12 '16 at 13:36
  • @cricket_007 On EMR, you can install Spark through configuration, but each EMR image ships a single version of Spark (e.g. emr-5.0.0 -> Spark 2.0.0). The issue is that EMR has ephemeral file systems, which means that each time you terminate the cluster, you lose all of the disks' content. – eliasah Sep 12 '16 at 14:55
  • I'm not sure if this still works, but I have presented a way to install a custom Spark on EMR here: http://stackoverflow.com/a/32278192/3415409 – eliasah Sep 12 '16 at 14:58
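
For reference, the "check out the branch and compile Spark yourself" route suggested in the comments could be sketched roughly as below. This is only a sketch under assumptions: the helper `spark_build_commands` and the branch name `my-fix-branch` are hypothetical (the thread does not name the branch), the default repo URL points at the main Apache Spark repository rather than whichever fork holds the fix, and the `yarn` / `hadoop-2.7` build profiles are a guess that should be matched to the Hadoop version on the EMR image. The script only prints the commands instead of running them.

```python
import shlex


def spark_build_commands(branch,
                         repo="https://github.com/apache/spark.git",
                         dist_name="custom",
                         profiles=("yarn", "hadoop-2.7")):
    """Return the commands (as argument lists) for building a Spark
    distribution tarball from a given git branch.

    All defaults are illustrative assumptions, not values from the thread.
    """
    # Shallow-clone only the requested branch into a local "spark" directory.
    clone = ["git", "clone", "--branch", branch, "--depth", "1", repo, "spark"]

    # Spark 2.x ships dev/make-distribution.sh; --tgz produces a tarball and
    # each -P flag enables a Maven build profile.
    make_dist = ["./dev/make-distribution.sh", "--name", dist_name, "--tgz"]
    make_dist += ["-P" + profile for profile in profiles]

    # The second command is meant to be run from inside the cloned directory.
    return [clone, make_dist]


if __name__ == "__main__":
    # "my-fix-branch" is a placeholder for the branch that carries the fix.
    for cmd in spark_build_commands("my-fix-branch"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Because EMR disks are ephemeral, the resulting tarball would still have to be stored somewhere durable and installed again each time a cluster starts; that part is not covered by this sketch.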
