
I am using the sbt-assembly plugin to create a fat jar. I need some jars that are part of the default Hadoop/Spark distribution, but in newer versions.

I want the Spark worker JVM to prefer the versions packaged in my fat jar over the ones that ship with the default Hadoop/Spark distribution. How can I do this?

Shubham Jain
  • Are you sure you've got the older classes in your uber-jar? What part of Spark do you want to replace with the older versions? – Jacek Laskowski Sep 18 '17 at 08:39
  • Sorry, I made a mistake in phrasing the question. I need newer jars, but Spark ships with older versions. When we submit a Spark job, the JVM picks up jars from Spark and Hadoop first and then from the fat jar. Since older versions of those jars are already on the classpath from Spark, the newer versions I add in my fat jar are discarded. I want Spark to use these newer versions and discard any conflicting jars that came with the default Spark/Hadoop distribution. In short, I want the jar that was added later on the classpath to win. – Shubham Jain Sep 18 '17 at 08:56
  • What part of Spark would you like to replace? What jars are we talking about? – Jacek Laskowski Sep 18 '17 at 09:21

1 Answer


One solution is to set spark.driver.userClassPathFirst and spark.executor.userClassPathFirst to true (via the --conf option) when submitting the Spark application. With these flags, classes are loaded from the uber jar first and only then from the Spark/Hadoop classpath. Note that both properties are marked experimental, and spark.driver.userClassPathFirst takes effect in cluster mode only.
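For example, a spark-submit invocation could look like the sketch below; the main class, master, and jar name are placeholders for your own application:

    spark-submit \
      --class com.example.Main \
      --master yarn \
      --deploy-mode cluster \
      --conf spark.driver.userClassPathFirst=true \
      --conf spark.executor.userClassPathFirst=true \
      my-app-assembly-1.0.jar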

Another solution is to use shading in sbt-assembly: rename (shade) the packages inside the uber jar whose older versions ship with Spark, so your application uses the relocated copies and no longer conflicts with the versions on the Spark classpath.
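A minimal build.sbt sketch, assuming (as an example) that the conflicting library is Guava under com.google.common:

    // build.sbt -- requires the sbt-assembly plugin in project/plugins.sbt
    assemblyShadeRules in assembly := Seq(
      // Relocate the newer Guava classes bundled in the uber jar so they
      // cannot clash with the older Guava on the Spark/Hadoop classpath
      ShadeRule.rename("com.google.common.**" -> "shaded.com.google.common.@1").inAll
    )

The rename rule rewrites both the shaded classes and every reference to them in your own code, so your application resolves the relocated (newer) copies at runtime.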

Shubham Jain