I invoke my Spark job as below:

    spark-submit --jars test1.jar,test2.jar \
    --class org.mytest.Students \
    --num-executors ${executors} \
    --master yarn \
    --deploy-mode cluster \
    --queue ${mapreduce.job.queuename} \
    --driver-memory ${driverMemory} \
    --conf spark.executor.memory=${sparkExecutorMemory} \
    --conf spark.rdd.compress=true \
    --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:MaxGCPauseMillis=100" \
    ${SPARK_JAR} "${INPUT}" "${OUTPUT_PATH}"

Is it possible to pass a single jar that contains test1.jar and test2.jar, e.g. --jars mainTest.jar (where mainTest.jar contains test1.jar and test2.jar)? My question is basically: can Spark explode a jar of jars? I am using version 1.3.

Neethu Lalitha

2 Answers


You can simply merge those jars into one shaded JAR. Please read this question: How can I create an executable JAR with dependencies using Maven?

You will have all the classes in exactly one JAR, so there will be no problem with nested JARs.
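Once the jars are merged, the submit command no longer needs a --jars list at all. A minimal sketch, assuming the shaded artifact is named mainTest-shaded.jar (a hypothetical name):

    spark-submit --class org.mytest.Students \
    --master yarn \
    --deploy-mode cluster \
    mainTest-shaded.jar "${INPUT}" "${OUTPUT_PATH}"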

T. Gawęda

Question: Can Spark explode a jar of jars?

Yes...

As T. Gawęda suggested, we can achieve this with the maven-assembly-plugin; I thought of putting a couple of other options here as well.


  • Option 1 :

maven-shade-plugin (particularly useful as it merges the content of specific files instead of overwriting them, which is needed when resource files with the same name exist across the jars and the plugin has to package all of them)

This plugin provides the capability to package the artifact in an uber-jar, including its dependencies, and to shade - i.e. rename - the packages of some of the dependencies.

Goals Overview: the Shade Plugin has a single goal:

shade:shade is bound to the package phase and is used to create a shaded jar.
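As a rough sketch of the resulting workflow (assuming the shade plugin is configured in your pom.xml and bound to the package phase as described above; the artifact name mainTest-1.0.jar is hypothetical):

    # Build the shaded jar; shade:shade runs as part of the package phase
    mvn clean package

    # List the entries of the resulting jar to confirm that classes from
    # both test1.jar and test2.jar now live inside one artifact
    jar tf target/mainTest-1.0.jar | head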

  • Option 2 :

The sbt way, if you are using sbt:

source: creating-uber-jar-for-spark-project-using-sbt-assembly

sbt-assembly is an sbt plugin to create a fat JAR of an sbt project with all of its dependencies.

Add the sbt-assembly plugin in project/plugin.sbt:

    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.9.1")

Specify sbt-assembly.git as a dependency in project/project/build.scala:


    import sbt._

    object Plugins extends Build {
      lazy val root = Project("root", file(".")) dependsOn(
        uri("git://github.com/sbt/sbt-assembly.git#0.9.1")
      )
    }

In the build.sbt file, add the following contents:

    import AssemblyKeys._ // put this at the top of the file, leave the next line blank

    assemblySettings

Use full keys to configure the assembly plugin; for more details, refer to the sbt-assembly documentation. The relevant keys include:

  • target
  • assembly-jar-name
  • test
  • assembly-option
  • main-class
  • full-classpath
  • dependency-classpath
  • assembly-excluded-files
  • assembly-excluded-jars
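For instance, these keys can also be overridden per invocation from the sbt command line. A small sketch, assuming you want to skip the test run that assembly performs by default:

    # Override the 'test' key in the assembly scope for this run only,
    # so the fat jar is built without executing the test suite first
    sbt "set test in assembly := {}" assembly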

If multiple files share the same relative path, the default strategy is to verify that all candidates have the same contents and to error out otherwise. For Spark projects this behaviour can be configured using assembly-merge-strategy, as follows:

    // Resolve duplicates: for conflicting entries under javax.servlet,
    // org.apache and com.esotericsoftware keep the last copy seen,
    // rename about.html, and fall back to the default strategy otherwise
    mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
      {
        case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
        case PathList("org", "apache", xs @ _*) => MergeStrategy.last
        case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
        case "about.html" => MergeStrategy.rename
        case x => old(x)
      }
    }

From the root folder, run:

    sbt/sbt assembly

The assembly plugin then packs the class files and all the dependencies into a single JAR file.
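By default the assembled artifact lands under the Scala build directory. A minimal sketch of locating it and submitting it (the project name, version, and Scala version in the jar path are hypothetical; adjust them to your build):

    # sbt-assembly writes the fat jar to target/scala-<version>/ by default
    ls target/scala-2.10/*-assembly-*.jar

    # Submit the single fat jar; no --jars list is needed any more
    spark-submit --class org.mytest.Students \
    --master yarn \
    --deploy-mode cluster \
    target/scala-2.10/myproject-assembly-1.0.jar "${INPUT}" "${OUTPUT_PATH}"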

Ram Ghadiyaram