
I need a Fat Jar with Spark because I'm creating a custom node for Knime. Basically, it's a self-contained jar executed inside Knime, and I assume a Fat Jar is the only way to spawn a local Spark job. Eventually we will move on to submitting jobs to a remote cluster, but for now I need it to spawn this way.

That said, I made a Fat Jar using this: https://github.com/sbt/sbt-assembly
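For completeness, the plugin is wired in the usual way via project/assembly.sbt (the exact plugin version is approximate, from memory):

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")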

I made an empty sbt project, added spark-core to the dependencies, and assembled the jar. I added it to the manifest of my custom Knime node and tried to spawn a simple job (parallelize a collection, collect it, and print it). It starts, but I get this error:

No configuration setting found for key 'akka.version'

I have no idea how to solve it.
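
For reference, the job itself is about as simple as it gets; roughly this (a sketch, names are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

// Spawn Spark locally inside the Knime process
val conf = new SparkConf()
  .setAppName("SparkFatJarTest")
  .setMaster("local[*]")
val sc = new SparkContext(conf)

// Parallelize a collection, collect it, and print it
val result = sc.parallelize(1 to 10).collect()
result.foreach(println)

sc.stop()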

Edit: this is my build.sbt

name := "SparkFatJar"

version := "1.0"

scalaVersion := "2.11.6"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.3.0"
)


libraryDependencies +=  "com.typesafe.akka" %% "akka-actor" % "2.3.8"

assemblyJarName in assembly := "SparkFatJar.jar"

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}

I found this merge strategy for Spark somewhere on the internet, but I can't find the source right now.

Chobeat

1 Answer


I think the issue is with how you've set up assemblyMergeStrategy. Both Spark and Akka ship their defaults in a reference.conf file at the root of their jars, and akka.version is defined in akka-actor's reference.conf; with MergeStrategy.first only one of those files survives the merge, which is why the key goes missing. Concatenating the config files instead keeps every library's defaults. Try this:

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  // Typesafe Config files must be concatenated, not picked or discarded,
  // so that every library's defaults (including akka.version) survive
  case "application.conf"            => MergeStrategy.concat
  case "reference.conf"              => MergeStrategy.concat
  case x =>
    // Delegate everything else to the previously defined strategy
    val baseStrategy = (assemblyMergeStrategy in assembly).value
    baseStrategy(x)
}
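
The final case works because sbt lets a setting reference its own previous value inside :=, so baseStrategy here is whatever merge strategy was defined before (sbt-assembly's default), and everything not matched above keeps the default behaviour.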
Dale Wijnand
    I had to use .first as the default strategy, but your modification worked. Then there was another problem, already solved here: http://stackoverflow.com/questions/17265002/hadoop-no-filesystem-for-scheme-file – Chobeat May 23 '15 at 17:46