
I am facing the exact issue described in the post below ("sbt-assembly: deduplication found error"), and the suggested answer there is not helping.

[error] (*:assembly) deduplicate: different file contents found in the following:
[error] C:\Users\xxx\.ivy2\cache\org.eclipse.jetty.orbit\javax.transaction\orbits\javax.transaction-1.1.1.v201105210645.jar:META-INF/ECLIPSEF.RSA
[error] C:\Users\xxx\.ivy2\cache\org.eclipse.jetty.orbit\javax.servlet\orbits\javax.servlet-3.0.0.v201112011016.jar:META-INF/ECLIPSEF.RSA
[error] C:\Users\xxx\.ivy2\cache\org.eclipse.jetty.orbit\javax.mail.glassfish\orbits\javax.mail.glassfish-1.4.1.v201005082020.jar:META-INF/ECLIPSEF.RSA
[error] C:\Users\xxx\.ivy2\cache\org.eclipse.jetty.orbit\javax.activation\orbits\javax.activation-1.1.0.v201105071233.jar:META-INF/ECLIPSEF.RSA
[error] Total time: 14 s, completed Sep 9, 2014 5:21:01 PM

My build.sbt file contains:

name := "Simple"

version := "0.1.0"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.twitter4j" % "twitter4j-stream" % "3.0.3"
)

//libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.2"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.2"

libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.0.2"

libraryDependencies += "com.github.nscala-time" %% "nscala-time" % "0.4.2"

libraryDependencies ++= Seq(
    ("org.apache.spark"%%"spark-core"%"1.0.2").
    exclude("org.eclipse.jetty.orbit", "javax.servlet").
    exclude("org.eclipse.jetty.orbit", "javax.transaction").
    exclude("org.eclipse.jetty.orbit", "javax.mail").
    exclude("org.eclipse.jetty.orbit", "javax.activation").
    exclude("commons-beanutils", "commons-beanutils-core").
    exclude("commons-collections", "commons-collections").
    exclude("commons-collections", "commons-collections").
    exclude("com.esotericsoftware.minlog", "minlog")
)

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("javax", "servlet", xs @ _*)         => MergeStrategy.first
    case PathList("javax", "transaction", xs @ _*)     => MergeStrategy.first
    case PathList("javax", "mail", xs @ _*)            => MergeStrategy.first
    case PathList("javax", "activation", xs @ _*)      => MergeStrategy.first
    case PathList(ps @ _*) if ps.last endsWith ".html" => MergeStrategy.first
    case "application.conf" => MergeStrategy.concat
    case "unwanted.txt"     => MergeStrategy.discard
    case x => old(x)
  }
}

Any pointers on how to fix the above issue?

Siva

2 Answers

3

If you are planning to run your program on Spark, then I strongly recommend marking all Spark dependencies as provided, so that they are excluded from the assembly task.

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"              % "1.0.2" % "provided",
  "org.apache.spark" %% "spark-streaming"         % "1.0.2" % "provided",
  "org.apache.spark" %% "spark-streaming-twitter" % "1.0.2" % "provided")

Otherwise, you need to either remove those jars from the classpath or add the appropriate lines to mergeStrategy. In your case that would be:

case PathList("META-INF", "ECLIPSEF.RSA") => MergeStrategy.first
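To show where that line goes: the cases are tried top to bottom, so it has to sit above the `case x => old(x)` fallback. A minimal sketch, assuming the same sbt-assembly syntax as the build.sbt in the question (the `<<=` form and the `mergeStrategy` key); the other cases from the question are left out here for brevity:

```scala
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    // Keep the first copy of the signature file that several orbit jars ship.
    case PathList("META-INF", "ECLIPSEF.RSA") => MergeStrategy.first
    // Anything not matched above falls back to the previous strategy.
    case x => old(x)
  }
}
```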

If you still want to deal with Spark's dependency hell, the sbt-dependency-graph plugin should help. Also note that the other Spark dependencies, such as spark-streaming and spark-streaming-twitter, probably need exclude directives too.
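As a rough sketch of what applying the excludes to every Spark dependency could look like: `withoutOrbit` is a hypothetical helper name, not part of sbt, and the module names are taken from the jar paths in the error output (note `javax.mail.glassfish` rather than `javax.mail`). Whether this alone removes the conflict in this build is an assumption.

```scala
// Hypothetical helper (not an sbt API): applies the same orbit excludes to any module.
def withoutOrbit(module: ModuleID): ModuleID =
  module
    .exclude("org.eclipse.jetty.orbit", "javax.servlet")
    .exclude("org.eclipse.jetty.orbit", "javax.transaction")
    .exclude("org.eclipse.jetty.orbit", "javax.mail.glassfish")
    .exclude("org.eclipse.jetty.orbit", "javax.activation")

// The three Spark modules from the question, each with the excludes applied.
libraryDependencies ++= Seq(
  withoutOrbit("org.apache.spark" %% "spark-core"              % "1.0.2"),
  withoutOrbit("org.apache.spark" %% "spark-streaming"         % "1.0.2"),
  withoutOrbit("org.apache.spark" %% "spark-streaming-twitter" % "1.0.2")
)
```

Defining the helper once keeps the exclude list in a single place, so a newly reported conflicting module only has to be added there.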

4e6
  • Could you please elaborate on "If you are planning to run your program from Spark, then I strongly recommend to add all Spark dependencies as provided"? How do I add them? – Siva Sep 12 '14 at 06:16
  • @Siva It means that when Spark is running, these jars are already available when you deploy your job, so there's no need to ship them with the application. See my updated answer. – 4e6 Sep 12 '14 at 07:41
  • And when I tried to add the above merge strategy, it resulted in one more error: [error] C:\Users\xxx\.ivy2\cache\com.esotericsoftware.kryo\kryo\bundles\kryo-2.21.jar:com/esotericsoftware/minlog/Log$Logger.class [error] C:\Users\xxx\.ivy2\cache\com.esotericsoftware.minlog\minlog\jars\minlog-1.2.jar:com/esotericsoftware/minlog/Log$Logger.class – Siva Sep 12 '14 at 08:02
  • Problem with this is it goes against the whole point of assembly ... that is to build a fat jar where worrying about classpath is not necessary. – samthebest Nov 19 '14 at 12:31
0

To get the annoying "deduplicate" messages to go away, I didn't bother with the exclude stuff, since it didn't seem to help for me. I copied and pasted the defaultMergeStrategy from the sbt-assembly code and just changed the line where it says deduplicate to first. I also had to add a catch-all at the end to insist upon first too. To be honest, I have no idea what this means or why it makes the annoying messages go away ... I don't have time to get a PhD in sbt, I want my code to just build! So the merge strategy becomes:

mergeStrategy in assembly <<= (mergeStrategy in assembly) ((old) => {
  case x if Assembly.isConfigFile(x) =>
    MergeStrategy.concat
  case PathList(ps @ _*) if Assembly.isReadme(ps.last) || Assembly.isLicenseFile(ps.last) =>
    MergeStrategy.rename
  case PathList("META-INF", xs @ _*) =>
    (xs map {_.toLowerCase}) match {
      case ("manifest.mf" :: Nil) | ("index.list" :: Nil) | ("dependencies" :: Nil) =>
        MergeStrategy.discard
      case ps @ (x :: xs) if ps.last.endsWith(".sf") || ps.last.endsWith(".dsa") =>
        MergeStrategy.discard
      case "plexus" :: xs =>
        MergeStrategy.discard
      case "services" :: xs =>
        MergeStrategy.filterDistinctLines
      case ("spring.schemas" :: Nil) | ("spring.handlers" :: Nil) =>
        MergeStrategy.filterDistinctLines
      case _ => MergeStrategy.first // Changed deduplicate to first
    }
  case PathList(_*) => MergeStrategy.first // added this line
})
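For comparison, a narrower sketch that leaves the default strategy in place and only overrides the two conflicts reported in this thread. It has not been tested against this build, so treat it as an assumption about what is sufficient. It uses the same `<<=` syntax as the block above, and discarding the `.rsa` signature files (rather than taking the first one) is a swapped-in choice.

```scala
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    // Signature files such as META-INF/ECLIPSEF.RSA from the orbit jars.
    // Assumption: a per-jar signature is of no use once the jars are merged.
    case PathList("META-INF", xs @ _*) if xs.lastOption.exists(_.toLowerCase.endsWith(".rsa")) =>
      MergeStrategy.discard
    // minlog classes present in both kryo-2.21.jar and minlog-1.2.jar.
    case PathList("com", "esotericsoftware", "minlog", xs @ _*) =>
      MergeStrategy.first
    // Everything else keeps the previous behaviour, including deduplicate.
    case x => old(x)
  }
}
```

With this shape, a genuinely new conflict still fails the build with a deduplicate error instead of being resolved silently by the catch-all.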
samthebest