
Hadoop 2.4.0 depends on two different versions of beanutils, causing the following error with sbt-assembly:

[error] (*:assembly) deduplicate: different file contents found in the following:
[error] .ivy2/cache/commons-beanutils/commons-beanutils/jars/commons-beanutils-1.7.0.jar:org/apache/commons/beanutils/BasicDynaBean.class
[error] .ivy2/cache/commons-beanutils/commons-beanutils-core/jars/commons-beanutils-core-1.8.0.jar:org/apache/commons/beanutils/BasicDynaBean.class

Both of these dependencies are pulled in transitively by Hadoop 2.4.0, as confirmed by inspecting the Ivy dependency reports (see "How to access Ivy directly, i.e. access dependency reports or execute Ivy commands?").
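One way to sidestep the duplicate classes might be to exclude the older artifact from hadoop-client. This is only a sketch; it assumes commons-beanutils-core 1.8.0 provides every class the Hadoop code path actually needs at runtime:

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.4.0" exclude("commons-beanutils", "commons-beanutils")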

How can I build an assembly jar with sbt-assembly that includes Hadoop 2.4.0?

UPDATE: As requested, here are the build.sbt dependencies:

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.4.0"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"  % "provided" exclude("org.apache.hadoop", "hadoop-client")

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

libraryDependencies += "com.amazonaws" % "aws-java-sdk" % "1.7.8"

libraryDependencies += "commons-io" % "commons-io" % "2.4"

libraryDependencies += "javax.servlet" % "javax.servlet-api" % "3.0.1" % "provided"

libraryDependencies += "com.sksamuel.elastic4s" %% "elastic4s" % "1.1.1.0"

The hadoop-client exclude is needed because, out of the box, Spark pulls in Hadoop 1, which conflicts with Hadoop 2.
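If excluding the single hadoop-client module turns out not to be enough, sbt's excludeAll with an ExclusionRule can drop every org.apache.hadoop artifact that spark-core brings in. The following is an untested sketch of that idea:

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided" excludeAll ExclusionRule(organization = "org.apache.hadoop")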

  • Can you add `build.sbt` with your dependencies? – lpiepiora Jun 27 '14 at 17:38
  • @lpiepiora - Done, could you take a look? – SRobertJames Jun 29 '14 at 18:27
  • The problem is that the spark-core that is in the repo is built against Hadoop 1. Even if you solve the dependency issue you have presently, you'll run into the next problems (I have tested it). Maybe you can consider cloning Spark and building your own version against Hadoop 2 (the Spark build seems to support it) – lpiepiora Jun 29 '14 at 18:31
  • @lpiepiora Thanks; Can you present the error? Are you sure that the only way to use Spark with Hadoop 2 is to build from scratch? The Spark home page http://spark.apache.org/downloads.html offers prebuilts for Hadoop 2, but only one maven example – SRobertJames Jun 29 '14 at 18:43
  • It was just an error with some other dependencies. The one you had already, but related to `hadoop-yarn-common-2.4.0.jar`. I think you could maybe resolve all of them, but that feels like a way through dependency hell. Maybe an option for you would be to include it as a project ref, which would be automatically built from git. I've seen that Spark can be downloaded pre-built for Hadoop 2, that's why I said they support Hadoop 2 in their build process, but I think they don't publish that version to the maven repo. – lpiepiora Jun 29 '14 at 18:50
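A git project reference built from source, as suggested in the last comment, might look roughly like the sketch below. The repository URI and tag fragment are placeholders, and this assumes the referenced repository has an sbt build; for a specific sub-project you would use ProjectRef with that project's id instead of RootProject:

// hypothetical sketch: let sbt clone and build the dependency from git
lazy val sparkFromGit = RootProject(uri("https://github.com/apache/spark.git#v1.0.0"))

lazy val root = project.in(file(".")).dependsOn(sparkFromGit)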

1 Answer


Try adding a merge strategy to build.sbt, like below:

val meta = """META.INF(.)*""".r

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    // when several jars provide the same path, keep the last copy seen
    case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
    case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
    case PathList("org", "apache", xs @ _*) => MergeStrategy.last
    case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
    case PathList("plugin.properties") => MergeStrategy.last
    // drop META-INF metadata entirely
    case meta(_) => MergeStrategy.discard
    // fall back to the previous strategy for everything else
    case x => old(x)
  }
}
– Artem N.
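
Side note: the <<= operator above comes from older sbt-assembly releases. On sbt-assembly 0.12 and later, the same strategies would be expressed with assemblyMergeStrategy, roughly like this:

val meta = """META.INF(.)*""".r

assemblyMergeStrategy in assembly := {
  case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
  case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
  case PathList("org", "apache", xs @ _*) => MergeStrategy.last
  case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
  case PathList("plugin.properties") => MergeStrategy.last
  case meta(_) => MergeStrategy.discard
  case x =>
    // delegate everything else to the plugin's default strategy
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}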