I get many `Deduplicate found...` errors when building my project with sbt:
[error] Deduplicate found different file contents in the following:
[error] Jar name = netty-all-4.1.68.Final.jar, jar org = io.netty, entry target = io/netty/handler/ssl/SslProvider.class
[error] Jar name = netty-handler-4.1.50.Final.jar, jar org = io.netty, entry target = io/netty/handler/ssl/SslProvider.class
...
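For context, this particular clash typically means a fat jar (netty-all) and an individual module (netty-handler) both ship the same classes via transitive dependencies. One common, if blunt, workaround (not from the original post; whether the class versions are actually interchangeable is an assumption) is to keep the first copy found:

```scala
// build.sbt: when the same netty class file appears in several jars,
// just keep the first copy encountered; delegate everything else to
// the previously configured strategy.
assembly / assemblyMergeStrategy := {
  case PathList("io", "netty", _*) => MergeStrategy.first
  case x =>
    val oldStrategy = (assembly / assemblyMergeStrategy).value
    oldStrategy(x)
}
```

This only silences the error; if the two netty versions genuinely differ in behavior, aligning the versions (or excluding one side) is safer.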
For now I'm considering shading all the libraries (as shown here):
libraryDependencies ++= Seq(
"com.rometools" % "rome" % "1.18.0",
"com.typesafe.scala-logging" %% "scala-logging" % "3.9.5", // log
"ch.qos.logback" % "logback-classic" % "1.4.5", // log
"com.lihaoyi" %% "upickle" % "1.6.0", // file-io
"net.liftweb" %% "lift-json" % "3.5.0", // json
"org.apache.spark" %% "spark-sql" % "3.2.2", // spark
"org.apache.spark" %% "spark-core" % "3.2.2" % "provided", // spark
"org.postgresql" % "postgresql" % "42.5.1", // spark + postgresql
)
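To find out which of these dependencies drag in the clashing netty jars in the first place, sbt's built-in dependency-tree plugin (shipped with sbt 1.4+) can help; a minimal sketch:

```scala
// project/plugins.sbt: enable the dependency-tree plugin bundled with sbt 1.4+
addDependencyTreePlugin
```

Then `sbt dependencyTree` prints the resolved dependency graph, showing which library pulls in netty-all versus the individual netty modules.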
So I added the following shade rules:
assemblyShadeRules in assembly := Seq(
ShadeRule.rename("com.lihaoyi.**" -> "crdaa.@1")
.inLibrary("com.lihaoyi" %% "upickle" % "1.6.0")
.inProject,
ShadeRule.rename("ch.qos.logback.**" -> "crdbb.@1")
.inLibrary("ch.qos.logback" % "logback-classic" % "1.4.5")
.inProject,
ShadeRule.rename("com.typesafe.**" -> "crdcc.@1")
.inLibrary("com.typesafe.scala-logging" %% "scala-logging" % "3.9.5")
.inProject,
ShadeRule.rename("org.apache.spark.spark-sql.**" -> "crddd.@1")
.inLibrary("org.apache.spark" %% "spark-sql" % "3.2.2")
.inProject,
ShadeRule.rename("org.apache.spark.spark-core.**" -> "crdee.@1")
.inLibrary("org.apache.spark" %% "spark-core" % "3.2.2")
.inProject,
ShadeRule.rename("com.rometools.**" -> "crdff.@1")
.inLibrary("com.rometools" % "rome" % "1.18.0")
.inProject,
ShadeRule.rename("org.postgresql.postgresql.**" -> "crdgg.@1")
.inLibrary("org.postgresql" % "postgresql" % "42.5.1")
.inProject,
ShadeRule.rename("net.liftweb.**" -> "crdhh.@1")
.inLibrary("net.liftweb" %% "lift-json" % "3.5.0")
.inProject,
)
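A likely reason the rules above change nothing: `ShadeRule.rename` matches fully qualified *package* names, not Maven artifact names, so a pattern like `org.apache.spark.spark-sql.**` never matches any class. A hedged sketch of a package-based rule (the package prefix here is an assumption about where the clashing classes live):

```scala
// Shade rules operate on package names; classes from the spark-sql
// artifact live under org.apache.spark.sql, so a matching rule would be:
assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("org.apache.spark.sql.**" -> "crddd.@1")
    .inLibrary("org.apache.spark" %% "spark-sql" % "3.2.2")
    .inProject
)
```

Note, though, that shading renames classes rather than removing duplicates, so even a matching rule would not by itself resolve the netty-all vs. netty-handler conflict shown above.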
But after reloading sbt, when I run assembly
I get the same duplicate errors.
What can be the problem here?
PS:
ThisBuild / scalaVersion := "2.13.10"
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.0")
Update
Finally, I ditched the renaming in favor of unmanagedJars
and stopped including the Spark dependencies (most of the errors were caused by them) by marking them as provided.
After that only the Deduplicate errors for module-info.class
remained, but their solution (changing the merge strategy) is described in the sbt-assembly docs.
That is, I downloaded Spark separately, copied its jars into the ./jarlib
directory (!!! not the ./lib
directory), and changed the following in the build config:
libraryDependencies ++= Seq(
//...
"org.apache.spark" %% "spark-sql" % "3.2.3" % "provided",
"org.apache.spark" %% "spark-core" % "3.2.3" % "provided",
)
unmanagedJars in Compile += file("./jarlib")
ThisBuild / assemblyMergeStrategy := {
case PathList("module-info.class") => MergeStrategy.discard
case x if x.endsWith("/module-info.class") => MergeStrategy.discard
case x =>
val oldStrategy = (ThisBuild / assemblyMergeStrategy).value
oldStrategy(x)
}
With this setup the Spark jars were included in the final jar.
Update 2
As noted in the comments, unmanagedJars
is useless in this case, so I removed the unmanagedJars
line from build.sbt.
Note: the Spark jars
that aren't included in the final jar file must be on the classpath when you start the jar.
In my case I copied the Spark jars
plus the final jar
into the ./app
folder and started the jar
with:
java -cp "./app/*" main.Main
... where main.Main
is the main class.