
I get many "Deduplicate found..." errors when building my project with sbt:

[error] Deduplicate found different file contents in the following:
[error]   Jar name = netty-all-4.1.68.Final.jar, jar org = io.netty, entry target = io/netty/handler/ssl/SslProvider.class
[error]   Jar name = netty-handler-4.1.50.Final.jar, jar org = io.netty, entry target = io/netty/handler/ssl/SslProvider.class
...

For now I am considering shading all the libraries (as here):

libraryDependencies ++= Seq(
  "com.rometools" % "rome" % "1.18.0",
  "com.typesafe.scala-logging" %% "scala-logging" % "3.9.5", // log
  "ch.qos.logback" % "logback-classic" % "1.4.5", // log
  "com.lihaoyi" %% "upickle" % "1.6.0", // file-io

  "net.liftweb" %% "lift-json" % "3.5.0", // json
  "org.apache.spark" %% "spark-sql" % "3.2.2", // spark
  "org.apache.spark" %% "spark-core" % "3.2.2" % "provided", // spark
  "org.postgresql" % "postgresql" % "42.5.1", // spark + postgresql

)

So I added the following shade rules:

assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("com.lihaoyi.**" -> "crdaa.@1")
    .inLibrary("com.lihaoyi" %% "upickle" % "1.6.0")
    .inProject,
  ShadeRule.rename("ch.qos.logback.**" -> "crdbb.@1")
    .inLibrary("ch.qos.logback" % "logback-classic" % "1.4.5")
    .inProject,
  ShadeRule.rename("com.typesafe.**" -> "crdcc.@1")
    .inLibrary("com.typesafe.scala-logging" %% "scala-logging" % "3.9.5")
    .inProject,
  ShadeRule.rename("org.apache.spark.spark-sql.**" -> "crddd.@1")
    .inLibrary("org.apache.spark" %% "spark-sql" % "3.2.2")
    .inProject,
  ShadeRule.rename("org.apache.spark.spark-core.**" -> "crdee.@1")
    .inLibrary("org.apache.spark" %% "spark-core" % "3.2.2")
    .inProject,
  ShadeRule.rename("com.rometools.**" -> "crdff.@1")
    .inLibrary("com.rometools" % "rome" % "1.18.0")
    .inProject,
  ShadeRule.rename("org.postgresql.postgresql.**" -> "crdgg.@1")
    .inLibrary("org.postgresql" % "postgresql" % "42.5.1")
    .inProject,
  ShadeRule.rename("net.liftweb.**" -> "crdhh.@1")
    .inLibrary("net.liftweb" %% "lift-json" % "3.5.0")
    .inProject,
)

But after reloading sbt and running `assembly`, I get the same duplicate errors.

What could be the problem here?

PS:

ThisBuild / scalaVersion := "2.13.10"
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.0")

Update

In the end I dropped the renaming in favor of `unmanagedJars`, and excluded the Spark dependencies (which caused most of the errors) from the assembly by marking them `provided`.

After that, only the deduplicate errors for module-info.class remained; the fix for those (changing the merge strategy) is described in the sbt-assembly documentation.

Concretely, I downloaded Spark separately, copied its jars into the ./jarlib directory (not the ./lib directory!), and changed the following in the build config:

libraryDependencies ++= Seq(
  //...
  "org.apache.spark" %% "spark-sql" % "3.2.3" % "provided", 
  "org.apache.spark" %% "spark-core" % "3.2.3" % "provided", 
)

unmanagedJars in Compile += file("./jarlib")


ThisBuild / assemblyMergeStrategy := {
  case PathList("module-info.class") => MergeStrategy.discard
  case x if x.endsWith("/module-info.class") => MergeStrategy.discard
  case x =>
    val oldStrategy = (ThisBuild / assemblyMergeStrategy).value
    oldStrategy(x)
}

The Spark jars were then included in the final jar.

Update 2

As noted in the comments, `unmanagedJars` is unnecessary in this case, so I removed the `unmanagedJars` line from build.sbt.

Note that the Spark jars, which are not included in the final jar, must be on the classpath when you run it.

In my case I copied the Spark jars and the final jar into the ./app folder and ran it with:

java -cp "./app/*" main.Main

where main.Main is the main class.
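Putting both updates together, the relevant part of build.sbt would look roughly like this. It is a sketch assuming sbt 1.x with sbt-assembly 2.1.0 (as in the PS); the non-Spark dependencies are elided because they did not change.

```scala
// build.sbt (sketch): Spark is marked "provided", so it is on the compile
// classpath but is not packaged into the assembly jar. There is no
// unmanagedJars setting any more.
ThisBuild / scalaVersion := "2.13.10"

libraryDependencies ++= Seq(
  // ... other (non-Spark) dependencies, unchanged ...
  "org.apache.spark" %% "spark-sql"  % "3.2.3" % "provided",
  "org.apache.spark" %% "spark-core" % "3.2.3" % "provided"
)

// Discard JPMS module descriptors; defer every other entry to the default strategy.
ThisBuild / assemblyMergeStrategy := {
  case PathList("module-info.class")         => MergeStrategy.discard
  case x if x.endsWith("/module-info.class") => MergeStrategy.discard
  case x =>
    val oldStrategy = (ThisBuild / assemblyMergeStrategy).value
    oldStrategy(x)
}
```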

palandlom
  • For example classes from `netty-all` collide with the ones from `netty-handler` (package `io.netty.handler.ssl`), `netty-transport` (package `io.netty.channel.epoll`), `netty-common` (package `io.netty.util.internal`) etc. How do your renaming rules help with that? – Dmytro Mitin Dec 31 '22 at 02:11
  • Why did you decide to do shading rather than set up `assemblyMergeStrategy`? Doesn't it work for you? – Dmytro Mitin Dec 31 '22 at 02:13
  • This is my first time working with `sbt`, so I don't know the best practices. There were many errors, so I didn't know how to define rules effectively, and in some examples I didn't understand the syntax, e.g. `case PathList(ps@_*) if ps.last endsWith "StaticMDCBinder.class" => MergeStrategy.first`. Could you please explain, or give a link explaining, what `ps@_*` means? `_*` is clear, but `@` confused me. – palandlom Jan 02 '23 at 10:05
  • `@` is binding a variable upon pattern matching https://stackoverflow.com/questions/74928495 https://stackoverflow.com/questions/42907863 https://stackoverflow.com/questions/11284048 https://stackoverflow.com/questions/31085742 https://stackoverflow.com/questions/42155053 – Dmytro Mitin Jan 02 '23 at 15:08
  • I thought you had tried sbt-assembly and there were reasons why you needed shading. I guess you could try `MergeStrategy.first` or `MergeStrategy.last` if you are ok with sbt-assembly (maybe also `MergeStrategy.discard` for files like `module-info.class` if you need this). There should then be no need for unmanaged jars. There can sometimes be difficulties with an assembly jar too, though: https://stackoverflow.com/questions/74881933 https://stackoverflow.com/questions/74800073 https://stackoverflow.com/questions/74809158 https://stackoverflow.com/questions/74879217 – Dmytro Mitin Jan 02 '23 at 15:23
  • `_*` is vararg syntax https://stackoverflow.com/questions/31064753/how-to-pass-scala-array-into-scala-vararg-method https://stackoverflow.com/questions/6051302/what-does-colon-underscore-star-do-in-scala https://stackoverflow.com/questions/37088903/scala-pattern-matching-for-vararg – Dmytro Mitin Jan 02 '23 at 15:26
  • https://stackoverflow.com/questions/25144484/sbt-assembly-deduplication-found-error – Dmytro Mitin Jan 14 '23 at 03:43
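To illustrate the comments on `@` and `_*`, here is a small self-contained Scala sketch. `MyPathList` is a hypothetical extractor written only for this example; it merely mimics what sbt-assembly's `PathList` does with entry paths. The point is the pattern syntax: `_*` matches any number of remaining elements, and `name @ pattern` binds whatever the pattern matched to `name` so the guard or the case body can use it.

```scala
// Hypothetical stand-in for sbt-assembly's PathList extractor (illustration
// only): it splits an entry path on '/' into its segments.
object MyPathList {
  def unapplySeq(path: String): Option[Seq[String]] =
    Some(path.split('/').toSeq)
}

object BindingDemo {
  def classify(path: String): String = path match {
    // `ps @ _*` binds the whole sequence of segments (zero or more) to `ps`,
    // so the guard can inspect it; a bare `_*` would match but not bind.
    case MyPathList(ps @ _*) if ps.nonEmpty && ps.last.endsWith("StaticMDCBinder.class") =>
      s"binder in ${ps.init.mkString("/")}"
    // Here only the segments after the literal "META-INF" are bound to `rest`.
    case MyPathList("META-INF", rest @ _*) =>
      s"META-INF entry, ${rest.size} more segment(s)"
    case _ =>
      "other"
  }

  def main(args: Array[String]): Unit =
    Seq("org/slf4j/impl/StaticMDCBinder.class", "META-INF/MANIFEST.MF", "a/b.txt")
      .foreach(p => println(s"$p -> ${classify(p)}"))
}
```

For instance, `classify("org/slf4j/impl/StaticMDCBinder.class")` returns `"binder in org/slf4j/impl"`, because `ps` holds all four segments and `ps.init` drops the file name.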

1 Answer


Something like this (in your build.sbt) is how you typically resolve the deduplicate errors that arise when your dependencies pull in overlapping dependencies of their own:

assembly / assemblyMergeStrategy := {
  case PathList("javax", "activation", _*) => MergeStrategy.first
  case PathList("com", "sun", _*) => MergeStrategy.first
  case "META-INF/io.netty.versions.properties" => MergeStrategy.first
  case "META-INF/mime.types" => MergeStrategy.first
  case "META-INF/mailcap.default" => MergeStrategy.first
  case "META-INF/mimetypes.default" => MergeStrategy.first
  case d if d.endsWith(".jar:module-info.class") => MergeStrategy.first
  case d if d.endsWith("module-info.class") => MergeStrategy.first
  case d if d.endsWith("/MatchersBinder.class") => MergeStrategy.discard
  case d if d.endsWith("/ArgumentsProcessor.class") => MergeStrategy.discard
  case x =>
    val oldStrategy = (assembly / assemblyMergeStrategy).value
    oldStrategy(x)
}
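Applied to the specific errors in the question, a narrower sketch (assuming sbt 1.x with sbt-assembly 2.x) would target the `io.netty` classes from the error log, following the `MergeStrategy.first` suggestion in the comments. Be aware that `MergeStrategy.first` simply keeps whichever copy comes first on the classpath, here either the netty-all 4.1.68 or the netty-handler 4.1.50 version, so it may keep a class from the older release.

```scala
// build.sbt (sketch, sbt 1.x + sbt-assembly 2.x): resolve the io.netty
// duplicates by keeping the first copy found on the classpath.
ThisBuild / assemblyMergeStrategy := {
  // Classes bundled both by netty-all and by netty-handler/-transport/-common.
  case PathList("io", "netty", _*)                          => MergeStrategy.first
  case PathList("META-INF", "io.netty.versions.properties") => MergeStrategy.first
  // JPMS module descriptors are commonly discarded in a fat jar.
  case x if x.endsWith("module-info.class")                 => MergeStrategy.discard
  case x =>
    val oldStrategy = (ThisBuild / assemblyMergeStrategy).value
    oldStrategy(x)
}
```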
Philluminati