11

The title could also be:
What are the differences between the Maven and SBT assembly plugins?

I ran into this issue while migrating a project from Maven to SBT.

To describe the problem, I have created an example project with dependencies that behave differently depending on the build tool.

https://github.com/atais/mvn-sbt-assembly


The only dependencies are (sbt style):

"com.netflix.astyanax" % "astyanax-cassandra" % "3.9.0",
"org.apache.cassandra" % "cassandra-all" % "3.4",

What I do not understand is why mvn package creates the fat jar successfully, while sbt assembly reports conflicts:

[error] 39 errors were encountered during merge
[error] java.lang.RuntimeException: deduplicate: different file contents found in the following:
[error] /home/siatkowskim/.ivy2/cache/org.slf4j/jcl-over-slf4j/jars/jcl-over-slf4j-1.7.7.jar:org/apache/commons/logging/<some classes>
[error] /home/siatkowskim/.ivy2/cache/commons-logging/commons-logging/jars/commons-logging-1.1.1.jar:org/apache/commons/logging/<some classes>
...
[error] /home/siatkowskim/.ivy2/cache/com.github.stephenc.high-scale-lib/high-scale-lib/jars/high-scale-lib-1.1.2.jar:org/cliffc/high_scale_lib/<some classes>
[error] /home/siatkowskim/.ivy2/cache/com.boundary/high-scale-lib/jars/high-scale-lib-1.0.6.jar:org/cliffc/high_scale_lib/<some classes>
...
Atais
  • You need to define a merge strategy; see https://stackoverflow.com/questions/32497280/how-to-get-sbt-assembly-merge-right, https://stackoverflow.com/questions/39850368/sbt-assembly-error-encontered-during-merge and https://stackoverflow.com/questions/14791955/assembly-merge-strategy-issues-using-sbt-assembly. For Maven, see https://stackoverflow.com/questions/44612786/how-java-maven-resolves-dependency-conflicts-at-run-time, and also https://bryantsai.com/how-to-resolve-dependency-conflict-out-of-your-control-e75ace79e54f – Tarun Lalwani May 15 '18 at 13:52
  • @TarunLalwani your last link (the article) quite well describes the case, but the thing is, that in case of conflicts, they recommend `maven-shade-plugin`. And this is where things are getting interesting. Because in my example projects, there ARE conflicts, but somehow `maven-assembly-plugin` resolves them, and `sbt-assembly` does not. – Atais May 16 '18 at 11:08
  • I tried to find a reference for this, but I could not find anything that describes how the Maven Shade plugin does it – Tarun Lalwani May 16 '18 at 11:18
  • sbt-assembly can do shading as well: https://github.com/sbt/sbt-assembly#shading – Pietrotull May 23 '18 at 19:58
  • What a coincidence as I've been running into it too and spent over a week trying to figure it out. Glad you asked this question. – Jacek Laskowski Feb 18 '19 at 14:25

3 Answers

5

This extends Alexey Romanov's answer.

I have also updated my project with detailed explanation, so you might want to check it out.

Following the advice

You can verify it for this case by unpacking the jar Maven produces and the dependency jars in SBT error message, then checking which .class file Maven used.

I compared the fat-jars produced by maven and sbt with

  • MergeStrategy.first, that showed some extra files
  • MergeStrategy.last, that showed binary differences & extra files
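A comparison of this kind can be sketched with a small Scala program. This is only an illustration, not the tooling actually used for the comparison above: `JarDiff`, `digests` and `compare` are hypothetical names, and the two jar paths are expected as command-line arguments.

```scala
import java.io.InputStream
import java.security.MessageDigest
import java.util.zip.ZipFile
import scala.collection.JavaConverters._ // on Scala 2.13+, scala.jdk.CollectionConverters._

object JarDiff {
  // Hex-encoded SHA-256 of everything readable from the stream.
  private def sha256(in: InputStream): String = {
    val md  = MessageDigest.getInstance("SHA-256")
    val buf = new Array[Byte](8192)
    var n   = in.read(buf)
    while (n != -1) { md.update(buf, 0, n); n = in.read(buf) }
    md.digest().map("%02x".format(_)).mkString
  }

  // Entry path -> content digest for every file (not directory) in the jar.
  def digests(jarPath: String): Map[String, String] = {
    val zip = new ZipFile(jarPath)
    try {
      zip.entries().asScala
        .filterNot(_.isDirectory)
        .map { e =>
          val in = zip.getInputStream(e)
          try { e.getName -> sha256(in) } finally { in.close() }
        }
        .toMap
    } finally {
      zip.close()
    }
  }

  // Returns (paths only in a, paths only in b, paths in both whose contents differ).
  def compare(a: Map[String, String],
              b: Map[String, String]): (Set[String], Set[String], Set[String]) = {
    val onlyA   = a.keySet -- b.keySet
    val onlyB   = b.keySet -- a.keySet
    val differs = (a.keySet & b.keySet).filter(k => a(k) != b(k))
    (onlyA, onlyB, differs)
  }

  def main(args: Array[String]): Unit = {
    val (onlyA, onlyB, differs) = compare(digests(args(0)), digests(args(1)))
    onlyA.toSeq.sorted.foreach(p => println(s"only in ${args(0)}: $p"))
    onlyB.toSeq.sorted.foreach(p => println(s"only in ${args(1)}: $p"))
    differs.toSeq.sorted.foreach(p => println(s"different contents: $p"))
  }
}
```

Run against the Maven jar and an sbt jar, entries reported as "only in" correspond to the extra files mentioned above, and "different contents" to the binary differences.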

I then took the next step and checked the fat-jars against the dependencies in which sbt found conflicts, specifically:

Conclusion

maven-assembly-plugin resolves conflicts at the jar level. When it finds a conflict, it picks the first jar and simply ignores all the content of the other.

Whereas sbt-assembly mixes all the class files, resolving conflicts locally, file by file.
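To make the difference concrete, here is a toy Scala model of the two behaviours as this answer describes them. It is not the plugins' actual code: `MergeModel`, `Jar`, `jarLevel` and `fileLevelFirst` are names invented for the illustration, and a jar is reduced to a map from entry path to a content label.

```scala
object MergeModel {
  // Entry path -> label identifying the content (stands in for the bytes).
  type Jar = Map[String, String]

  // Jar-level resolution as described above: a jar that conflicts with
  // anything already included is skipped entirely.
  def jarLevel(jars: Seq[Jar]): Jar =
    jars.foldLeft(Map.empty[String, String]) { (acc, jar) =>
      if (jar.keys.exists(acc.contains)) acc else acc ++ jar
    }

  // File-level resolution with "first wins": every jar contributes the
  // entries that no earlier jar has already provided.
  def fileLevelFirst(jars: Seq[Jar]): Jar =
    jars.foldLeft(Map.empty[String, String]) { (acc, jar) =>
      acc ++ jar.filter { case (path, _) => !acc.contains(path) }
    }
}
```

Given a first jar with `org/x/A.class` and `org/x/B.class`, and a second jar with its own `org/x/A.class` plus `org/x/C.class`, `jarLevel` keeps only the two entries of the first jar, while `fileLevelFirst` also keeps `org/x/C.class` from the second. That extra entry is the kind of extra file seen in the sbt jar.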

My theory is that if your fat-jar made with maven-assembly-plugin works, you can specify MergeStrategy.first for all the conflicts in sbt. The only difference would be that the jar produced with sbt will be even bigger, containing extra classes that Maven ignored.
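A minimal sketch of that theory in build.sbt, assuming sbt-assembly and the `in assembly` syntax used elsewhere on this page (newer sbt versions write `assembly / assemblyMergeStrategy`). It covers only the two package prefixes visible in the error output above; the remaining conflicts are elided in that log and would need their own cases.

```scala
// build.sbt fragment (requires the sbt-assembly plugin)
assemblyMergeStrategy in assembly := {
  // jcl-over-slf4j vs commons-logging: keep the first copy found
  case PathList("org", "apache", "commons", "logging", _*) => MergeStrategy.first
  // the two high-scale-lib artifacts: keep the first copy found
  case PathList("org", "cliffc", "high_scale_lib", _*)     => MergeStrategy.first
  // everything else falls back to the plugin's default strategy
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```

Naming the conflicting paths explicitly, rather than using a catch-all `case _ => MergeStrategy.first`, means a new and unexpected conflict still fails the build instead of being resolved silently.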

Atais
4

It seems maven-assembly-plugin resolves conflicts equivalently to MergeStrategy.first (not sure if it's completely equivalent) by just picking one of the files in an unspecified way when jar-with-dependencies is used (since it only has one phase):

If two or more elements (e.g., file, fileSet) select different sources for the same file for archiving, only one of the source files will be archived.

As per version 2.5.2 of the assembly plugin, the first phase to add the file to the archive "wins". The filtering is done solely based on name inside the archive, so the same source file can be added under different output names. The order of the phases is as follows: 1) FileItem 2) FileSets 3) ModuleSet 4) DepenedencySet and 5) Repository elements.

Elements of the same type will be processed in the order they appear in the descriptors. If you need to "overwrite" a file included by a previous set, the only way to do this is to exclude that file from the earlier set.

Note that this behaviour was slightly different in earlier versions of the assembly plugin.

Even if one of the conflicting files would work for all of your dependencies (which isn't necessarily so), Maven doesn't know which one, so you can just silently get the wrong result. Silently at build-time, I mean; at runtime you can get e.g. AbstractMethodError, or again just a wrong result.

You can influence which file gets picked by writing your own descriptor, but it is horribly verbose: there is no equivalent to just writing MergeStrategy.first/last (and concat/discard are not allowed).

The SBT plugin could do the same: default to a strategy when you don't specify one, but then, well, you could silently get the wrong result.

Alexey Romanov
  • It seems possible that it behaves like `MergeStrategy.first`, but different sources state different things. E.g. https://gist.github.com/simonwoo/04b133cb0745e1a0f1d6 says "it will cause Java class name conflict issue". If you could find some way to confirm how it works, I would be 110% satisfied. – Atais May 22 '18 at 10:59
  • Well, the documentation is more likely to be right than a random gist. But they don't actually disagree: the issue it causes can be exactly "you silently get the wrong result". – Alexey Romanov May 22 '18 at 11:11
  • You can verify it _for this case_ by unpacking the jar Maven produces and the dependency jars in SBT error message, then checking which `.class` file Maven used. For the general case, you have to either rely on documentation, or check the sources of `maven-assembly-plugin`. – Alexey Romanov May 22 '18 at 11:13
  • @Atais Actually, after rereading what it says more carefully, and looking at the definition of `jar-with-dependencies` in https://maven.apache.org/plugins/maven-assembly-plugin/descriptor-refs.html, it's allowed to pick arbitrary `.class` file, not necessarily the first: the first _phase_ wins, but in `jar-with-dependencies` there is only one phase including all dependencies. So you do need to look at the sources. – Alexey Romanov May 23 '18 at 06:55
  • @Atais But the point that it just picks a single class and if it picked the wrong one you won't know until running the program (if you are lucky) remains the same. Except it's less predictable which one it picks. – Alexey Romanov May 23 '18 at 06:55
0

From the build.sbt I can see that there is no merge strategy in your build. There is also a stray "," in your libraryDependencies key, placed after the "org.apache.cassandra" % "cassandra-all" % "3.4" dependency, in the build.sbt of the project you linked above.

A merge strategy is required to handle the duplicate files in the jars, as well as conflicting versions. The following is an example of how to put one in place in your build.

assemblyMergeStrategy in assembly := {
  case m if m.toLowerCase.endsWith("manifest.mf")       => MergeStrategy.discard
  case m if m.toLowerCase.matches("meta-inf.*\\.sf$")   => MergeStrategy.discard
  case "reference.conf"                                 => MergeStrategy.concat
  case x: String if x.contains("UnusedStubClass.class") => MergeStrategy.first
  case _                                                => MergeStrategy.first
}

You could try writing a simple build file if you do not have sub-projects in your project. You can try the following build.sbt.

name := "assembly-test"

version := "0.1"

scalaVersion := "2.12.4"

libraryDependencies ++= Seq(
  "com.netflix.astyanax" % "astyanax-cassandra" % "3.9.0",
  "org.apache.cassandra" % "cassandra-all" % "3.4"
)

mainClass in assembly := Some("com.atais.cassandra.MainClass")

// Decide per path how duplicate entries from different jars are merged.
assemblyMergeStrategy in assembly := {
  case m if m.toLowerCase.endsWith("manifest.mf")       => MergeStrategy.discard
  case m if m.toLowerCase.matches("meta-inf.*\\.sf$")   => MergeStrategy.discard
  case "reference.conf"                                 => MergeStrategy.concat
  case x: String if x.contains("UnusedStubClass.class") => MergeStrategy.first
  case _                                                => MergeStrategy.first
}
Yayati Sule
  • I know about `assemblyMergeStrategy` and the extra commas do not matter. You did not answer the question. I know how to "make it work". I want to understand why it does not. Also, your merging strategies look really random, which is not a good idea, imo. – Atais May 22 '18 at 08:56