
I have an sbt project that I am trying to build into a jar with the sbt-assembly plugin.

build.sbt:

      name := "project-name"

      version := "0.1"

      scalaVersion := "2.11.12"

      val sparkVersion = "2.4.0"

      libraryDependencies ++= Seq(
        "org.scalatest" %% "scalatest" % "3.0.5" % "test",
        "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
        "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
        "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
        "com.holdenkarau" %% "spark-testing-base" % "2.3.1_0.10.0" % "test",
        // spark-hive dependencies for DataFrameSuiteBase. https://github.com/holdenk/spark-testing-base/issues/143
        "org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
        "com.amazonaws" % "aws-java-sdk" % "1.11.513" % "provided",
        "com.amazonaws" % "aws-java-sdk-sqs" % "1.11.513" % "provided",
        "com.amazonaws" % "aws-java-sdk-s3" % "1.11.513" % "provided",
        //"org.apache.hadoop" % "hadoop-aws" % "3.1.1"
        "org.json" % "json" % "20180813"
      )

      assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
      assemblyMergeStrategy in assembly := {
       case PathList("META-INF", xs @ _*) => MergeStrategy.discard
       case x => MergeStrategy.first
      }
      test in assembly := {}

      // https://github.com/holdenk/spark-testing-base
      fork in Test := true
      javaOptions ++= Seq("-Xms512M", "-Xmx2048M", "-XX:MaxPermSize=2048M", "-XX:+CMSClassUnloadingEnabled")
      parallelExecution in Test := false

When I build the project with `sbt assembly`, the resulting jar contains class files under /org/junit/... and /org/opentest4j/...

Is there any way to keep these test-related files out of the final jar?

I have tried replacing the line:

    "org.scalatest" %% "scalatest" % "3.0.5" % "test"

with:

    "org.scalatest" %% "scalatest" % "3.0.5" % "provided"

I am also wondering how these files end up in the jar in the first place, since junit is not referenced anywhere in build.sbt (the project does contain JUnit tests, however).
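One way to answer this is to ask sbt itself which dependency pulls junit in. This is a sketch assuming the sbt-dependency-graph plugin (not part of the build above) has been added to project/plugins.sbt; the plugin coordinates and the `4.12` revision are assumptions, not taken from this build:

```shell
# Assumed setup: add the sbt-dependency-graph plugin to project/plugins.sbt first, e.g.:
#   addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.2")

# Print the full resolved dependency tree for the build
sbt dependencyTree

# Ask specifically which dependencies pull in junit
# (syntax: whatDependsOn <organization> <module> <revision>)
sbt "whatDependsOn junit junit 4.12"
```

The second command prints an inverted tree rooted at junit, which shows exactly which of the declared dependencies drags it in transitively.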

Updated:

    name := "project-name"

    version := "0.1"

    scalaVersion := "2.11.12"

    val sparkVersion = "2.4.0"

    val excludeJUnitBinding = ExclusionRule(organization = "junit")

    libraryDependencies ++= Seq(
      // Provided
      "org.apache.spark" %% "spark-core" % sparkVersion % "provided" excludeAll(excludeJUnitBinding),
      "org.apache.spark" %% "spark-sql" % sparkVersion % "provided" excludeAll(excludeJUnitBinding),
      "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
      "com.holdenkarau" %% "spark-testing-base" % "2.3.1_0.10.0" % "provided" excludeAll(excludeJUnitBinding),
      "org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
      "com.amazonaws" % "aws-java-sdk" % "1.11.513" % "provided",
      "com.amazonaws" % "aws-java-sdk-sqs" % "1.11.513" % "provided",
      "com.amazonaws" % "aws-java-sdk-s3" % "1.11.513" % "provided",

      // Test
      "org.scalatest" %% "scalatest" % "3.0.5" % "test",

      // Necessary
      "org.json" % "json" % "20180813"
    )

    excludeDependencies += excludeJUnitBinding

    // https://stackoverflow.com/questions/25144484/sbt-assembly-deduplication-found-error
    assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
    assemblyMergeStrategy in assembly := {
     case PathList("META-INF", xs @ _*) => MergeStrategy.discard
     case x => MergeStrategy.first
    }


    // https://github.com/holdenk/spark-testing-base
    fork in Test := true
    javaOptions ++= Seq("-Xms512M", "-Xmx2048M", "-XX:MaxPermSize=2048M", "-XX:+CMSClassUnloadingEnabled")
    parallelExecution in Test := false
  • By default, sbt-assembly does not include test jars. I had this problem when a dependency I included itself (incorrectly) listed a test framework as a runtime dependency. Do you know which package pulls in junit? – adhominem Apr 02 '19 at 08:22
  • I'm not sure; if I append each dependency with "provided" the test files are still included. Would this mean it's not any of the included dependencies pulling them in at runtime? – Alex Shapovalov Apr 02 '19 at 08:30

1 Answer


To exclude certain transitive dependencies of a dependency, use the excludeAll or exclude methods.

The exclude method should be used when a pom will be published for the project. It requires the organization and module name to exclude.

For example:

libraryDependencies += 
  "log4j" % "log4j" % "1.2.15" exclude("javax.jms", "jms")

The excludeAll method is more flexible, but because it cannot be represented in a pom.xml, it should only be used when a pom doesn’t need to be generated.

For example,

libraryDependencies +=
  "log4j" % "log4j" % "1.2.15" excludeAll(
    ExclusionRule(organization = "com.sun.jdmk"),
    ExclusionRule(organization = "com.sun.jmx"),
    ExclusionRule(organization = "javax.jms")
  )

In certain cases a transitive dependency should be excluded from all dependencies. This can be achieved by setting up ExclusionRules in excludeDependencies (sbt 0.13.8 and above):

excludeDependencies ++= Seq(
  ExclusionRule("commons-logging", "commons-logging")
)

The JUnit jar is downloaded as a transitive dependency of the following:

"org.apache.spark" %% "spark-core" % sparkVersion % "provided" //(junit)
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided"// (junit)
"com.holdenkarau" %% "spark-testing-base" % "2.3.1_0.10.0" % "test" //(org.junit)

To exclude the junit jar, update your dependencies as below:

val excludeJUnitBinding = ExclusionRule(organization = "junit")

  "org.scalatest" %% "scalatest" % "3.0.5" % "test",
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided" excludeAll(excludeJUnitBinding),
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided" excludeAll(excludeJUnitBinding),
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
  "com.holdenkarau" %% "spark-testing-base" % "2.3.1_0.10.0" % "test" excludeAll(excludeJUnitBinding)

Update: Please update your build.sbt as below.

resolvers += Resolver.url("bintray-sbt-plugins",
  url("https://dl.bintray.com/eed3si9n/sbt-plugins/"))(Resolver.ivyStylePatterns)

val excludeJUnitBinding = ExclusionRule(organization = "junit")

libraryDependencies ++= Seq(
  // Provided
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided" excludeAll(excludeJUnitBinding),
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided" excludeAll(excludeJUnitBinding),
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
  "com.holdenkarau" %% "spark-testing-base" % "2.3.1_0.10.0" % "provided" excludeAll(excludeJUnitBinding),
  "org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
  //"com.amazonaws" % "aws-java-sdk" % "1.11.513" % "provided",
  //"com.amazonaws" % "aws-java-sdk-sqs" % "1.11.513" % "provided",
  //"com.amazonaws" % "aws-java-sdk-s3" % "1.11.513" % "provided",

  // Test
  "org.scalatest" %% "scalatest" % "3.0.5" % "test",

  // Necessary
  "org.json" % "json" % "20180813"
)

assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
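If an exclusion rule misses some transitive source, the unwanted class files can also be dropped at assembly time by the merge strategy itself. This is a sketch using the same sbt-assembly 0.x syntax as above; it discards anything under org/junit and org/opentest4j regardless of which jar contributes those paths:

```scala
assemblyMergeStrategy in assembly := {
  case PathList("org", "junit", _*)      => MergeStrategy.discard  // drop JUnit class files
  case PathList("org", "opentest4j", _*) => MergeStrategy.discard  // drop opentest4j class files
  case PathList("META-INF", xs @ _*)     => MergeStrategy.discard
  case x                                 => MergeStrategy.first
}
```

This is a blunter tool than dependency exclusions (the jars are still resolved and unpacked, just not copied into the assembly), but it guarantees the paths are absent from the final jar.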

fork in Test := true
javaOptions ++= Seq("-Xms512M", "-Xmx2048M", "-XX:MaxPermSize=2048M", "-XX:+CMSClassUnloadingEnabled")
parallelExecution in Test := false

project/plugins.sbt

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")

I have tried this and the junit jar is no longer downloaded.

KZapagol
  • I have tried changing the only non provided or test import to org.json" % "json" % "20180813" exclude("org.junit", "junit"), still the org/junit... files are present in the jar. Am I missing something? – Alex Shapovalov Apr 02 '19 at 10:10
  • `org.junit` won't get downloaded as part of the `org.json` dependency. Do you know the package name of the junit module? – KZapagol Apr 02 '19 at 10:33
  • My understanding is that because org.json is the only library not marked as "provided" or "test", only it will have its dependencies pulled in. So excluding org.junit from this library should make sure the files are not imported into the jar. Does this make sense? – Alex Shapovalov Apr 02 '19 at 10:45
  • @AlexShapovalov `junit:junit:jar` file are downloading as part of `"org.apache.spark" %% "spark-sql" % sparkVersion % "provided"` and `"com.holdenkarau" %% "spark-testing-base" % "2.3.1_0.10.0" % "test"`. Please check my updated comment to exclude Junit jar file. – KZapagol Apr 02 '19 at 11:47
  • Please let me know still if you face same issue. – KZapagol Apr 02 '19 at 11:58
  • I have posted an updated build.sbt and am still facing the same issue. – Alex Shapovalov Apr 02 '19 at 12:28
  • @AlexShapovalov It looks like your build.sbt has some problem. I have updated build.sbt and now it works fine. Please check my updated comment. – KZapagol Apr 03 '19 at 07:07