
I am new to Scala and SBT build files. From the introductory tutorials, adding Spark dependencies to a Scala project should be straightforward via the sbt-spark-package plugin, but I am getting the following error:

[error] (run-main-0) java.lang.NoClassDefFoundError: org/apache/spark/SparkContext

Please provide resources to learn more about what could be driving this error, as I want to understand the process more thoroughly.

CODE:

import org.apache.spark.sql.SparkSession

trait SparkSessionWrapper {

  lazy val spark: SparkSession = {
    SparkSession
      .builder()
      .master("local")
      .appName("spark citation graph")
      .getOrCreate()
  }

  val sc = spark.sparkContext

}


import org.apache.spark.graphx.GraphLoader

object Test extends SparkSessionWrapper {

  def main(args: Array[String]): Unit = {
    println("Testing, testing, testing, testing...")

    val filePath = "Desktop/citations.txt"
    val citeGraph = GraphLoader.edgeListFile(sc, filePath)
    citeGraph.vertices.take(1).foreach(println)
  }
}

plugins.sbt

resolvers += "bintray-spark-packages" at "https://dl.bintray.com/spark-packages/maven/"

addSbtPlugin("org.spark-packages" % "sbt-spark-package" % "0.2.6")

build.sbt -- WORKING. Why does adding libraryDependencies make it run/work?

spName := "yewno/citation_graph"

version := "0.1"

scalaVersion := "2.11.12"

sparkVersion := "2.2.0"

sparkComponents ++= Seq("core", "sql", "graphx")

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.2.0",
  "org.apache.spark" %% "spark-sql" % "2.2.0",
  "org.apache.spark" %% "spark-graphx" % "2.2.0"
)

build.sbt -- NOT WORKING. I would expect this to compile & run correctly

spName := "yewno/citation_graph"

version := "0.1"

scalaVersion := "2.11.12"

sparkVersion := "2.2.0"

sparkComponents ++= Seq("core", "sql", "graphx")

Bonus points for an explanation + links to resources to learn more about the SBT build process, jar files, and anything else that can help me get up to speed!

colbythenoob

1 Answer


The sbt-spark-package plugin adds the Spark dependencies in provided scope:

sparkComponentSet.map { component =>
  "org.apache.spark" %% s"spark-$component" % sparkVersion.value % "provided"
}.toSeq

We can confirm this by running show libraryDependencies from sbt:

[info] * org.scala-lang:scala-library:2.11.12
[info] * org.apache.spark:spark-core:2.2.0:provided
[info] * org.apache.spark:spark-sql:2.2.0:provided
[info] * org.apache.spark:spark-graphx:2.2.0:provided

The provided scope means:

The dependency will be part of compilation and test, but excluded from the runtime.

Thus sbt run throws java.lang.NoClassDefFoundError: org/apache/spark/SparkContext, because the Spark classes are not on the runtime classpath.
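For comparison, here is a minimal sketch of the two scopes written as plain libraryDependencies entries (an illustration of scopes, not the plugin's exact output). It also shows why your "WORKING" build.sbt runs: its explicit entries use the default (compile) scope, which is included at runtime:

// Default scope: on the compile, test, and runtime classpaths.
// This is what the explicit libraryDependencies in the "WORKING" build.sbt do.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0"

// Provided scope: on the compile and test classpaths, excluded at runtime.
// This is what sbt-spark-package generates from sparkComponents.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0" % "provided"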

If we really want to include provided dependencies on the run classpath, then @douglaz suggests:

run in Compile := Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run)).evaluated
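
For example, a sketch of the "NOT WORKING" build.sbt with that line appended (assuming the same plugin settings as in the question):

spName := "yewno/citation_graph"

version := "0.1"

scalaVersion := "2.11.12"

sparkVersion := "2.2.0"

sparkComponents ++= Seq("core", "sql", "graphx")

// Run with the Compile classpath, which includes provided dependencies;
// packaging and publishing still treat Spark as provided.
run in Compile := Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run)).evaluated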
Mario Galic
  • Very clear answer, thank you. So for someone new to developing Scala projects what would be your suggestion for handling the tasks of running code (basic module tests) vs shipping final projects? Like is it "good practice" to leave douglaz's one-liner in the build file? – colbythenoob Feb 21 '19 at 14:27