
I copied https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/RandomForestClassifierExample.scala into a new project and set up a build.sbt:

name := "newproject"
version := "1.0"
scalaVersion := "2.11.8"

javacOptions ++= Seq("-source", "1.8", "-target", "1.8")
scalacOptions += "-deprecation"

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.11"  % "2.0.0" % "provided",
  "org.apache.spark" % "spark-sql_2.11"   % "2.0.0" % "provided",
  "org.apache.spark" % "spark-mllib_2.11" % "2.0.0" % "provided",
  "org.jpmml" % "jpmml-sparkml" % "1.1.1",
  "org.apache.maven.plugins" % "maven-shade-plugin" % "2.4.3",
  "org.scalatest" %% "scalatest" % "3.0.0"
)

I am able to build it from IntelliJ 2016.2.5, but when I run it I get the error

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession$
    at org.apache.spark.examples.ml.RandomForestClassifierExample$.main(RandomForestClassifierExample.scala:32)
    at org.apache.spark.examples.ml.RandomForestClassifierExample.main(RandomForestClassifierExample.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession$
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 7 more

I am even able to click on SparkSession and get to the source code. What is the problem?

Make42
  • Are you using `spark-submit` to run your app? – maasg Nov 02 '16 at 15:53
  • I'm afraid you'll have to use english :-) – maasg Nov 02 '16 at 16:58
  • @maasg: Dang - I am so tired. Today was a real tough working day - again. I said: No I haven't. I used IntelliJ. I think that stands in conflict with the "provided" of SBT, right? – Make42 Nov 02 '16 at 17:02
  • First build your app without `provided`, then revert your change and rebuild it. I could see the same issue in my IDEA too. – mrsrinivas Nov 02 '16 at 17:07
  • See https://stackoverflow.com/questions/36437814/how-to-work-efficiently-with-sbt-spark-and-provided-dependencies – James Moore Jun 21 '20 at 02:39

5 Answers


When you say provided for your dependency, the build will compile against that dependency, but it will not be added to the classpath at runtime (it is assumed to be already there).

That is the correct setting when building Spark jobs for spark-submit (because they will run inside of a Spark container that does provide the dependency, and including it a second time would cause trouble).

However, when you run locally, you need that dependency present. So either change the build to not have this provided (but then you need to adjust it when building to submit the job), or configure your runtime classpath in the IDE to already have that jar file.
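If you want to keep `provided` in build.sbt and still be able to run locally with `sbt run`, one commonly suggested workaround (along the lines of the sbt/Spark question linked in the comments above) is to point `run` at the compile classpath, which, unlike the runtime classpath, does include `provided` dependencies. A minimal sketch in sbt 0.13-era syntax to match the build above; treat it as a starting point rather than a definitive recipe:

// build.sbt: route `sbt run` through the compile classpath so that
// "provided" dependencies such as the Spark jars are visible at run time
run in Compile := Defaults.runTask(
  fullClasspath in Compile,
  mainClass in (Compile, run),
  runner in (Compile, run)
).evaluated

With that in place, `provided` can stay in the dependency list for packaging and spark-submit, while local runs still see the Spark classes.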

Thilo
  • Thank you. How do I "configure [my] runtime classpath in [IntelliJ] to already have that jar file" for Spark? I found http://stackoverflow.com/a/24843914/4533188 - do I add the folder "spark-2.0.1-bin-hadoop2.7/jars"? That is what I tried, but it does not work: I get a `NoClassDefFoundError` once again. – Make42 Nov 03 '16 at 09:31
  • In IntelliJ you can go to the Run Configuration and simply click `Include dependencies with "Provided" scope`. – Harald Gliebe Aug 31 '18 at 05:59

In my case, I was using my local Cloudera CDH 5.9.0 cluster with Spark 1.6.1 installed by default and Spark 2.0.0 installed as a parcel. Thus, `spark-submit` was using Spark 1.6.1 while `spark2-submit` used Spark 2.0.0. Since SparkSession did not exist in 1.6.1, the error was thrown. Using the correct `spark2-submit` command resolved the problem.

Garren S

I got the same issue, and it was fixed by setting the SPARK_HOME variable before submitting the Spark job with `spark-submit`.

Ravi

Ok, I landed here following a link from the sbt Gitter channel while searching for something else. I have a solution for this. Thilo has described the problem correctly: your sbt says "provided", which is correct for your target environment when you run on your cluster, where the Spark libraries are provided. But when you run locally within IntelliJ, you need to "provide" these external libraries to IntelliJ at runtime, and one way to do that is:

  1. Right-click on your project
  2. Open Module Settings
  3. Select Libraries in the left-hand menu
  4. Click the + sign
  5. Choose 'From Maven'
  6. Type or search for the Maven coordinates. You can search by typing the library name and hitting the Tab key; this shows a dropdown of all matches, and you can choose the correct version for your library
  7. Click OK

Note that when you restart IntelliJ you might have to repeat this process. I found this to be the case for IntelliJ IDEA 2016.3.6 on OS X El Capitan.

sparker

If you are using Maven, go to your dependencies file (pom.xml) and change the scope from `provided` to `compile`.
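For illustration (a sketch only, not code from the answer), a Spark dependency like the one in the question would change roughly like this in pom.xml:

<!-- pom.xml: switch the Spark artifact from "provided" to "compile" scope -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.0.0</version>
  <scope>compile</scope> <!-- previously: provided -->
</dependency>

Keep in mind the trade-off Thilo describes above: with compile scope, Spark will also end up in any shaded/assembled jar, which you usually do not want when submitting to a cluster.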

vagdevi k