
I'm struggling to submit a JAR to Apache Spark using spark-submit.

To make things easier, I've been experimenting with the example from this blog post. The code is:

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object SimpleScalaSpark { 
  def main(args: Array[String]) {
    val logFile = "/Users/toddmcgrath/Development/spark-1.6.1-bin-hadoop2.4/README.md" // I've replaced this with the path to an existing file
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}

I'm building this with IntelliJ IDEA 2017.1 and running it on Spark 2.1.0. Everything runs fine when I run it in the IDE.

I then build it as a JAR and attempt to submit it with spark-submit as follows:

./spark-submit --class SimpleScalaSpark --master local[*] ~/Documents/Spark/Scala/supersimple/out/artifacts/supersimple_jar/supersimple.jar

This results in the following error:

java.lang.ClassNotFoundException: SimpleScalaSpark
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:229)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:695)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I'm at a loss as to what I'm missing, especially given that it runs as expected in the IDE.

dommer

4 Answers


As per your description above, you are not giving the correct class name, so Spark is not able to find that class.

Just replace SimpleSparkScala with SimpleScalaSpark

Try running this command:

./spark-submit --class SimpleScalaSpark --master local[*] ~/Documents/Spark/Scala/supersimple/out/artifacts/supersimple_jar/supersimple.jar
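
Note also that --class expects the fully qualified class name. The object above is in the default package, so SimpleScalaSpark alone works, but if it were declared inside a package (the package name com.example below is hypothetical), the command would need to match:

package com.example

object SimpleScalaSpark { /* ... as above ... */ }

./spark-submit --class com.example.SimpleScalaSpark --master local[*] ~/Documents/Spark/Scala/supersimple/out/artifacts/supersimple_jar/supersimple.jar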

  • Yes, sorry. I spotted that too and updated the question, but it's not actually the issue. I now think it's to do with the way IntelliJ is building the JAR. If I use the "From modules with dependencies..." option (e.g. http://stackoverflow.com/questions/1082580/how-to-build-jars-from-intellij-properly) it fails, as above. However, if I set the JAR settings up manually, it works. I'm not sure why the "quick setup" doesn't work. – dommer Apr 01 '17 at 21:24
  • If you run Spark on a local installation you should add % "provided" to the Spark dependencies and then run sbt clean and sbt assembly; see the build.sbt sketch below. – fpopic Jun 03 '17 at 10:44
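
Following up on fpopic's comment, a minimal build.sbt along those lines might look like this (the Scala version is an assumption; Spark 2.1.0 is built against Scala 2.11, and sbt assembly requires the sbt-assembly plugin):

// build.sbt -- mark Spark as "provided" so sbt assembly leaves it out of the fat jar
name := "supersimple"

scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" % "provided"

Then run sbt clean assembly and pass the resulting jar to spark-submit.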

Looks like there is an issue with your jar. You can check which classes are present in your jar by opening it with the command: vi supersimple.jar

If the SimpleScalaSpark class does not appear in the output of the previous command, your jar was not built properly.
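
Alternatively, assuming a standard JDK installation, the jar tool can list the archive entries directly:

jar tf supersimple.jar | grep SimpleScalaSpark

For a correctly built jar you would expect to see entries such as SimpleScalaSpark.class and SimpleScalaSpark$.class (the $ variant is the compiled Scala object).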

shants

IDEs work differently from the shell in many ways. I believe for the shell you need to add the --jars parameter:

spark submit add multiple jars in classpath
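
For example, a sketch of such an invocation (the dependency paths are hypothetical; --jars takes a comma-separated list):

./spark-submit --class SimpleScalaSpark --master local[*] --jars /path/to/dep1.jar,/path/to/dep2.jar ~/Documents/Spark/Scala/supersimple/out/artifacts/supersimple_jar/supersimple.jar

Note that --jars is for additional dependency jars; the application jar itself is still passed as the final argument.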

Community

I am observing ClassNotFoundException on new classes I introduce. I am using a fat jar. I verified that the JAR file contains the new class file in all the copies on each node (I am using the regular filesystem to load the Spark application, not HDFS or an HTTP URL). The JAR file loaded by the worker did not have the new class I introduced; it was an older version. The only way I found to get around the problem was to use a different filename for the JAR every time I called the spark-submit script.
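
A minimal sketch of that workaround, assuming a bash shell and a hypothetical path to the fat jar:

# copy the fat jar to a uniquely named file so workers cannot pick up a stale cached copy
JAR=app-$(date +%s).jar
cp /path/to/app.jar "$JAR"
./spark-submit --class SimpleScalaSpark --master local[*] "$JAR"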