24

I'm having problems with a ClassNotFoundException using this simple example:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

import java.net.URLClassLoader

import scala.util.Marshal

class ClassToRoundTrip(val id: Int) extends scala.Serializable {
}

object RoundTripTester {

  def test(id : Int) : ClassToRoundTrip = {

    // Get the current classpath and output. Can we see simpleapp jar?
    val cl = ClassLoader.getSystemClassLoader
    val urls = cl.asInstanceOf[URLClassLoader].getURLs
    urls.foreach(url => println("Executor classpath is:" + url.getFile))

    // Simply instantiating an instance of the class and using it works fine.
    val testObj = new ClassToRoundTrip(id)
    println("testObj.id: " + testObj.id)

    val testObjBytes = Marshal.dump(testObj)
    val testObjRoundTrip = Marshal.load[ClassToRoundTrip](testObjBytes)  // <<-- ClassNotFoundException here
    testObjRoundTrip
  }
}

object SimpleApp {
  def main(args: Array[String]) {

    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)

    val cl = ClassLoader.getSystemClassLoader
    val urls = cl.asInstanceOf[URLClassLoader].getURLs
    urls.foreach(url => println("Driver classpath is: " + url.getFile))

    val data = Array(1, 2, 3, 4, 5)
    val distData = sc.parallelize(data)
    distData.foreach(x=> RoundTripTester.test(x))
  }
}

In local mode, submitting as per the docs generates a ClassNotFoundException at the marked line, where the ClassToRoundTrip object is deserialized. Strangely, the earlier direct use of the object just above it is okay:

spark-submit --class "SimpleApp" \
             --master local[4] \
             target/scala-2.10/simpleapp_2.10-1.0.jar

However, if I add the extra parameters --driver-class-path and --jars, it works fine in local mode:

spark-submit --class "SimpleApp" \
             --master local[4] \
             --driver-class-path /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar \
             --jars /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/SimpleApp.jar \
             target/scala-2.10/simpleapp_2.10-1.0.jar

However, submitting to a local dev master still generates the same issue:

spark-submit --class "SimpleApp" \
             --master spark://localhost.localdomain:7077 \
             --driver-class-path /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar \
             --jars /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar \
             target/scala-2.10/simpleapp_2.10-1.0.jar

I can see from the output that the JAR file is being fetched by the executor.

Logs for one of the executors are here:

stdout: http://pastebin.com/raw.php?i=DQvvGhKm

stderr: http://pastebin.com/raw.php?i=MPZZVa0Q

I'm using Spark 1.0.2. The ClassToRoundTrip class is included in the JAR. I would rather not have to hardcode values in SPARK_CLASSPATH or SparkContext.addJar. Can anyone help?

puppet
  • Update - I've been able to work around this by setting "spark.executor.extraClassPath" and making the JAR file locally available on each of the executors at that path. I don't understand why this is needed: the JAR is being fetched from Spark's internal HTTP server by the executors and copied into the working directory of each executor. – puppet Sep 08 '14 at 14:37
  • I am seeing the same issue today. The JAR is being fetched by the executor, and it has the class it's looking for, even though it throws ClassNotFoundException! I am on 1.0.2, btw. – nir Nov 06 '14 at 20:40
  • Update again - I think this might have something to do with serialization. We found a couple of days ago that changing the serialization method made the problem go away. I'm still not sure why, but it's worth a try. – puppet Nov 08 '14 at 19:46
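
For reference, a minimal sketch of the two workarounds mentioned in the comments above, expressed here as SparkConf settings (only one of several ways to set them). The JAR path is a hypothetical placeholder, and for extraClassPath to help the file must already be present at that path on every executor node:

import org.apache.spark.SparkConf

// Sketch only: the JAR path is a placeholder and must already exist locally
// on every executor for extraClassPath to take effect.
val conf = new SparkConf()
  .setAppName("Simple Application")
  // Workaround from the first comment: put the application JAR directly on
  // the executors' classpath.
  .set("spark.executor.extraClassPath", "/opt/jars/simpleapp_2.10-1.0.jar")
  // Workaround from the last comment: change the serialization method,
  // e.g. switch to Kryo.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")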

5 Answers

18

I had this same issue. If the master is local, the program runs fine for most people; if it is set to "spark://myurl:7077" (as also happened to me), it doesn't work. Most people get the error because an anonymous class is not found during execution. It is resolved by using SparkContext.addJar("path to jar").

Make sure you are doing the following things:

  • Call SparkContext.addJar("path to the JAR created by Maven [hint: mvn package]").
  • I have used SparkConf.setMaster("spark://myurl:7077") in the code and supplied the same master as an argument when submitting the job to Spark on the command line.
  • When you specify the class on the command line, make sure you write its fully qualified name, e.g. "packageName.ClassName".
  • The final command should look like this: bin/spark-submit --class "packageName.ClassName" --master spark://myurl:7077 pathToYourJar/target/yourJarFromMaven.jar

Note: the JAR pathToYourJar/target/yourJarFromMaven.jar in the last point is also set in code, as in the first point of this answer; a minimal sketch of these steps follows.
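
A minimal Scala sketch of the steps above, using the placeholder master URL and JAR path from this answer (not real values):

import org.apache.spark.{SparkConf, SparkContext}

// Placeholder master URL and JAR path, taken from the answer above.
val conf = new SparkConf()
  .setAppName("Simple Application")
  .setMaster("spark://myurl:7077")
val sc = new SparkContext(conf)

// Ship the JAR built by `mvn package` to the executors.
sc.addJar("pathToYourJar/target/yourJarFromMaven.jar")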

busybug91
  • This link is invaluable: http://www.datastax.com/dev/blog/common-spark-troubleshooting. Note that you need to include a 'fat' JAR if you do not plan to copy the dependencies to all the nodes. Check here to see how to build one using sbt-assembly: http://stackoverflow.com/questions/28459333/how-to-build-an-uber-jar-fat-jar-using-sbt-within-intellij-idea – Alex Punnen Jun 13 '16 at 12:03
4

I also had the same issue. I think --jars is not shipping the JARs to the executors. After I added this to the SparkConf, it worked fine:

 val conf = new SparkConf().setMaster("...").setJars(Seq("/a/b/x.jar", "/c/d/y.jar"))

This troubleshooting web page is useful too.
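
For context, a minimal sketch of how the snippet above fits into a driver program; the master URL and JAR paths are placeholders:

import org.apache.spark.{SparkConf, SparkContext}

// Placeholder master URL and JAR paths; setJars ships the listed JARs to the executors.
val conf = new SparkConf()
  .setAppName("Simple Application")
  .setMaster("spark://master:7077")
  .setJars(Seq("/a/b/x.jar", "/c/d/y.jar"))
val sc = new SparkContext(conf)
// Equivalently, individual JARs can be added after the context is created:
// sc.addJar("/a/b/x.jar")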

Yifei
3

You should set SPARK_CLASSPATH in the spark-env.sh file like this:

SPARK_LOCAL_IP=your local ip 
SPARK_CLASSPATH=your external jars

and you should submit with spark-submit like this:

spark-submit --class your.runclass --master spark://yourSparkMasterHostname:7077 /your.jar

and your Java code should look like this:

SparkConf sparkconf = new SparkConf().setAppName("sparkOnHbase");
JavaSparkContext sc = new JavaSparkContext(sparkconf);

Then it will work.

capotee
1

If you are using Maven and the Maven Assembly plugin to build your JAR file with mvn package, ensure that the assembly plugin is configured correctly to point to your Spark app's main class.

Something like this should be added to your pom.xml to avoid any java.lang.ClassNotFoundExceptions:

        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>2.4.1</version>
            <configuration>
                <archive>
                    <manifest>
                        <mainClass>com.my.package.SparkDriverApp</mainClass>
                    </manifest>
                </archive>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
                <skipAssembly>false</skipAssembly>
            </configuration>
            <executions>
                <execution>
                    <id>package</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
bp2010
0

What I figured out is that if you have built your project without any warnings, then you don't have to write extra code for the master and other things. Although it is good practice, you can simply avoid it. In my case there were no warnings in the project, so I was able to run it without any extra code.

In the case where there are build-related warnings, I have to take care of the JAR paths, my URL, and the master in code, as well as when executing it.

I hope this helps someone. Cheers!

RushHour