
I have a Scala jar file called SGA.jar. Within it, there is an object called org.SGA.MainTest, which uses the underlying SGA.jar logic to perform some graph operations and looks like this:

package org.SGA

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

import java.io._
import scala.util._

object MainTest { 

  def initialize() : Unit = {
    println("Initializing")
  }

  def perform(collection : Iterable[String]) : Unit = {
    val conf = new SparkConf().setAppName("maintest")
    val sparkContext = new SparkContext(conf)
    sparkContext.setLogLevel("ERROR")

    val edges = sparkContext.parallelize(collection.toList)
      .map(_.split(" "))
      .map { edgeCoordinates =>
        new Edge(edgeCoordinates(0).toLong, edgeCoordinates(1).toLong, edgeCoordinates(2).toDouble)
      }

    println("Creating graph")
    val graph : Graph[Any, Double] = Graph.fromEdges(edges, 0)
    println("Graph created")

    // ...
  }
}
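
For clarity, each element of the collection passed to perform is expected to be a whitespace-separated "sourceId destinationId weight" triple, as implied by the split/toLong/toDouble calls above. A hypothetical sample (not from the original post):

// Hypothetical sample input, matching the "srcId dstId weight" line format
// that perform's parsing assumes:
val sample = Seq("1 2 0.5", "2 3 1.5")
// "1 2 0.5" is parsed into Edge(1L, 2L, 0.5)
MainTest.perform(sample)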

SGA.jar is embedded into scalaWrapper.jar, which is a Java wrapper around the Scala SGA.jar and the necessary datasets. Its folder structure looks like this:

scalaWrapper.jar
| META-INF
| | MANIFEST.MF
| scalawrapper
| | datasets
| | | data1.txt
| | jars
| | | SGA.jar
| | FileParser.java
| | FileParser.class
| | WrapperClass.java
| | WrapperClass.class
| .classpath
| .project

The FileParser class basically converts the data available in the text files into usable structures and is of no further interest here. The main class, however, is WrapperClass:

package scalawrapper;

import scala.collection.*;
import scala.collection.Iterable;
import java.util.List;
import org.SGA.*;

public class WrapperClass {
    public static void main(String[] args) {
        FileParser fileparser = new FileParser();

        String filepath = "/scalawrapper/datasets/data1.txt";

        MainTest.initialize();

        List<String> list = fileparser.Parse(filepath);
        Iterable<String> scalaIterable = JavaConversions.collectionAsScalaIterable(list);       
        MainTest.perform(scalaIterable);
    }
}
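
As an aside, the collection conversion could equally live on the Scala side, keeping the Java wrapper free of Scala collection APIs. A minimal sketch, assuming a hypothetical overload added to MainTest (it is not part of the original SGA.jar):

import scala.collection.JavaConverters._

// Hypothetical convenience overload for MainTest: accept a java.util.List
// directly and delegate to the existing perform(Iterable[String]).
def perform(javaList: java.util.List[String]): Unit =
  perform(javaList.asScala)  // asScala yields a scala.collection.Iterable[String]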

SGA.jar is built via SBT, and the Java jar is developed in and exported from Eclipse. When running locally (in which case the SparkConf has .setMaster("local[*]").set("spark.executor.memory","7g") appended to facilitate a local execution), there are no issues and the code behaves as expected.
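
For reference, a sketch of the locally-run configuration as just described; on EMR the two local-only settings are omitted so that spark-submit supplies them:

// Local-only variant of the SparkConf from perform, per the description above.
val conf = new SparkConf()
  .setAppName("maintest")
  .setMaster("local[*]")                // local run only; omitted on EMR
  .set("spark.executor.memory", "7g")   // local run only; on EMR spark-submit sets this
val sparkContext = new SparkContext(conf)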

The problem arises when the scalaWrapper.jar is expected to run on an EMR cluster. The cluster is defined as a 1 master + 4 worker nodes, with an additional spark application step:

Main class : None
Arguments : spark-submit --deploy-mode cluster --class scalawrapper.WrapperClass --executor-memory 17g --executor-cores 16 --driver-memory 17g s3://scalaWrapperCluster/scalaWrapper.jar

The execution fails with :

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt1/yarn/usercache/hadoop/filecache/10/__spark_libs__1619195545177535823.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/04/22 16:56:43 INFO SignalUtils: Registered signal handler for TERM
19/04/22 16:56:43 INFO SignalUtils: Registered signal handler for HUP
19/04/22 16:56:43 INFO SignalUtils: Registered signal handler for INT
19/04/22 16:56:43 INFO SecurityManager: Changing view acls to: yarn,hadoop
19/04/22 16:56:43 INFO SecurityManager: Changing modify acls to: yarn,hadoop
19/04/22 16:56:43 INFO SecurityManager: Changing view acls groups to: 
19/04/22 16:56:43 INFO SecurityManager: Changing modify acls groups to: 
19/04/22 16:56:43 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, hadoop); groups with view permissions: Set(); users  with modify permissions: Set(yarn, hadoop); groups with modify permissions: Set()
19/04/22 16:56:44 INFO ApplicationMaster: Preparing Local resources
19/04/22 16:56:44 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1555952041027_0001_000001
19/04/22 16:56:44 INFO ApplicationMaster: Starting the user application in a separate Thread
19/04/22 16:56:44 INFO ApplicationMaster: Waiting for spark context initialization...
19/04/22 16:56:44 ERROR ApplicationMaster: User class threw exception: java.lang.NoClassDefFoundError: org/SGA/MainTest
java.lang.NoClassDefFoundError: org/SGA/MainTest
    at scalawrapper.WrapperClass.main(WrapperClass.java:20)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:678)
Caused by: java.lang.ClassNotFoundException: org.SGA.MainTest
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 6 more

Note that WrapperClass.java:20 corresponds to MainTest.initialize();.

This exception seems to be quite popular, as I came upon quite a few attempts to solve it (example), yet none solved my problem. I tried including in scalaWrapper.jar the scala-library that was used for building SGA.jar, eliminating static fields, and searching for mistakes in the project definitions, but had no luck.


1 Answer


I resolved the issue by uploading SGA.jar to S3 separately and by adding it as the --jars parameter to spark-submit.

spark-submit --deploy-mode cluster  --jars s3://scalaWrapperCluster/SGA.jar  --class scalawrapper.WrapperClass --executor-memory 17g --executor-cores 16 --driver-memory 17g s3://scalaWrapperCluster/scalaWrapper.jar

Note that the original functionality within scalaWrapper.jar (including the already built-in SGA.jar) didn't change, and the separately uploaded SGA.jar is the one being executed. This makes sense: the standard Java classloader does not load classes from a jar nested inside another jar, so the embedded copy of SGA.jar was never on the classpath, whereas --jars makes Spark distribute SGA.jar and place it on the driver and executor classpaths directly.
