I have a fat jar, written in Scala, packaged by sbt. I need to use it in a Spark cluster in AWS EMR.
It works fine if I manually spin up the cluster, copy the jar to the master node, and run a spark-submit job with a command like this:
    spark-submit --class org.company.platform.package.SparkSubmit --name platform ./platform-assembly-0.1.0.jar arg0 arg1 arg2
But if I add it as a step to the EMR cluster, it fails. The stderr log looks like this:
    Exception in thread "main" java.lang.ClassNotFoundException: package.SparkSubmit
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:278)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
The relevant settings in my build.sbt look like this...
    lazy val root = (project in file(".")).
      settings(
        name := "platform",
        version := "0.1.0",
        scalaVersion := "2.10.5",
        organization := "org.company",
        mainClass in Compile := Some("package/SparkSubmit")
      )
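For comparison, sbt's `mainClass` setting takes a fully qualified, dot-separated class name (the same form passed to `--class` in the working spark-submit command), not a slash-separated path. Below is a minimal sketch of that form. It assumes the fat jar is built with the sbt-assembly plugin, which is not stated above. It only changes the Main-Class entry written to the jar's manifest, and I have not confirmed that this alone fixes the EMR step.

```scala
// build.sbt fragment (sketch only, sbt 0.13 syntax as used above)
// Assumption: the fat jar is produced by the sbt-assembly plugin.
lazy val root = (project in file(".")).
  settings(
    // Fully qualified, dot-separated name, matching the --class value
    // that works with spark-submit.
    mainClass in Compile := Some("org.company.platform.package.SparkSubmit"),
    // sbt-assembly exposes its own scoped key, which can be set
    // explicitly for the fat jar's manifest.
    mainClass in assembly := Some("org.company.platform.package.SparkSubmit")
  )
```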
The corresponding file containing my main class looks like this:
    package org.company.platform.package

    object SparkSubmit {
      def main(args: Array[String]): Unit = {
        // do stuff
      }
    }
In the EMR console, in the "Add Step" dialog, the text next to the "Arguments" box says:
"These are passed to the main function in the JAR. If the JAR does not specify a main class in its manifest file you can specify another class name as the first argument."
Since I DO specify a main class in build.sbt, I would expect that to be enough, but the step fails without logging anything about the failure. If I instead specify the main class as the first argument, it logs the exception I posted above.
I think it's probably a formatting problem with the class name, but I can't work out how to fix it, and I can't find any examples. I've tried submitting each of the following as the arguments in the "Add Step" dialog:
    arg0 arg1 arg2
    package.SparkSubmit arg0 arg1 arg2
    package/SparkSubmit arg0 arg1 arg2
    org.company.platform.package.SparkSubmit arg0 arg1 arg2
I tried a few other variations too, but nothing works.
Version info: EMR 4.3, Spark 1.6, Scala 2.10, sbt 0.13.9
Any ideas what dumb mistake I'm making that's not letting EMR/Spark find my main class?
Thanks.