I'm trying to build a simple Scala-based Spark application and run it in EMR, but when I run it, I get Error: Failed to load class com.myorganization.MyScalaObj. My Scala file is:
package com.myorganization

import org.apache.spark.sql.SparkSession

object MyScalaObj extends App {
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("myTestApp")
    .getOrCreate()

  val df = spark.read.csv("s3://my_bucket/foo.csv")
  df.write.parquet("s3://my_bucket/foo.parquet")
}
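I'm not sure whether using extends App (rather than an explicit main method) matters here, so for reference this is the equivalent form I could also try; it's the same logic, just wrapped in a main entry point:

package com.myorganization

import org.apache.spark.sql.SparkSession

// Same job as above, but with an explicit main method instead of extends App.
object MyScalaObj {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("myTestApp")
      .getOrCreate()

    val df = spark.read.csv("s3://my_bucket/foo.csv")
    df.write.parquet("s3://my_bucket/foo.parquet")
  }
}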
To the stock build.sbt file, I added a few lines including the Scala version, Spark library dependencies, and mainClass (which I found from this question):
name := "sbtproj"
version := "0.1"
scalaVersion := "2.12.10"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.0.0",
  "org.apache.spark" %% "spark-sql" % "3.0.0"
)

mainClass in (Compile, run) := Some("com.myorganization.MyScalaObj")
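For reference, I'm building with a plain sbt compile; as far as I know the compiled class ends up under a path like the one below by default (the exact directories on my machine may differ):

sbt compile
# I believe the compiled class lands somewhere like:
#   target/scala-2.12/classes/com/myorganization/MyScalaObj.class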
Building this gives me a MyScalaObj.class, which I am manually packaging into a jar with jar cf MyScalaObj.jar MyScalaObj.class. I copied this jar to my EMR cluster running Spark 3.0.0 and Scala 2.12.10.
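In case the jar layout matters, this is how I would list what actually got packaged; I'd expect the class to need to sit under its package path inside the jar, but since I ran jar cf on the bare class file I'm not sure that structure is preserved:

jar tf MyScalaObj.jar
# I would expect an entry like com/myorganization/MyScalaObj.class here,
# but with the packaging above it may just contain MyScalaObj.class at the top level.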
I then tried to run my application with spark-submit --class com.myorganization.MyScalaObj MyScalaObj.jar --deploy-mode cluster --master spark://x.x.x.x, but it fails with Error: Failed to load class com.myorganization.MyScalaObj.
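Since I'm also not sure whether the way I'm invoking Spark matters, this is the variant I could try next, with the flags placed before the jar (untested, and using the same placeholder master address):

spark-submit \
  --class com.myorganization.MyScalaObj \
  --deploy-mode cluster \
  --master spark://x.x.x.x \
  MyScalaObj.jar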
As this whole process is quite new to me, I'm not sure whether the error is in my sbt config (I don't know sbt at all), in the Scala object itself, in something missing (e.g., a manifest?), or in how I'm invoking Spark. What's the likely cause of my error here?