I'm making a first attempt to access the Glue Catalog from Scala code.
I already had some trouble getting Maven to build my project (this helped a lot: How to set up a local development environment for Scala Spark ETL to run in AWS Glue?).
But now I'm trying to run my code on an EMR cluster and I'm getting a java.lang.NoClassDefFoundError.
This is my code:
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.{DynamicFrame, DynamicRecord, GlueContext}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession
import org.slf4j.LoggerFactory
import org.apache.spark.sql.functions.{col, month, year}
object JoinAndRelation {
  private val logger = LoggerFactory.getLogger(getClass)

  def main(sysArgs: Array[String]): Unit = {
    // Spark session creation with connection to Glue Catalog
    implicit val spark: SparkSession = SparkSession
      .builder
      .config(new SparkConf().setAppName("TestGlueAccess"))
      .getOrCreate()
    val sc: SparkContext = spark.sparkContext
    val glueContext: GlueContext = new GlueContext(sc)
...
And this is the error:
19/02/08 15:35:26 INFO Client:
    client token: N/A
    diagnostics: User class threw exception: java.lang.NoClassDefFoundError: com/amazonaws/services/glue/GlueContext
        at org.sergio.poc.JoinAndRelation$.main(JoinAndRelation.scala:41)
        at org.sergio.poc.JoinAndRelation.main(JoinAndRelation.scala)
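Since the code compiles and the failure only shows up at runtime, the class was visible to the compiler but apparently not to the JVM on the cluster. To narrow this down, here is a minimal sketch of a runtime check. `ClasspathCheck` and `isOnClasspath` are hypothetical names made up for illustration; they are not part of the Glue or Spark APIs.

```scala
// Hypothetical helper (not part of Glue or Spark): reports whether a class
// can be found by the class loader that loaded this object.
object ClasspathCheck {
  def isOnClasspath(className: String): Boolean =
    try {
      // initialize = false: only locate and load the class, do not run its static initializers
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      // ClassNotFoundException: the class itself is missing.
      // LinkageError (includes NoClassDefFoundError): the class or something it needs failed to load.
      case _: ClassNotFoundException | _: LinkageError => false
    }

  def main(args: Array[String]): Unit = {
    val names = Seq(
      "java.lang.String",                        // part of the JDK
      "com.amazonaws.services.glue.GlueContext"  // found only if the Glue jar is on the runtime classpath
    )
    names.foreach(n => println(s"$n -> ${isOnClasspath(n)}"))
  }
}
```

Calling `ClasspathCheck.isOnClasspath("com.amazonaws.services.glue.GlueContext")` at the top of `main`, before `new GlueContext(sc)`, should show whether the Glue jar is reaching the driver at all. If it returns false on the cluster, that would suggest the jar is available to Maven for compilation but is not packaged into, or shipped alongside, the application jar.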
I was able to compile it with Maven by adding glue-assembly.jar as a dependency. I also tried adding aws-java-sdk-core, but it didn't work:
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>glue-assembly</artifactId>
  <version>1.0</version>
  <scope>system</scope>
  <systemPath>${project.basedir}/libs/glue-assembly.jar</systemPath>
</dependency>
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk-core</artifactId>
  <version>1.11.445</version>
</dependency>
Finally, this is the command I use to run it:
spark-submit --class org.sergio.poc.JoinAndRelation --master yarn --deploy-mode cluster --executor-memory 2G --num-executors 2 MyFirstScalaMavenProject-1.0-SNAPSHOT.jar
Has anyone faced the same issue?