I get an org.apache.spark.SparkException with java.lang.NoClassDefFoundError: Could not initialize class XXX (where XXX is the class that contains the field validation) when I try to run field validations on a Spark DataFrame.
All classes and objects used are serializable. The job fails on AWS EMR but works fine on my local machine. Here is my code:
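import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.{ArrayType, StringType, StructType}

// Original schema plus an "errorList" column: an array of (fieldName, value, message) structs.
val newSchema = df.schema.add("errorList", ArrayType(new StructType()
  .add("fieldName", StringType)
  .add("value", StringType)
  .add("message", StringType)))

// validators is a sequence of validations on the columns of a Row.
// Validator method signature:
// def checkForErrors(row: Row): (fieldName, value, message) = {
//   logic to validate the field in a row }
val validateRow: Row => Row = (row: Row) => {
  // Run every validator against the row and collect the results.
  val errorList = validators.map(validator => validator.checkForErrors(row))
  // Append the collected errors as the extra "errorList" column.
  Row.merge(row, Row(errorList))
}
val validateDf = df.map(validateRow)(RowEncoder.apply(newSchema))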
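For context, each validator has roughly the shape sketched below. This is a simplified illustration rather than the real code: the trait name `Validator`, the class `NotEmptyValidator`, and the column `"name"` are hypothetical. Only `checkForErrors` and its (fieldName, value, message) result come from the snippet above.

```scala
import org.apache.spark.sql.Row

// Hypothetical trait; only the method name and the shape of its result are from the question.
trait Validator extends Serializable {
  def checkForErrors(row: Row): (String, String, String)
}

// Hypothetical validator that flags a null or empty value in a single column.
class NotEmptyValidator(fieldName: String) extends Validator {
  override def checkForErrors(row: Row): (String, String, String) = {
    // fieldIndex needs a Row that carries its schema.
    val idx = row.fieldIndex(fieldName)
    val value = if (row.isNullAt(idx)) null else row.get(idx).toString
    // A null message means the field passed validation.
    val message =
      if (value == null || value.isEmpty) s"$fieldName must not be empty" else null
    (fieldName, value, message)
  }
}

// "name" is an assumed column, used only for illustration.
val validators: Seq[Validator] = Seq(new NotEmptyValidator("name"))
```

In this sketch the validators are plain serializable instances with no state beyond the column name.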
Versions: Spark 2.4.7 and Scala 2.11.8.
Any ideas on why this might happen? Has anyone run into the same issue?