Below is my logic for adding a sequence-number column to a DataFrame. It works as expected when I read data from delimited files. Today I have a new task: read the data from an Oracle table, add a sequence number, and process it further. With the logic below I am facing an issue only when the DataFrame is read from the Oracle table.
oracleTableDF is my DataFrame.
// create the sequence-number logic for SeqNum
val rowRDD = oracleTableDF.rdd.zipWithIndex().map { case (row, index) =>
  Row.fromSeq((index + 1L) +: row.toSeq)
}
// create a StructType that adds SeqNum to the schema
val newStructure = StructType(StructField("SeqNum", LongType) +: oracleTableDF.schema.fields)
// create a new DataFrame with SeqNum
oracleTableDF = spark.createDataFrame(rowRDD, newStructure)
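For reference, the core of this transformation (prepend a 1-based index to each row) can be exercised without Spark at all. This is a minimal plain-Scala sketch of the same zipWithIndex logic, using hypothetical sample rows rather than the Oracle data:

```scala
// Sketch: prepend a 1-based sequence number to each "row" (here a Seq[Any]),
// mirroring what the Spark code does with rdd.zipWithIndex().
object SeqNumSketch {
  def addSeqNum(rows: Seq[Seq[Any]]): Seq[Seq[Any]] =
    rows.zipWithIndex.map { case (row, index) =>
      (index.toLong + 1) +: row  // sequence numbers start at 1
    }

  def main(args: Array[String]): Unit = {
    val rows = Seq(Seq("a", 10), Seq("b", 20))
    println(addSeqNum(rows)) // List(List(1, a, 10), List(2, b, 20))
  }
}
```

This confirms the index logic itself is sound, which suggests the failure is environmental (serialization/classpath on the executors) rather than in the transformation.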
I am not able to locate the actual issue, because the same logic works as expected on the cluster when I read from files, and it also works as expected in local mode. It fails only on the cluster when reading from the Oracle table.
Below is the error:
"ERROR scheduler.TaskSetManager: Task 0 in stage 1.0 failed 4 times; aborting job org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, xxxx, executor 1): java.lang.NoClassDefFoundError: Could not initialize class oracleDataProcess$"
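One detail worth knowing about this error: on the JVM, "NoClassDefFoundError: Could not initialize class Foo$" does not mean the class is missing from the classpath. It means the class's static initializer (for a Scala `object`, everything in its body) already threw on an earlier access in that JVM, so every later access fails with this message. A common cause on Spark executors is an `object` body that builds a driver-only resource (a SparkSession, a JDBC connection, etc.) which cannot be constructed on the executor. The sketch below (with a hypothetical `Flaky` object, not your actual class) demonstrates the JVM behavior:

```scala
// If an object's initializer throws, the FIRST access in a JVM fails with
// ExceptionInInitializerError; every LATER access fails with
// NoClassDefFoundError: Could not initialize class Flaky$ -- the message
// seen in the executor log.
object Flaky {
  // stand-in for a resource that only initializes on the driver (hypothetical)
  val resource: String = sys.error("cannot initialize on this JVM")
}

object InitDemo {
  def tryAccess(): String =
    try { Flaky.resource; "ok" }
    catch { case t: Throwable => t.getClass.getSimpleName }

  def main(args: Array[String]): Unit = {
    println(tryAccess()) // ExceptionInInitializerError
    println(tryAccess()) // NoClassDefFoundError
  }
}
```

If this is what is happening, the usual fix is to construct such resources inside the closure (or per partition via `mapPartitions`) instead of in the object body, or to make the field a `@transient lazy val` so executors never initialize it, and to check the first failed task's stack trace for the underlying `ExceptionInInitializerError`.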