
I have a simple RDD created from a Seq as follows:

val testRDD = sc.parallelize(list.toSeq, size)

While iterating over the rows, it throws a NullPointerException in cluster mode. It works fine in client mode.

testRDD.foreach(row => {
  logger.info("Row index " + row.index.toString)
})

Both testRDD.count() and testRDD.partitions.size return the expected results. If I perform collect() first, the foreach works fine; however, I do not want to collect() in this scenario, since the RDD needs to stay distributed across the nodes.
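In other words, the two iteration patterns seem to run in different places. A minimal sketch of the contrast, reusing `logger` and `row.index` from above:

// Works: collect() brings the data back to the driver, so this foreach
// runs in the driver JVM, where `logger` was created.
testRDD.collect().foreach(row => logger.info("Row index " + row.index))

// Throws in cluster mode: this foreach is a distributed action, so the
// closure (and everything it captures) is shipped to the executor JVMs.
testRDD.foreach(row => logger.info("Row index " + row.index))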

Are you sure it was not related to [Spark not supporting nesting of RDDs](https://stackoverflow.com/questions/23793117/nullpointerexception-in-scala-spark-appears-to-be-caused-be-collection-type)? – Frank Nov 11 '22 at 21:24

1 Answer


I was able to identify that the issue was related to the logger object, not the RDD: the driver-side logger captured by the foreach closure was not usable on the executors in cluster mode. I was able to proceed after removing the logger from the closure and broadcasting the other required variables.
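For anyone hitting the same thing, here is a minimal sketch of that fix, assuming Log4j; the broadcast variable `sharedConfig` and the logger name "RowJob" are illustrative, not from the original code. The idea is to obtain the logger inside the task, on the executor itself, and to ship read-only data with sc.broadcast instead of capturing driver-side objects in the closure:

import org.apache.log4j.Logger

// Hypothetical read-only data the tasks need, shipped once per executor
// via broadcast instead of being captured in the closure.
val sharedConfig = sc.broadcast(Map("threshold" -> 10))

val testRDD = sc.parallelize(list.toSeq, size)
testRDD.foreach(row => {
  // Created inside the task, so no logger instance is ever
  // serialized from the driver.
  val log = Logger.getLogger("RowJob")
  log.info("Row index " + row.index + ", threshold " + sharedConfig.value("threshold"))
})

Another common pattern is to declare the logger as a @transient lazy val in a Serializable class, so each executor re-creates it on first access after deserialization instead of receiving a null field.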
