
I have a simple RDD created from a Seq as follows:

val testRDD = sc.parallelize(list.toSeq, size)

While iterating over the rows, it throws a NullPointerException in cluster mode. It works fine in client mode.

testRDD.foreach(row => {
  logger.info("Row index " + row.index.toString)
})

Both testRDD.count() and testRDD.partitions.size return the expected results. If I perform collect() first, the foreach works fine; however, I do not want to collect() in this scenario, since the RDD needs to stay distributed across the nodes.
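In other words, the two iteration patterns seem to run in different places. A minimal sketch of the contrast, reusing `logger` and `row.index` from above:

// Works: collect() brings the data back to the driver, so this foreach
// runs in the driver JVM, where `logger` was created.
testRDD.collect().foreach(row => logger.info("Row index " + row.index))

// Throws in cluster mode: this foreach is a distributed action, so the
// closure (and everything it captures) is shipped to the executor JVMs.
testRDD.foreach(row => logger.info("Row index " + row.index))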

Are you sure it was not related to [Spark not supporting nesting of RDDs](https://stackoverflow.com/questions/23793117/nullpointerexception-in-scala-spark-appears-to-be-caused-be-collection-type)? – Frank Nov 11 '22 at 21:24

1 Answer


I was able to identify that the issue was related to the logger object, not the RDD: the driver-side logger captured by the foreach closure was not usable on the executors in cluster mode. I was able to proceed after removing the logger from the closure and broadcasting the other required variables.
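For anyone hitting the same thing, here is a minimal sketch of that fix, assuming Log4j; the broadcast variable `sharedConfig` and the logger name "RowJob" are illustrative, not from the original code. The idea is to obtain the logger inside the task, on the executor itself, and to ship read-only data with sc.broadcast instead of capturing driver-side objects in the closure:

import org.apache.log4j.Logger

// Hypothetical read-only data the tasks need, shipped once per executor
// via broadcast instead of being captured in the closure.
val sharedConfig = sc.broadcast(Map("threshold" -> 10))

val testRDD = sc.parallelize(list.toSeq, size)
testRDD.foreach(row => {
  // Created inside the task, so no logger instance is ever
  // serialized from the driver.
  val log = Logger.getLogger("RowJob")
  log.info("Row index " + row.index + ", threshold " + sharedConfig.value("threshold"))
})

Another common pattern is to declare the logger as a @transient lazy val in a Serializable class, so each executor re-creates it on first access after deserialization instead of receiving a null field.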
