
I want to remove the last line from an RDD using the mapPartitionsWithIndex function.

I have tried the code below:

val withoutFooter = rdd.mapPartitionsWithIndex { (idx, iter) =>     
     if (idx == noOfTotalPartitions) {
         iter.drop(size - 1)
     }
     else iter 
}

But I am not able to get the correct result.

Praveen

1 Answer


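drop(n) drops the first n elements and returns the remaining ones, so iter.drop(size - 1) keeps only the last element instead of removing it. Also, partition indices start at 0, so the last partition's index is getNumPartitions - 1; comparing idx with the total number of partitions never matches.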

Read more here https://stackoverflow.com/a/51792161/6556191
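To make the drop behaviour concrete, here is a small Spark-free sketch on plain Scala iterators. The object and method names are hypothetical, chosen only for illustration. It shows that drop(size - 1) keeps just the final element, while take(size - 1) keeps everything except it. On a real partition iterator you do not know size without consuming the iterator, which is why the working code materializes the partition with toArray first.

```scala
object DropSemantics {
  // Mirrors the question's iter.drop(size - 1): skips the FIRST size - 1
  // elements, so only the last element survives.
  def questionAttempt(xs: Seq[Int]): List[Int] =
    xs.iterator.drop(xs.size - 1).toList

  // What was intended: keep every element except the last one.
  // take with a negative count (empty input) simply yields nothing.
  def withoutLast(xs: Seq[Int]): List[Int] =
    xs.iterator.take(xs.size - 1).toList
}
```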

The code below works for me:

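// sc is the SparkContext (e.g. the one spark-shell provides)
val rdd = sc.parallelize(Array(1, 2, 3, 4, 5, 6, 7, 8, 9), 4)

// Partition indices are zero-based, so the last one is numPartitions - 1
val lastPartitionIndex = rdd.getNumPartitions - 1

val withoutFooter = rdd.mapPartitionsWithIndex { (idx, iter) =>
    if (idx == lastPartitionIndex) {
        // Materialize the last partition and drop its final element
        val lastPart = iter.toArray
        lastPart.slice(0, lastPart.length - 1).iterator
    } else {
        // Every other partition passes through unchanged
        iter
    }
}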
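The toArray call above loads the whole last partition into memory. A possible alternative, sketched here under the assumption that a lazy wrapper is acceptable, is a look-ahead iterator that emits an element only after confirming another one follows it. The DropLast object and dropLast method are hypothetical names, not part of Spark or the Scala standard library.

```scala
object DropLast {
  // Wraps an iterator so its final element is never emitted, without
  // materializing the whole partition the way iter.toArray does.
  def dropLast[A](underlying: Iterator[A]): Iterator[A] = new Iterator[A] {
    // Look one element ahead: pending is emitted only once we know
    // another element follows it.
    private var pending: Option[A] =
      if (underlying.hasNext) Some(underlying.next()) else None

    def hasNext: Boolean = pending.isDefined && underlying.hasNext

    def next(): A = {
      if (!hasNext) throw new NoSuchElementException("next on exhausted iterator")
      val current = pending.get
      pending = Some(underlying.next())
      current
    }
  }
}
```

In the Spark job it would replace the toArray branch, e.g. `if (idx == lastPartitionIndex) DropLast.dropLast(iter) else iter`. Note that both approaches assume the last partition is non-empty; if it is empty, the last line lives in an earlier partition and nothing is removed.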
Dharaneesh Vrd