I have an RDD[Sale]
and wanted to leave only the latest sales. So what I did is created a pair RDD and then performed grouping and filtering:
val sales: RDD[(String, Sale)] = rawSales.map(sale => sale.id -> sale)
.groupByKey()
.mapValues(_.maxBy(_.timestamp))
But how do I return back to RDD[Sale]
instead of the pair RDD in this case?
The only way I figured out is the following:
val value: RDD[Sale] = sales.map(salePaired => salePaired._2)
Is it the most proper solution?