
I have an RDD[Sale] and want to keep only the latest sale for each id. So I created a pair RDD keyed by id, grouped it, and kept the sale with the largest timestamp in each group:

val sales: RDD[(String, Sale)] = rawSales.map(sale => sale.id -> sale)
      .groupByKey()
      .mapValues(_.maxBy(_.timestamp))
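A minimal runnable sketch of this pipeline, assuming Spark in local mode. The question only shows that Sale has an id and a timestamp, so the Sale case class below (including its amount field and the Long timestamp), the LatestSalesByKey object, and the sample data are all hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical Sale type: only `id` and `timestamp` come from the question.
case class Sale(id: String, timestamp: Long, amount: Double)

object LatestSalesByKey {
  // Key each sale by id, group per id, keep the sale with the max timestamp.
  // The result is still a pair RDD: RDD[(String, Sale)].
  def latestById(rawSales: RDD[Sale]): RDD[(String, Sale)] =
    rawSales
      .map(sale => sale.id -> sale)
      .groupByKey()
      .mapValues(_.maxBy(_.timestamp))

  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkContext, just for trying the sketch out.
    val sc = new SparkContext(
      new SparkConf().setAppName("latest-sales").setMaster("local[*]"))
    try {
      val rawSales = sc.parallelize(Seq(
        Sale("a", 1L, 10.0),
        Sale("a", 3L, 12.0),
        Sale("b", 2L, 5.0)
      ))
      // Sort by key on the driver so the printed order is stable:
      // (a,Sale(a,3,12.0))
      // (b,Sale(b,2,5.0))
      latestById(rawSales).collect().sortBy(_._1).foreach(println)
    } finally sc.stop()
  }
}
```

As a design note, reduceByKey((a, b) => if (a.timestamp >= b.timestamp) a else b) is a common alternative to groupByKey plus maxBy here: it produces the same RDD[(String, Sale)] shape but combines sales before the shuffle instead of gathering every sale for an id in one place.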

But how do I get back to an RDD[Sale] instead of the pair RDD?

The only way I figured out is the following:

val value: RDD[Sale] = sales.map(salePaired => salePaired._2)

Is this the proper solution?

samba

1 Answer


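You can access the keys or values of a pair RDD directly, much as you would with a Map. keys and values are defined in PairRDDFunctions, which is available on any RDD of tuples, and values is implemented as map(_._2), so it gives the same result as your approach while stating the intent more clearly: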

val keys: RDD[String] = sales.keys
val values: RDD[Sale] = sales.values
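A small sketch, assuming Spark in local mode, that compares sales.values with the map(_._2) approach from the question on a tiny pair RDD. The Sale case class, the PairRddParts object, and its ids and latestSales helpers are hypothetical names used only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical Sale type; the question only shows `id` and `timestamp`.
case class Sale(id: String, timestamp: Long)

object PairRddParts {
  // keys and values come from PairRDDFunctions, which Spark provides
  // implicitly for any RDD[(K, V)]; no extra import is needed.
  def ids(sales: RDD[(String, Sale)]): RDD[String] = sales.keys
  def latestSales(sales: RDD[(String, Sale)]): RDD[Sale] = sales.values

  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkContext, just for trying the sketch out.
    val sc = new SparkContext(
      new SparkConf().setAppName("pair-rdd-parts").setMaster("local[*]"))
    try {
      val sales: RDD[(String, Sale)] = sc.parallelize(Seq(
        "a" -> Sale("a", 3L),
        "b" -> Sale("b", 2L)
      ))
      // Collect and sort on the driver so the comparison ignores partition order.
      val viaValues = latestSales(sales).collect().sortBy(_.id).toSeq
      val viaMap = sales.map(_._2).collect().sortBy(_.id).toSeq
      println(ids(sales).collect().sorted.mkString(",")) // a,b
      println(viaValues == viaMap)                       // true
    } finally sc.stop()
  }
}
```

Both versions produce the same RDD[Sale]; values simply names what the map is doing.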