I have the following RDD:
val reducedListOfCalls: RDD[(String, List[Row])]
The RDDs are:
[(923066800846, List[2016072211,1,923066800846])]
[(923027659472, List[2016072211,1,92328880275]),
(923027659472, List[2016072211,1,92324440275])]
[(923027659475, List[2016072211,1,92328880275]),
(923027659475, List[2016072211,1,92324430275]),
(923027659475, List[2016072211,1,92334340275])]
As shown above, the first RDD has 1 (key, value) pair, the second has 2, and the third has 3.
I want to remove all RDDs that have fewer than 2 key-value pairs. The expected result is:
[(923027659472, List[2016072211,1,92328880275]),
(923027659472, List[2016072211,1,92324440275])]
[(923027659475, List[2016072211,1,92328880275]),
(923027659475, List[2016072211,1,92324430275]),
(923027659475, List[2016072211,1,92334340275])]
I have tried the following:
val reducedListOfCalls = listOfMappedCalls.filter(f => f._1.size >1)
but it still returns the original list. The filter seems to have made no difference.
Is it possible to count the number of keys in a mapped RDD, and then filter based on the count of keys?
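For reference, here is a minimal sketch of the kind of per-key count and filter I have in mind. It assumes the data is a single pair RDD in which the same key can appear in several pairs. Note that in my attempt above `f._1` is the key `String`, so `f._1.size` is the number of characters in the key, not the number of pairs for that key. The object name, the local SparkSession setup, and the `countsPerKey` and `frequentKeys` names are hypothetical. The sample rows are copied from the data above, with each value modeled as a one-element `List[Row]`, which is also an assumption.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical driver object, only for illustration.
object FilterKeysByCount {
  def main(args: Array[String]): Unit = {
    // Local session for the sketch; a real job would reuse its existing context.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("filter-keys-by-count")
      .getOrCreate()
    val sc = spark.sparkContext

    // Sample data from the question (value layout is an assumption).
    val listOfMappedCalls: RDD[(String, List[Row])] = sc.parallelize(Seq(
      ("923066800846", List(Row(2016072211, 1, 923066800846L))),
      ("923027659472", List(Row(2016072211, 1, 92328880275L))),
      ("923027659472", List(Row(2016072211, 1, 92324440275L))),
      ("923027659475", List(Row(2016072211, 1, 92328880275L))),
      ("923027659475", List(Row(2016072211, 1, 92324430275L))),
      ("923027659475", List(Row(2016072211, 1, 92334340275L)))
    ))

    // Count how many pairs share each key.
    val countsPerKey: RDD[(String, Long)] =
      listOfMappedCalls.mapValues(_ => 1L).reduceByKey(_ + _)

    // Keep only the keys that occur in at least 2 pairs.
    val frequentKeys: RDD[(String, Long)] =
      countsPerKey.filter { case (_, n) => n >= 2 }

    // Inner join drops every pair whose key was filtered out,
    // then the count is discarded again.
    val reducedListOfCalls: RDD[(String, List[Row])] =
      listOfMappedCalls.join(frequentKeys).mapValues { case (rows, _) => rows }

    // The single pair for key 923066800846 should no longer be present.
    reducedListOfCalls.collect().foreach(println)

    spark.stop()
  }
}
```

An alternative under the same assumptions would be `groupByKey()` followed by a filter on the size of each group and `flatMapValues(identity)`. The `reduceByKey` plus `join` version is used here because it avoids collecting all values for a key in one place.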