I have a Pair RDD `(K, V)` whose key contains a time and an ID. I would like to get a Pair RDD of the form `(K, Iterable<V>)`, where the records are grouped by ID and each iterable is ordered by time.
I'm currently using `sortByKey().groupByKey()`, and my tests seem to show that it works. However, I've read that this may not always be the case, as discussed in this question, which has diverging answers: (Does groupByKey in Spark preserve the original order?).
Is this approach correct or not?
Thanks!