I have sequence data that I have transformed into an RDD:
filteredRDD.collect()
[0, 1, 2, 3, 5, 9]
I would like to compute the delta of each value minus the previous one, so the output would be [1, 1, 1, 2, 4].
What kind of window function is available in Spark 1.6 for this?
You can get your desired result with zipWithIndex.
First, index your RDD and key each value by its index (note that zipWithIndex returns (value, index) pairs, so swap them to get rdd1: RDD[(Long, Int)]):
val rdd1 = filteredRDD.zipWithIndex.map { case (value, index) => (index, value) }
Then shift every index up by one, so that key k holds the previous value:
val rdd2 = rdd1.map { case (index, value) => (index + 1, value) }
Now join the two and subtract the previous value from the current one:
val rdd3 = rdd1.join(rdd2).mapValues { case (a, b) => a - b }.values
That is your row-wise delta. It is reasonably efficient: the join is the only step that triggers a shuffle.
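For reference, here is a minimal sketch of the same index-shift-and-join idea on a plain Scala collection, with no Spark dependency (the object and method names `DeltaSketch`/`deltas` are made up for this example; the RDD version is analogous):

```scala
// Plain-Scala sketch of the index-shift-and-join approach; the names
// `DeltaSketch` and `deltas` are invented for illustration.
object DeltaSketch {
  def deltas(xs: Seq[Int]): Seq[Int] = {
    // rdd1 analogue: values keyed by their index
    val byIndex = xs.zipWithIndex.map { case (v, i) => (i.toLong, v) }.toMap
    // rdd2 analogue: shift every index up by one, so key k holds value k-1
    val shifted = byIndex.map { case (i, v) => (i + 1, v) }
    // "join" on the common keys and subtract previous from current
    byIndex.keys.toList
      .filter(shifted.contains)
      .sorted
      .map(k => byIndex(k) - shifted(k))
  }

  def main(args: Array[String]): Unit =
    println(deltas(Seq(0, 1, 2, 3, 5, 9))) // prints List(1, 1, 1, 2, 4)
}
```

The local Map join keeps results in index order via the explicit sort; in Spark the same ordering comes from the index keys themselves.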
Thanks Manas