0

I have a sequence data, transforming into RDD.

filteredRDD.collect()

[0, 1, 2, 3, 5, 9]

I like to get the delta the current one - the previous value, the output is [1, 1, 1, 2, 4].

What kind of window function do we have spark 1.6?

wenfeng
  • 177
  • 1
  • 7

1 Answers1

2

What you can do to get your desired result is zipWithIndex

You can zipWithIndex your rdd (call it rdd1[Long, Int]) then

val rdd2 = rdd1.map{case(index, value) => (index + 1, value)} Now if you val rdd3 = rdd1.join(rdd2).mapValues(case (a, b) => a -b ).values

that is your row wise delta. This is very efficient as it does not kick in a lot of shuffling.

Thanks Manas

Manas
  • 519
  • 4
  • 14