
I've got a sequence of values (from a reduceByKey). I know that in theory the keys form an ordered sequence, and I should be able to reduce over them.

I want to run a sliding window over these sequences. I could store the window in the accumulator for the reduce function, but I think the way Spark parallelizes the workflow (requiring reduce functions to be commutative and associative) means the windows may get chopped up across partitions.

Is there a way to do this?

Joe
  • I think this is a duplicate of http://stackoverflow.com/questions/23402303/apache-spark-moving-average/23436517. – Daniel Darabos Dec 10 '14 at 09:12
  • It may be, although I'm looking for an operation on a key-value set with reduceByKey. – Joe Dec 10 '14 at 12:56
  • Sorry, I missed that part. Then your data is in fact not ordered. In `spark-shell` check out `sc.parallelize(Seq(7 -> 1, 7 -> 1, 8 -> 1, 8 -> 1)).reduceByKey(_ + _).collect`. It returns `Array((8,2), (7,2))`. – Daniel Darabos Dec 10 '14 at 14:04
  • Well then I have no hope. I'm closing as a duplicate as you suggested, because there is always some flexibility. – Joe Dec 10 '14 at 14:39

0 Answers
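For anyone landing here: following the approach in the linked duplicate, one way to get ordered windows is to sort the reduced RDD by key and then apply MLlib's `sliding` transformation from `org.apache.spark.mllib.rdd.RDDFunctions`. A minimal sketch, assuming Spark with MLlib on the classpath and reusing Daniel's sample data from the comments; the window size of 2 is arbitrary:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.rdd.RDDFunctions._

object SlidingAfterReduce {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("sliding-after-reduce").setMaster("local[*]"))

    // reduceByKey gives no ordering guarantee (see the comments above),
    // so impose the key order explicitly before windowing.
    val reduced = sc.parallelize(Seq(7 -> 1, 7 -> 1, 8 -> 1, 8 -> 1))
      .reduceByKey(_ + _)
      .sortByKey()

    // sliding(2) yields every run of 2 consecutive (key, sum) records,
    // including windows that span partition boundaries.
    val windows = reduced.sliding(2).collect()
    windows.foreach(w => println(w.mkString("[", ", ", "]")))
    // prints: [(7,2), (8,2)]

    sc.stop()
  }
}
```

The point of `sliding` here is that it handles windows straddling partition boundaries, which is exactly what a hand-rolled window kept in the reduce function's accumulator cannot guarantee.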