I have an RDD with volumes per minute, e.g.
(("12:00" -> 124), ("12:01" -> 543), ("12:02" -> 102), ... )
I want to map that to a dataset that holds, for each minute, the volume in that minute, the volume of the previous minute, and the average volume of the previous 5 minutes, e.g.
(("12:00" -> (124, 300, 245.3)),
("12:01" -> (543, 124, 230.2)),
("12:02" -> (102, 543, 287.1)))
The input could be an RDD[(DateTime, Int)] and the output an RDD[(DateTime, (Int, Int, Float))].
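To pin down the intended semantics, here is a minimal sketch on a plain local Scala collection (no Spark, so it is not an answer to the distributed case). It rests on several assumptions that are not stated above: keys are plain strings instead of DateTime, rows are already sorted by time with no missing minutes, the average covers the five minutes before the current one, and rows without a full five-minute history are dropped. The name `WindowSketch.withHistory` and the sample values are made up for illustration.

```scala
// Local sketch only: plain Scala collections, no Spark.
// Assumes rows are sorted by time, contain no gaps, and that rows
// lacking a full `window` of history should be dropped.
object WindowSketch {
  def withHistory(rows: Seq[(String, Int)],
                  window: Int = 5): Seq[(String, (Int, Int, Float))] =
    rows
      .sliding(window + 1)            // each group: `window` earlier rows + current row
      .collect {
        case w if w.size == window + 1 =>
          val (time, current) = w.last
          val previous = w.init.map(_._2)   // volumes of the earlier rows
          (time, (current, previous.last, previous.sum.toFloat / window))
      }
      .toList
}

// Hypothetical input: five minutes of history followed by two rows of interest.
val sample = Seq(
  "11:55" -> 100, "11:56" -> 200, "11:57" -> 300,
  "11:58" -> 400, "11:59" -> 500, "12:00" -> 124, "12:01" -> 543
)
val result = WindowSketch.withHistory(sample)
```

With this input, only "12:00" and "12:01" have enough history to appear in `result`. The open part of the question is how to get the same effect on an RDD, where a window can straddle partition boundaries.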
What are good ways to do that?