I'm trying to merge contiguous time intervals from a time series using Scala and Spark.
I have the following data in a DataFrame:
Id | State | StartTime           | EndTime
---+-------+---------------------+--------------------
1  | R     | 2019-01-01T03:00:00 | 2019-01-01T11:30:00
1  | R     | 2019-01-01T11:30:00 | 2019-01-01T15:00:00
1  | R     | 2019-01-01T15:00:00 | 2019-01-01T22:00:00
1  | W     | 2019-01-01T22:00:00 | 2019-01-02T04:30:00
1  | W     | 2019-01-02T04:30:00 | 2019-01-02T13:45:00
1  | R     | 2019-01-02T13:45:00 | 2019-01-02T18:30:00
1  | R     | 2019-01-02T18:30:00 | 2019-01-02T22:45:00
I need to collapse consecutive records that share the same Id and State into a single time interval. The resulting data needs to look like:
Id | State | StartTime           | EndTime
---+-------+---------------------+--------------------
1  | R     | 2019-01-01T03:00:00 | 2019-01-01T22:00:00
1  | W     | 2019-01-01T22:00:00 | 2019-01-02T13:45:00
1  | R     | 2019-01-02T13:45:00 | 2019-01-02T22:45:00
Note that the first three records have been grouped together because the equipment is contiguously in an R state from 2019-01-01T03:00:00 to 2019-01-01T22:00:00. It then switches to a W state for the next two records, from 2019-01-01T22:00:00 to 2019-01-02T13:45:00, and then goes back to an R state for the last two records.
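
To make the intended grouping concrete, here is a minimal sketch of one way it might be expressed with Spark window functions (the usual "gaps and islands" pattern). It is not a confirmed solution. It assumes the columns are named exactly as in the tables above, that StartTime and EndTime are either timestamps or ISO-8601 strings (which sort correctly as text), and that "contiguous" means the previous record's EndTime equals the current record's StartTime. The names MergeIntervals, mergeContiguous, prevState, prevEnd, isNew and grp are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Hypothetical helper object; the names here are illustrative only.
object MergeIntervals {

  def mergeContiguous(df: DataFrame): DataFrame = {
    // Look at each Id's records in chronological order.
    val byId = Window.partitionBy("Id").orderBy("StartTime")
    // Running total over all rows up to and including the current one.
    val running = byId.rowsBetween(Window.unboundedPreceding, Window.currentRow)

    // A row starts a new interval when it is the first row for its Id,
    // when the State changes, or (assumption) when there is a time gap.
    val startsNewInterval =
      col("prevState").isNull ||
        col("prevState") =!= col("State") ||
        col("prevEnd") =!= col("StartTime")

    df
      .withColumn("prevState", lag("State", 1).over(byId))
      .withColumn("prevEnd", lag("EndTime", 1).over(byId))
      .withColumn("isNew", when(startsNewInterval, 1).otherwise(0))
      // The running sum of the flags gives every run its own group number.
      .withColumn("grp", sum("isNew").over(running))
      .groupBy("Id", "State", "grp")
      .agg(min("StartTime").as("StartTime"), max("EndTime").as("EndTime"))
      .orderBy("Id", "StartTime")
      .drop("grp")
  }

  def main(args: Array[String]): Unit = {
    // Local session for trying the sketch out; adjust for a real cluster.
    val spark = SparkSession.builder().master("local[1]").appName("merge-intervals").getOrCreate()
    import spark.implicits._

    val df = Seq(
      (1, "R", "2019-01-01T03:00:00", "2019-01-01T11:30:00"),
      (1, "R", "2019-01-01T11:30:00", "2019-01-01T15:00:00"),
      (1, "R", "2019-01-01T15:00:00", "2019-01-01T22:00:00"),
      (1, "W", "2019-01-01T22:00:00", "2019-01-02T04:30:00"),
      (1, "W", "2019-01-02T04:30:00", "2019-01-02T13:45:00"),
      (1, "R", "2019-01-02T13:45:00", "2019-01-02T18:30:00"),
      (1, "R", "2019-01-02T18:30:00", "2019-01-02T22:45:00")
    ).toDF("Id", "State", "StartTime", "EndTime")

    mergeContiguous(df).show(false)
    spark.stop()
  }
}
```

If a gap in time between two records with the same State should not split the interval, the prevEnd comparison can be dropped from the condition so that only a change of State starts a new group.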