I have a PySpark dataframe df:
---------------------------------------------------------
primaryKey | start_timestamp | end_timestamp
---------------------------------------------------------
key1 | 2020-08-13 15:40:00 | 2020-08-13 15:44:47
key2 | 2020-08-14 12:00:00 | 2020-08-14 12:01:13
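For reference, a minimal snippet that reproduces this dataframe (I am assuming the two timestamp columns are actual timestamp types, not strings):

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    spark = SparkSession.builder.getOrCreate()

    df = (
        spark.createDataFrame(
            [
                ("key1", "2020-08-13 15:40:00", "2020-08-13 15:44:47"),
                ("key2", "2020-08-14 12:00:00", "2020-08-14 12:01:13"),
            ],
            ["primaryKey", "start_timestamp", "end_timestamp"],
        )
        # cast the string columns to proper timestamps
        .withColumn("start_timestamp", F.to_timestamp("start_timestamp"))
        .withColumn("end_timestamp", F.to_timestamp("end_timestamp"))
    )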
I want to create a dataframe containing a time series between start_timestamp and end_timestamp for every key, split into intervals of x seconds. For example, with a gap of x = 120 seconds the output would be:
-----------------------------------------------------------
primaryKey | start_timestamp_new | end_timestamp_new
key1 | 2020-08-13 15:40:00 | 2020-08-13 15:41:59
key1 | 2020-08-13 15:42:00 | 2020-08-13 15:43:59
key1 | 2020-08-13 15:44:00 | 2020-08-13 15:45:59
key2 | 2020-08-14 12:00:00 | 2020-08-14 12:01:59
I am trying to use the approach mentioned here, but I am unable to apply it to a Spark dataframe.
Any info on how to create this would be a huge help.
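For completeness, the rough direction I have been exploring is to generate the interval starts with sequence() (which, as far as I know, supports timestamps with an interval step in Spark 2.4+) and then explode them per key. I am not confident this is the right way to apply it:

    from pyspark.sql import functions as F

    x = 120  # gap in seconds

    result = (
        df
        # one row per interval start between start_timestamp and end_timestamp
        .withColumn(
            "start_timestamp_new",
            F.explode(
                F.expr(f"sequence(start_timestamp, end_timestamp, interval {x} seconds)")
            ),
        )
        # each interval ends one second before the next interval starts
        .withColumn(
            "end_timestamp_new",
            F.col("start_timestamp_new") + F.expr(f"interval {x - 1} seconds"),
        )
        .select("primaryKey", "start_timestamp_new", "end_timestamp_new")
    )

    result.show(truncate=False)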