df = pd.DataFrame([[1,
pd.Timestamp('2020-12-05 19:04:51.878000'),
pd.Timestamp('2020-12-05 19:04:51.946000'),
pd.Timestamp('2021-04-08 00:34:51.946000')],
[1,
pd.Timestamp('2021-04-08 17:22:20.476000'),
pd.Timestamp('2021-04-09 05:30:00'),
pd.Timestamp('2021-10-08 05:30:00')],
[1,
pd.Timestamp('2021-04-10 17:23:59'),
pd.Timestamp('2021-10-09 05:30:00'),
pd.Timestamp('2022-04-09 05:30:00')],
[2,
pd.Timestamp('2020-12-22 18:55:42.736000'),
pd.Timestamp('2020-12-22 18:55:43'),
pd.Timestamp('2022-02-16 00:25:42.746000')],
[2,
pd.Timestamp('2021-04-05 14:12:00.130000'),
pd.Timestamp('2022-02-16 05:30:00'),
pd.Timestamp('2023-02-16 05:30:00')],
[2,
pd.Timestamp('2021-04-27 23:41:53.381000'),
pd.Timestamp('2023-02-17 05:30:00'),
pd.Timestamp('2023-08-17 05:30:00')],
[2,
pd.Timestamp('2023-10-29 23:41:53'),
pd.Timestamp('2023-10-29 05:30:00'),
pd.Timestamp('2024-08-17 05:30:00')]], columns= ['user_id', 'subscription_date_time', 'start_date', 'end_date'])
I have this problem at hand to identify the continous plan times. So a user can subscribe to a plan anytime for x number of days. Each time he subscribes new record is created with following cases
- start date of new record = end_date + 1 of old record (If he subscribes within the end_date of old plan. So basically it is just an extension of old plan)
- start date of new record = current_date (If there is no active plan)
Now I need to combine all the plans of case 1 and get single start time and end time. How can i achieve this ? I am using python. Any logic would help me.
Things I tried:
- In the current image there are three continous plans for a user. Say if it is two i was able to achieve it by the time difference of new_start_date - old_end_date == 1. But if there are more than two continous subscriptions, I am stuck on how to combine first and third. Another challenge is if he has a case 2 also then it should be considered completely new.
Expeted Outcome:
df = pd.DataFrame([[1,
pd.Timestamp('2020-12-05 19:04:51.878000'),
pd.Timestamp('2020-12-05 19:04:51.946000'),
pd.Timestamp('2022-04-09 05:30:00')],
[2,
pd.Timestamp('2020-12-22 18:55:42.736000'),
pd.Timestamp('2020-12-22 18:55:43'),
pd.Timestamp('2023-08-17 05:30:00')],
[2,
pd.Timestamp('2023-10-29 23:41:53'),
pd.Timestamp('2023-10-29 05:30:00'),
pd.Timestamp('2024-08-17 05:30:00')]], columns= ['user_id', 'subscription_date_time', 'start_date', 'end_date'])
Note : You can add random unique ids to each row in the original dataframe if required.