Group by and Finding Continous subscription plans Python

Question

df = pd.DataFrame([[1,
  pd.Timestamp('2020-12-05 19:04:51.878000'),
  pd.Timestamp('2020-12-05 19:04:51.946000'),
  pd.Timestamp('2021-04-08 00:34:51.946000')],
 [1,
  pd.Timestamp('2021-04-08 17:22:20.476000'),
  pd.Timestamp('2021-04-09 05:30:00'),
  pd.Timestamp('2021-10-08 05:30:00')],
 [1,
  pd.Timestamp('2021-04-10 17:23:59'),
  pd.Timestamp('2021-10-09 05:30:00'),
  pd.Timestamp('2022-04-09 05:30:00')],
 [2,
  pd.Timestamp('2020-12-22 18:55:42.736000'),
  pd.Timestamp('2020-12-22 18:55:43'),
  pd.Timestamp('2022-02-16 00:25:42.746000')],
 [2,
  pd.Timestamp('2021-04-05 14:12:00.130000'),
  pd.Timestamp('2022-02-16 05:30:00'),
  pd.Timestamp('2023-02-16 05:30:00')],
 [2,
  pd.Timestamp('2021-04-27 23:41:53.381000'),
  pd.Timestamp('2023-02-17 05:30:00'),
  pd.Timestamp('2023-08-17 05:30:00')],
 [2,
  pd.Timestamp('2023-10-29 23:41:53'),
  pd.Timestamp('2023-10-29 05:30:00'),
  pd.Timestamp('2024-08-17 05:30:00')]], columns= ['user_id', 'subscription_date_time', 'start_date', 'end_date'])

I have this problem at hand to identify the continous plan times. So a user can subscribe to a plan anytime for x number of days. Each time he subscribes new record is created with following cases

start date of new record = end_date + 1 of old record (If he subscribes within the end_date of old plan. So basically it is just an extension of old plan)
start date of new record = current_date (If there is no active plan)

Now I need to combine all the plans of case 1 and get single start time and end time. How can i achieve this ? I am using python. Any logic would help me.

Things I tried:

In the current image there are three continous plans for a user. Say if it is two i was able to achieve it by the time difference of new_start_date - old_end_date == 1. But if there are more than two continous subscriptions, I am stuck on how to combine first and third. Another challenge is if he has a case 2 also then it should be considered completely new.

Expeted Outcome:

df = pd.DataFrame([[1,
  pd.Timestamp('2020-12-05 19:04:51.878000'),
  pd.Timestamp('2020-12-05 19:04:51.946000'),
  pd.Timestamp('2022-04-09 05:30:00')],
 [2,
  pd.Timestamp('2020-12-22 18:55:42.736000'),
  pd.Timestamp('2020-12-22 18:55:43'),
  pd.Timestamp('2023-08-17 05:30:00')],
 [2,
  pd.Timestamp('2023-10-29 23:41:53'),
  pd.Timestamp('2023-10-29 05:30:00'),
  pd.Timestamp('2024-08-17 05:30:00')]], columns= ['user_id', 'subscription_date_time', 'start_date', 'end_date'])

Note : You can add random unique ids to each row in the original dataframe if required.

Please edit your question to include copyable piece of code that can be used for testing rather than an image. See [MRE - Minimal, Reproducible, Example](https://stackoverflow.com/help/minimal-reproducible-example), and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888). — Henry Ecker, May 01 '21 at 06:04
How come for user #1, the 8th April isn't considered as different subscription because time gap is 3 days — Shubham Periwal, May 01 '21 at 07:53

Group by and Finding Continous subscription plans Python

0 Answers0