1
df = pd.DataFrame([[1,
  pd.Timestamp('2020-12-05 19:04:51.878000'),
  pd.Timestamp('2020-12-05 19:04:51.946000'),
  pd.Timestamp('2021-04-08 00:34:51.946000')],
 [1,
  pd.Timestamp('2021-04-08 17:22:20.476000'),
  pd.Timestamp('2021-04-09 05:30:00'),
  pd.Timestamp('2021-10-08 05:30:00')],
 [1,
  pd.Timestamp('2021-04-10 17:23:59'),
  pd.Timestamp('2021-10-09 05:30:00'),
  pd.Timestamp('2022-04-09 05:30:00')],
 [2,
  pd.Timestamp('2020-12-22 18:55:42.736000'),
  pd.Timestamp('2020-12-22 18:55:43'),
  pd.Timestamp('2022-02-16 00:25:42.746000')],
 [2,
  pd.Timestamp('2021-04-05 14:12:00.130000'),
  pd.Timestamp('2022-02-16 05:30:00'),
  pd.Timestamp('2023-02-16 05:30:00')],
 [2,
  pd.Timestamp('2021-04-27 23:41:53.381000'),
  pd.Timestamp('2023-02-17 05:30:00'),
  pd.Timestamp('2023-08-17 05:30:00')],
 [2,
  pd.Timestamp('2023-10-29 23:41:53'),
  pd.Timestamp('2023-10-29 05:30:00'),
  pd.Timestamp('2024-08-17 05:30:00')]], columns= ['user_id', 'subscription_date_time', 'start_date', 'end_date'])

I have this problem at hand to identify the continous plan times. So a user can subscribe to a plan anytime for x number of days. Each time he subscribes new record is created with following cases

  1. start date of new record = end_date + 1 of old record (If he subscribes within the end_date of old plan. So basically it is just an extension of old plan)
  2. start date of new record = current_date (If there is no active plan)

Now I need to combine all the plans of case 1 and get single start time and end time. How can i achieve this ? I am using python. Any logic would help me.

Things I tried:

  1. In the current image there are three continous plans for a user. Say if it is two i was able to achieve it by the time difference of new_start_date - old_end_date == 1. But if there are more than two continous subscriptions, I am stuck on how to combine first and third. Another challenge is if he has a case 2 also then it should be considered completely new.

Expeted Outcome:

df = pd.DataFrame([[1,
  pd.Timestamp('2020-12-05 19:04:51.878000'),
  pd.Timestamp('2020-12-05 19:04:51.946000'),
  pd.Timestamp('2022-04-09 05:30:00')],
 [2,
  pd.Timestamp('2020-12-22 18:55:42.736000'),
  pd.Timestamp('2020-12-22 18:55:43'),
  pd.Timestamp('2023-08-17 05:30:00')],
 [2,
  pd.Timestamp('2023-10-29 23:41:53'),
  pd.Timestamp('2023-10-29 05:30:00'),
  pd.Timestamp('2024-08-17 05:30:00')]], columns= ['user_id', 'subscription_date_time', 'start_date', 'end_date'])

Note : You can add random unique ids to each row in the original dataframe if required.

Karanam Krishna
  • 365
  • 2
  • 16
  • Please edit your question to include copyable piece of code that can be used for testing rather than an image. See [MRE - Minimal, Reproducible, Example](https://stackoverflow.com/help/minimal-reproducible-example), and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888). – Henry Ecker May 01 '21 at 06:04
  • Can you add your expected outcome? – divingTobi May 01 '21 at 06:17
  • How come for user #1, the 8th April isn't considered as different subscription because time gap is 3 days – Shubham Periwal May 01 '21 at 07:53
  • Sorry it was a typo error. Edited it. – Karanam Krishna May 01 '21 at 08:29

0 Answers0