Updating date values based on number of days between startdate and enddate

Question

I have the following table and I need to re-calculate the StartDate based on some business rules that I was given. Specifically, when the StartDate of the succeeding session is within one calendar day (i.e., < 2 days) the sessions are combined, using the earlier StartDate and the later EndDate.

StartDate	EndDate	DaysBetweenSessions
2011-04-15	2011-04-21	3
2011-04-24	2011-04-27	3
2011-04-30	2011-05-01	0
2011-05-01	2011-05-06	5
2011-05-11	2011-05-15	1
2011-05-16	2011-05-17	0
2011-05-17	2011-05-25	10

What I need as a result:

StartDate	EndDate	DaysBetweenSessions	NewStartDate	NewEndDate
2011-04-15	2011-04-21	3	2011-04-15	2011-04-21
2011-04-24	2011-04-27	3	2011-04-24	2011-04-27
2011-04-30	2011-05-01	0	2011-04-30	2011-05-06
2011-05-01	2011-05-06	5	2011-04-30	2011-05-06
2011-05-11	2011-05-15	1	2011-05-11	2011-05-25
2011-05-16	2011-05-17	0	2011-05-11	2011-05-25
2011-05-17	2011-05-25	10	2011-05-11	2011-05-25

I have attempted to use df.shift to get the PreviousStartDate, PreviousEndDate, NextStartDate, NextEndDate, and then tried to loop through the dataset with df.iterrows(), calculating the new dates, but I run into a technical issue that I don't fully understand - while in the loop that was used to update a value, I cannot use an operator on that new value - the new value hasn't been written to the df yet.

I'm looking for fewer than 2 days - question updated, thanks for pointing that out. — adm-gis, Sep 07 '21 at 21:41

score 1 · Accepted Answer · answered Sep 07 '21 at 22:05

Try with groupby and tranform:

#convert to datetime if necessary
df["StartDate"] = pd.to_datetime(df["StartDate"])
df["EndDate"] = pd.to_datetime(df["EndDate"])

df["NewStartDate"] = df.groupby((df["DaysBetweenSessions"]>=2).cumsum().shift().fillna(0))["StartDate"].transform("min")
df["NewEndDate"] = df.groupby((df["DaysBetweenSessions"]>=2).cumsum().shift().fillna(0))["EndDate"].transform("max")

>>> df
  StartDate    EndDate  DaysBetweenSessions NewStartDate NewEndDate
0 2011-04-15 2011-04-21                    3   2011-04-15 2011-04-21
1 2011-04-24 2011-04-27                    3   2011-04-24 2011-04-27
2 2011-04-30 2011-05-01                    0   2011-04-30 2011-05-06
3 2011-05-01 2011-05-06                    5   2011-04-30 2011-05-06
4 2011-05-11 2011-05-15                    1   2011-05-11 2011-05-25
5 2011-05-16 2011-05-17                    0   2011-05-11 2011-05-25
6 2011-05-17 2011-05-25                   10   2011-05-11 2011-05-25

Thank you so much. This definitely seems to work. Can you explain the purpose of the ```shift()``` in this case? I'm a bit confused. — adm-gis, Sep 07 '21 at 22:32
Just run `df["DaysBetweenSessions"]>=2).cumsum()` and see the output. It doesn't align with the groups you want. — not_speshal, Sep 08 '21 at 12:21

Updating date values based on number of days between startdate and enddate

1 Answers1

Linked