I have the following table and I need to re-calculate the StartDate
based on some business rules that I was given. Specifically, when the StartDate
of the succeeding session is within one calendar day (i.e., < 2 days) the sessions are combined, using the earlier StartDate
and the later EndDate
.
StartDate | EndDate | DaysBetweenSessions |
---|---|---|
2011-04-15 | 2011-04-21 | 3 |
2011-04-24 | 2011-04-27 | 3 |
2011-04-30 | 2011-05-01 | 0 |
2011-05-01 | 2011-05-06 | 5 |
2011-05-11 | 2011-05-15 | 1 |
2011-05-16 | 2011-05-17 | 0 |
2011-05-17 | 2011-05-25 | 10 |
What I need as a result:
StartDate | EndDate | DaysBetweenSessions | NewStartDate | NewEndDate |
---|---|---|---|---|
2011-04-15 | 2011-04-21 | 3 | 2011-04-15 | 2011-04-21 |
2011-04-24 | 2011-04-27 | 3 | 2011-04-24 | 2011-04-27 |
2011-04-30 | 2011-05-01 | 0 | 2011-04-30 | 2011-05-06 |
2011-05-01 | 2011-05-06 | 5 | 2011-04-30 | 2011-05-06 |
2011-05-11 | 2011-05-15 | 1 | 2011-05-11 | 2011-05-25 |
2011-05-16 | 2011-05-17 | 0 | 2011-05-11 | 2011-05-25 |
2011-05-17 | 2011-05-25 | 10 | 2011-05-11 | 2011-05-25 |
I have attempted to use df.shift
to get the PreviousStartDate, PreviousEndDate, NextStartDate, NextEndDate
, and then tried to loop through the dataset with df.iterrows()
, calculating the new dates, but I run into a technical issue that I don't fully understand - while in the loop that was used to update a value, I cannot use an operator on that new value - the new value hasn't been written to the df yet.