0

Given an anchor date and a date offset, how do I roll back the offset to the first valid date before a given target date?

anchor = pd.to_datetime("2021-03-19")
target = pd.to_datetime("2021-03-04")
offset = pd.DateOffset(weeks=2)
# something like this... resulting in "2021-02-19"
first_before = offset.rollback(anchor, target)
# Want to point out that rollback is functional with most other offsets, such
# as pd.offsets.Week(...).rollback(...). However, rolling back a DateOffset
# returns the same input date (because there is no anchor).

There's the whole convert DateOffset to Timedelta and do the math yourself thing. Simplified below to ignore the possibility of the anchor date occurring before the target date.

first_before = anchor - math.ceil((anchor-target)/pd.Timedelta(weeks=2)) * offset

But Timedelta doesn't ignore daylight savings so the result might be off +- 1 hour. Also DateOffset is the only offset that can't be directly converted to Timedelta and with various equivalent specifications, is a little tricky to convert manually.

o1 = pd.offsets.DateOffset(weeks=2)
o2 = pd.offsets.DateOffset(days=14)
# Both raise ValueError:
pd.Timedelta(o1)
pd.to_timedelta(o1)
# Both raise AtttributeError:
o1.days
o2.weeks

But finally I feel there must be a simpler, built-in approach of which I'm unaware.


edit: Demonstrating an inefficient approach.

Given the following date range, I want to find the first date that is less than or equal to "2021-03-04" (the target).

# end is anchor, freq is offset
>>> dr = pd.date_range(freq=offset, end="2021-03-19", periods=20)
>>> dr
DatetimeIndex(['2020-06-26', '2020-07-10', '2020-07-24', '2020-08-07',
               '2020-08-21', '2020-09-04', '2020-09-18', '2020-10-02',
               '2020-10-16', '2020-10-30', '2020-11-13', '2020-11-27',
               '2020-12-11', '2020-12-25', '2021-01-08', '2021-01-22',
               '2021-02-05', '2021-02-19', '2021-03-05', '2021-03-19'],
              dtype='datetime64[ns]', freq='<DateOffset: weeks=2>')
# key is target
>>> dr[dr.get_loc(key="2021-03-04", method="ffill")]
Timestamp('2021-02-19 00:00:00', freq='<DateOffset: weeks=2>')
user19087
  • 1,899
  • 1
  • 16
  • 21
  • If your anchor is `2021-03-19` with a target date of `2021-03-04` with an offset of `2 weeks`, why is the expected answer `2021-02-19` and not `2021-02-18` or `2021-02-25`? Could you re-explain how anchor and target work? – aneroid Mar 20 '21 at 13:05
  • @user19087 As far as i know there isn't any inbuilt function in pandas that does rollback for you but i guess this can be simply achieved with the help of simple for/while loop and incorporating a rollback logic in that.. – Shubham Sharma Mar 20 '21 at 14:04
  • If there is no built-in function I would prefer a direct calculation that ignores daylight savings (i.e. doesn't shift the time) over a loop (time-inefficient) or datetime index (space-inefficient). – user19087 Mar 20 '21 at 14:14

1 Answers1

0

It turns out that pd.DateOffset is backed by dateutil.relativedelta. As explained in this answer, a relativedelta can't be converted to a timedelta. For similar reasons, absolute calculations involving relative delta don't make any sense. If the relative aspect of the offset is in use, the best solution is to apply the offset in a loop until the conditions are met (as per an OP comment).

One might assume that every offset is timezone-agnostic and use instead pd.offsets.Day. However that class is backed by pandas Timestamp, which is why it can be directly converted to a Timestamp. The only possible solution is to first temporarily remove any timezone information:

offset = pd.Timedelta(days=14)
multiplier = math.ceil((anchor.tz_localize(None)-target.tz_localize(None))/offset)
first_before = anchor.tz_localize(None) - multiplier*offset
first_before = first_before.tz_localize(anchor.tz)

You can see that pd.date_range takes the same approach.

user19087
  • 1,899
  • 1
  • 16
  • 21