6

I have a dataframe with dates as seen in the table below. 1st block is what it should look like and the 2nd block is what I get when just adding the BDays. This is an example of what it should look like when completed. I want to use the 1st column and add 5 business days to the dates, but if the 5 Bdays overlaps a holiday (like 15 Feb'21) then I need to add one additional day. It is fairly simple to add the 5Bday using pandas.tseries.offsets import BDay, but i cannot skip the holidays while using the dataframe.

I have tried to use pandas.tseries.holiday import USFederalHolidayCalendar, the workdays and workalendar modules, but cannot figure it out. Anyone have an idea what I can do.

Correct Example

DATE EXIT DATE +5
2021/02/09 2021/02/17
2021/02/10 2021/02/18

Wrong Example

DATE EXIT DATE +5
2021/02/09 2021/02/16
2021/02/10 2021/02/17

Here are some examples of code I tried:

import pandas as pd
from workdays import workday
...
df['DATE'] = workday(df['EXIT DATE +5'], days=5, holidays=holidays)

Next Example:

import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
bday_us = pd.offsets.CustomBusinessDay(calendar=USFederalHolidayCalendar())
dt = df['DATE']
df['EXIT DATE +5'] = dt + bday_us

=========================================

Final code:

Below is the code I finally settled on. I had to define the holidays manually due to the days the NYSE actually trades. Like for instance the day Pres Bush was laid to rest.

import datetime as dt
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import BDay

from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday, nearest_workday, \
    USMartinLutherKingJr, USPresidentsDay, GoodFriday, USMemorialDay, \
    USLaborDay, USThanksgivingDay

class USTradingCalendar(AbstractHolidayCalendar):
    rules = [
        Holiday('NewYearsDay', month=1, day=1, observance=nearest_workday),
        USMartinLutherKingJr,
        USPresidentsDay,
        GoodFriday,
        USMemorialDay,
        Holiday('USIndependenceDay', month=7, day=4, observance=nearest_workday),
        Holiday('BushDay', year=2018, month=12, day=5),
        USLaborDay,
        USThanksgivingDay,
        Holiday('Christmas', month=12, day=25, observance=nearest_workday)
    ]

offset = 5

df = pd.DataFrame(['2019-10-11', '2019-10-14', '2017-04-13', '2018-11-28', '2021-07-02'], columns=['DATE'])
df['DATE'] = pd.to_datetime(df['DATE'])

def offset_date(start, offset):
  return start + pd.offsets.CustomBusinessDay(n=offset, calendar=USTradingCalendar())

df['END'] = df.apply(lambda x: offset_date(x['DATE'], offset), axis=1)
print(df)
dps
  • 139
  • 3
  • 11
  • The posts try to answer my question, but it does not work for a dataframe with dates in it. https://stackoverflow.com/questions/12691551/add-n-business-days-to-a-given-date-ignoring-holidays-and-weekends-in-python and https://stackoverflow.com/questions/59636651/how-to-add-business-days-in-date-excluding-holidays – dps Feb 13 '21 at 08:03
  • Have you solved your issue? – xicocaio Jul 07 '21 at 14:21
  • Not really efficiently. I would like to think there is a way to do it the same way as Excel handles it, but alas I have not found it. Do you know? – dps Jul 08 '21 at 15:09
  • I wrote an answer, is that what you need? – xicocaio Jul 09 '21 at 19:43
  • If you allow me to put in my two cents: your question was fine in the previous version. Avoid adding a final solution to your question text. It [is ok to answer your own questions](https://stackoverflow.blog/2011/07/01/its-ok-to-ask-and-answer-your-own-questions/), but I recommend you follow [these guidelines](https://stackoverflow.com/help/self-answer). – xicocaio Jul 19 '21 at 13:01
  • 1
    + `Holiday('Juneteenth', month=6, day=19, observance=nearest_workday),` – gregV Dec 10 '22 at 04:40

2 Answers2

7

Input data

df = pd.DataFrame(['2021-02-09', '2021-02-10', '2021-06-28', '2021-06-29', '2021-07-02'], columns=['DATE'])
df['DATE'] = pd.to_datetime(df['DATE'])

Suggested solution using apply

from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import BDay

def offset_date(start, offset):
  return start + pd.offsets.CustomBusinessDay(n=offset, calendar=USFederalHolidayCalendar())

offset = 5
df['END'] = df.apply(lambda x: offset_date(x['DATE'], offset), axis=1)

    DATE        END
    2021-02-09  2021-02-17
    2021-02-10  2021-02-18
    2021-06-28  2021-07-06
    2021-06-29  2021-07-07
    2021-07-02  2021-07-12

PS: If you want to use a particular calendar such as the NYSE, instead of the default USFederalHolidayCalendar, I recommend following the instructions on this answer, about creating a custom calendar.

Alternative solution which I do not recommend

Currently, to the best of my knowledge, pandas do not support a vectorized approach to your problem. But if you want to follow a similar approach to the one you mentioned, here is what you should do.

First, you will have to define an arbitrary far away end date that includes all the periods you might need and use it to create a list of holidays.

holidays = USFederalHolidayCalendar().holidays(start='2021-02-09', end='2030-02-09')

Then, you pass the holidays list to CustomBusinessDay through the holidays parameter instead of the calendar to generate the desired offset.

offset = 5
bday_us = pd.offsets.CustomBusinessDay(n=offset, holidays=holidays)
df['END'] = df['DATE'] + bday_us

However, this type of approach is not a true vectorized solution, even though it might seem like it. See the following SO answer for further clarification. Under the hood, this approach is probably doing a conversion that is not efficient. This why it yields the following warning.

PerformanceWarning: Non-vectorized DateOffset being applied to Series or DatetimeIndex

xicocaio
  • 867
  • 1
  • 10
  • 27
  • This works too. I would be interested to see if this is possible without applying to df. – fthomson Jul 09 '21 at 00:14
  • @fthomson Just updated my answer. But answering your comment, yes it is possible, but it is probably not a good idea. I mentioned in my answer the small detail on your code that was preventing you from getting the desired results. – xicocaio Jul 09 '21 at 14:09
  • I actually wrote a solution to the problem as well. Didn't know Date offsets weren't vectorized however. Perhaps a slight performance benefit with applymap/map? – fthomson Jul 09 '21 at 18:33
  • @fthomson `Map` is not that different from `apply`, they are both essentially a little bit optimized `for` loops. I don't think you will get much better improvement than that. Glad your solution works. However, if performance is an issue, using date range just to get the last element might not be the best approach. Depending on the size of your dataframe and the range of your dates you might be creating relatively big date ranges for maybe thousands of rows, just to select the last element. In this sense, using the date offset with `apply` will get you faster and less memory intensive results. – xicocaio Jul 09 '21 at 18:54
  • @xicocaio I have found an interesting problem with the suggested solution. Not sure why it is happening. If you change the input data to the following dates ['2021-06-28', '2021-06-29', '2021-07-02'] you will notice that the output for the last value is incorrect. The output date is 2021-07-10 for the last input date value, which is a Saturday. Any idea why this happens? Whereas the alternative solutions does however seem to calculate the correct output date of 2021-07-12, but it does give a warning as mentioned. – dps Jul 18 '21 at 08:15
  • @davidp13 My answer was not working for cases where the offset ended on a Saturday. I fixed it, hope it helps. I also added the examples you mentioned. I suggest you add your examples to the question as well. – xicocaio Jul 18 '21 at 17:17
  • @xicocaio I added my final code. Please review and comment if necessary, but I think this should work fine. Thank you very much for your guidance and help. – dps Jul 19 '21 at 09:46
  • @davidp13 Glad to help. However, in your original question, it was not mentioned that the objective was to deal with NYSE issues, and so my answer did not aim at addressing the details on this specific calendar. As it seems that my solution addressed your original question, can you please accept my answer? If you believe that my answer should encompass details about the NYSE calendar before being accepted, please add these requirements/constraints to your question so that I can adjust my answer accordingly. – xicocaio Jul 19 '21 at 11:35
  • @xicocaio your answer covered my original question. At the time of writing the question I was not aware that the `USFederalHolidayCalendar` did not cater for the NYSE trading days. Again thank you and I accepted your answer. – dps Jul 19 '21 at 12:15
  • 1
    @davidp13 Thank you very much. Also, this issue of calendar has been addressed in other questions, particularly [this one](https://stackoverflow.com/a/36525605/2464671). PS: My master's thesis is on the trading domain, so I had my fair share of troubles with trading dates LOL. – xicocaio Jul 19 '21 at 12:43
0

Here's one way to do it

import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from datetime import timedelta as td

def get_exit_date(date):
    holiday_list = cals.holidays(start=date, end=date + td(weeks=2)).tolist()
    # 6 periods since start date is included in set
    n_bdays = pd.bdate_range(start=date, periods=6, freq='C', holidays=holiday_list)
    return n_bdays[-1]

df = pd.read_clipboard()
cals = USFederalHolidayCalendar()
# I would convert this to datetime
df['DATE'] = pd.to_datetime(df['DATE'])
df['EXIT DATE +5'] = df['DATE'].apply(get_exit_date)

this is using bdate_range which returns a datetime index

Results:

    DATE    EXIT DATE +5
0   2021-02-09  2021-02-17
1   2021-02-10  2021-02-18

Another option is instead of dynamically creating the holiday list. You could also just choose a start date and leave it outside the function like so:

def get_exit_date(date):
    # 6 periods since start date is included in set
    n_bdays = pd.bdate_range(start=date, periods=6, freq='C', holidays=holiday_list)
    return n_bdays[-1]

df = pd.read_clipboard()
cals = USFederalHolidayCalendar()
holiday_list = cals.holidays(start='2021-01-01').tolist()
# I would convert this to datetime
df['DATE'] = pd.to_datetime(df['DATE'])
df['EXIT DATE +5'] = df['DATE'].apply(get_exit_date)
fthomson
  • 773
  • 3
  • 9