9

Is there a better / more direct way to calculate this than the following?

# 1. Set up the start and end date for which you want to calculate the      
# number of business days excluding holidays.

start_date = '01JAN1986'
end_date = '31DEC1987'
start_date = datetime.datetime.strptime(start_date, '%d%b%Y')
end_date = datetime.datetime.strptime(end_date, '%d%b%Y')

# 2. Generate a list of holidays over this period
from pandas.tseries.holiday import USFederalHolidayCalendar
calendar = USFederalHolidayCalendar()
holidays = calendar.holidays(start_date, end_date)
holidays

Which gives a pandas.tseries.index.DatetimeIndex

DatetimeIndex(['1986-01-01', '1986-01-20', '1986-02-17', '1986-05-26',
           '1986-07-04', '1986-09-01', '1986-10-13', '1986-11-11',
           '1986-11-27', '1986-12-25', '1987-01-01', '1987-01-19',
           '1987-02-16', '1987-05-25', '1987-07-03', '1987-09-07',
           '1987-10-12', '1987-11-11', '1987-11-26', '1987-12-25'],
          dtype='datetime64[ns]', freq=None, tz=None)

But you need a list for numpy busday_count

holiday_date_list = holidays.date.tolist()

Then with and without the holidays you get:

np.busday_count(start_date.date(), end_date.date()) 
>>> 521

np.busday_count(start_date.date(), end_date.date(), holidays = holiday_date_list)
>>> 501

There are some other questions slightly similar but generally working with pandas Series or Dataframes (Get business days between start and end date using pandas, Counting the business days between two series)

Community
  • 1
  • 1
InterwebIsGreat
  • 171
  • 1
  • 5
  • 1
    Looks pretty good to me. Are you looking for a way that is better performance-wise? Or like one that only uses the standard library or what? – derricw May 16 '15 at 03:40
  • Performance isn't an issue but I just thought I was going around the houses a little bit to get the result and perhaps it might be doable directly with Pandas or Numpy, or something else ... – InterwebIsGreat May 16 '15 at 20:23
  • It's not clear what your question is. The calendar implementation is straight forward (import the module, create a USFederalHolidayCalendar object. and then calculate holidays between the start and end dates). Seems like half of your code above is initializing and converting the start and end dates. – Alexander May 17 '15 at 01:49
  • Sorry if the questions not clear, any suggested edits welcome. Perhaps it is better posed as 'How to calculate the number of business days between two datetimes excluding holidays using Pandas?'. Although I didn't want to limit it to Pandas directly. If this is the most direct way then fair enough - just thought there might be a neater way. – InterwebIsGreat May 17 '15 at 13:12

1 Answers1

1

If you put the index you created in a dataframe, you can use resample to fill in the gaps. The offset passed to .resample() can include things like business days and even (custom) calendars:

from pandas.tseries.holiday import USFederalHolidayCalendar

C = pd.offsets.CustomBusinessDay(calendar=USFederalHolidayCalendar())

start_date = '01JAN1986'
end_date = '31DEC1987'

(
pd.DataFrame(index=pd.to_datetime([start_date, end_date]))
    .resample(C, closed='right') 
    .asfreq()
    .index  
    .size
) - 1

The size of the index - 1 then gives us the amount of days.

gosuto
  • 5,422
  • 6
  • 36
  • 57