3

I want to calculate the number of business days between two dates and create a new pandas dataframe column with those days. I also have a holiday calendar and I want to exclude dates in the holiday calendar while making my calculation.

I looked around and I saw the numpy busday_count function as a useful tool for it. The function counts the number of business days between two dates and also allows you to include a holiday calendar.

I also looked around and I saw the holidays package which gives me the holiday dates for different countries. I thought it will be great to add this holiday calendar into the numpy function.

Then I proceeded as follows;

import pandas as pd
import numpy as np
import holidays
from datetime import datetime, timedelta, date

df = {'start' : ['2019-01-02', '2019-02-01'],
       'end' : ['2020-01-04', '2020-03-05']
    }
df = pd.DataFrame(df)

holidays_country = holidays.CountryHoliday('UnitedKingdom')
start_date = [d.date for d in df['start']]
end_date = [d.date for d in df['end']]
holidays_numpy = holidays_country[start_date:end_date]

df['business_days'] = np.busday_count(begindates = start_date,
                                      enddates = end_date,
                                      holidays=holidays_numpy)

When I run this code, it throws this error TypeError: Cannot convert type '<class 'list'>' to date

When I looked further, I noticed that the start_date and end_date are lists and that might be whey the error was occuring.

I then changed the holidays_numpy variable to holidays_numpy = holidays_country['2019-01-01':'2019-12-31'] and it worked.

However, since my dates are different for each row in my dataframe, is there a way to set the two arguments in my holiday_numpy variable to select corresponding values (just like the zip function) each from start_date and end_date?

I'm also open to alternative ways of solving this problem.

datanerd
  • 43
  • 3

1 Answers1

0

This should work:

import pandas as pd
import numpy as np
import holidays 

df = {'start' : ['2019-01-02', '2019-02-01'],
       'end' : ['2020-01-04', '2020-03-05']}
df = pd.DataFrame(df)

holidays_country = holidays.CountryHoliday('UK')

def f(x):
  return np.busday_count(x[0],x[1],holidays=holidays_country[x[0]:x[1]])

df['business_days'] = df[['start','end']].apply(f,axis=1)

df.head()
Partha Mandal
  • 1,391
  • 8
  • 14