6

Novice Python user here - I'm attempting to Calculate the Business Hours between two dates in a pandas DataFrame given 9am-5pm, Mon-Fri Working Hours and to exclude Australian Public Holidays.

I have tried to hack together a lot of solutions over the past few days and apply it to my problem but I'm having significant trouble.

I will post my current iteration but also looking for feedback as the best way to handle this overall and to gain some understanding of how to tackle these problems in the future.

My lastest attempt is using pandas CDay then creating a custom holiday calendar for Australian dates which all seems to be working - it's then going from this step to applying it to the pandas dates which I am having trouble understanding. I am using a custom function from this https://codereview.stackexchange.com/questions/135142/calculate-working-minutes-between-two-timestamps/135200#135200 solution to count the minutes between the dates but having no luck.

Appreciate any help!

import datetime
from pandas.tseries.holiday import Holiday, AbstractHolidayCalendar
from pandas.tseries.offsets import CDay

class HolidayCalendar(AbstractHolidayCalendar):
    rules =[Holiday('New Years Day',year=2016,month=1,day=1),
        Holiday('Australia Day',year=2016,month=1,day=26),
        Holiday('Good Friday',year=2016,month=3,day=25),
        Holiday('Easter Monday',year=2016,month=3,day=28),
        Holiday('ANZAC Day',year=2016,month=4,day=25),
        Holiday('Queens Birthday',year=2016,month=6,day=13),
        Holiday('Christmas Day',year=2016,month=12,day=25),
        Holiday('Boxing Day',year=2016,month=12,day=26),           
        Holiday('New Years Day',year=2017,month=1,day=1),
        Holiday('Australia Day',year=2017,month=1,day=26),
        Holiday('Good Friday',year=2017,month=4,day=15),
        Holiday('Easter Monday',year=2017,month=4,day=17),
        Holiday('ANZAC Day',year=2017,month=4,day=25),
        Holiday('Queens Birthday',year=2017,month=6,day=12),
        Holiday('Christmas Day',year=2017,month=12,day=25),
        Holiday('Boxing Day',year=2017,month=12,day=26),
        Holiday('New Years Day',year=2018,month=1,day=1),
        Holiday('Australia Day',year=2018,month=1,day=26),
        Holiday('Good Friday',year=2018,month=3,day=30),
        Holiday('Easter Monday',year=2018,month=4,day=2),
        Holiday('ANZAC Day',year=2018,month=4,day=25),
        Holiday('Queens Birthday',year=2018,month=6,day=11),
        Holiday('Christmas Day',year=2018,month=12,day=25),
        Holiday('Boxing Day',year=2018,month=12,day=26)]

cal = HolidayCalendar()
dayindex = pd.bdate_range(datetime.date(2015,1,1),datetime.date.today(),freq=CDay(calendar=cal))

day_series = dayindex.to_series()

def count_mins(start,end):

starttime = datetime.datetime.fromtimestamp(int(start)/1000)

endtime = datetime.datetime.fromtimestamp(int(end)/1000)

days = day_series[starttime.date():endtime.date()]

daycount = len(days)

if daycount == 0:
    return daycount
else:
    startday = datetime.datetime(days[0].year,
                             days[0].month,
                             days[0].day,
                             hour=9,
                             minute=0)
    endday = datetime.datetime(days[-1].year,
                           days[-1].month,
                           days[-1].day,
                           hour=17,
                           minute=0)
    if daycount == 1:  

        if starttime < startday:
            periodstart = startday
        else:
            periodstart = starttime
        if endtime > endday:
            periodend = endday
        else:
            periodend = endtime

        return (periodend - periodstart).seconds/60

    if daycount == 2:

        if starttime < startday:
            first_day_mins = 480
        else:
            first_day_mins = (startday.replace(hour=17)-starttime).seconds/60
        if endtime > endday:
            second_day_mins = 480
        else:
            second_day_mins = (endtime-endday.replace(hour=9)).seconds/60

        return (first_day_mins + second_day_mins)

    else:

        if starttime < startday:
            first_day_mins = 480
        else:
            first_day_mins = (startday.replace(hour=17)-starttime).seconds/60
        if endtime > endday:
            second_day_mins = 480
        else:
            second_day_mins = (endtime-endday.replace(hour=9)).seconds/60

        return (first_day_mins + second_day_mins + ((daycount-2)*480))


df_updated['Created Date'] = pd.to_datetime(df_updated['Created Date'])
df_updated['Updated Date'] = pd.to_datetime(df_updated['Updated Date'])
df_updated['Created Date'] = df_updated['Created Date'].astype(np.int64) / 
int(1e6)
df_updated['Updated Date'] = df_updated['Updated Date'].astype(np.int64) / 
int(1e6)

count_mins(df_updated['Created Date'], df_updated['Updated Date'])
PeterDS
  • 61
  • 1
  • 2

3 Answers3

6

Try out this package called business-duration in PyPi

pip install business-duration

Example Code:

from business_duration import businessDuration
import pandas as pd
from datetime import time,datetime
import holidays as pyholidays

startdate = pd.to_datetime('2017-01-01 00:00:00')
enddate = pd.to_datetime('2017-01-31 23:00:00')

starttime=time(9,0,0)
endtime=time(17,0,0)

holidaylist = pyholidays.Australia()
unit='hour'

#By default weekends are Saturday and Sunday
print(businessDuration(startdate,enddate,starttime,endtime,holidaylist=holidayli
st,unit=unit))

Output: 160.0

holidaylist:
{datetime.date(2017, 1, 1): "New Year's Day",
 datetime.date(2017, 1, 2): "New Year's Day (Observed)",
 datetime.date(2017, 1, 26): 'Australia Day',
 datetime.date(2017, 3, 6): 'Canberra Day',
 datetime.date(2017, 4, 14): 'Good Friday',
 datetime.date(2017, 4, 15): 'Easter Saturday',
 datetime.date(2017, 4, 17): 'Easter Monday',
 datetime.date(2017, 4, 25): 'Anzac Day',
 datetime.date(2017, 6, 12): "Queen's Birthday",
 datetime.date(2017, 9, 26): 'Family & Community Day',
 datetime.date(2017, 10, 2): 'Labour Day',
 datetime.date(2017, 12, 25): 'Christmas Day',
 datetime.date(2017, 12, 26): 'Boxing Day'}
Gnaneshwar G
  • 111
  • 1
  • 4
0

You could use the length of bdate_range:

In [11]: pd.bdate_range('2017-01-01', '2017-10-23')
Out[11]:
DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-04', '2017-01-05',
               '2017-01-06', '2017-01-09', '2017-01-10', '2017-01-11',
               '2017-01-12', '2017-01-13',
               ...
               '2017-10-10', '2017-10-11', '2017-10-12', '2017-10-13',
               '2017-10-16', '2017-10-17', '2017-10-18', '2017-10-19',
               '2017-10-20', '2017-10-23'],
              dtype='datetime64[ns]', length=211, freq='B')

In [12]: len(pd.bdate_range('2017-01-01', '2017-10-23'))
Out[12]: 211
Andy Hayden
  • 359,921
  • 101
  • 625
  • 535
  • or, in your case, `freq=CDay(calendar=cal)` – Andy Hayden Oct 23 '17 at 23:34
  • Appreciate the reply Andy. I know this is novice but I'm experiencing two problems with this. Firstly, I realise my calendar only has the holiday dates excluded.. I also need to exclude weekends, any ideas how to combine this? Also, I am getting an error substituting my date series for your strings in bdate_range, what/how do I get it in the correct format to parse? Finally - This all calculates business days - how much further to setup the hours between the two dates exluding holidays/weekends – PeterDS Oct 24 '17 at 04:20
  • 1
    @PeterDS The weekends are excluded already in your code. The thing to note is you shouldn't need to specify the year in the holidays, but describe them as in the docs: https://pandas.pydata.org/pandas-docs/stable/timeseries.html#timeseries-holiday. You can do business hours already with `bhour_us = pd.offsets.CustomBusinessHour(calendar=cal)` :) see https://pandas.pydata.org/pandas-docs/stable/timeseries.html#custom-business-hour – Andy Hayden Oct 24 '17 at 04:33
0

I suggest a simpler solution

import pandas as pd  
from datetime import datetime

weekmask = 'Sun Mon Tue Wed Thu'
exclude = [pd.datetime(2020, 5, 1),
           pd.datetime(2020, 5, 2),
           pd.datetime(2020, 5, 3)]

pd.bdate_range('2020/4/30','2020/5/26',
               freq='C',
               weekmask = weekmask,
               holidays=exclude)
Wojciech Moszczyński
  • 2,893
  • 21
  • 27