1

Situation: Calculating number of hours that lies in a weekday and number of hours that lies in a weekend/holiday from a time interval. Data is extracted from a excel file and put into a dataframe using pandas.

Example of Dataframe Input:
                 Start                      End
      1     06-09-2018 02:00:00      08-09-2018 01:00:00 
      2     08-09-2018 02:00:00      08-09-2018 04:00:00 
      3     08-09-2018 02:00:00      10-09-2018 01:00:00 

Example of Dataframe Output at the end:
                Start                      End                 WD/WE
      1     06-09-2018 02:00:00      08-09-2018 01:00:00        WD
      2     08-09-2018 02:00:00      08-09-2018 04:00:00        WE
      3     08-09-2018 02:00:00      10-09-2018 01:00:00        WE

From 1st row, using the code, we calculate the number of hours that lies on a weekday and a weekend/holiday. Manually calculating, we will get 46 weekday hours and 1 weekend/holiday hours. Thus, weekend/holiday hours <= weekday hours, thus returning a string "WD".

I am currently trying to use workdays, pandas, datetime, openpyxl.

data_check = pd.read_excel('C:\\excel path.xlsx',header=0)
df_check = pd.DataFrame(data_check,columns=['Start Date and Time','End Date and Time'])
#Creating dataframe
df_check['Start Date and Time'] = pd.to_datetime(df_check['Start Date and Time'])
df_check['End Date and Time'] = pd.to_datetime(df_check['End Date and Time'])

#Simplifying dataframe names
df_s= pd.DataFrame({'Start Date and Time': df_check['Start Date and Time']})
df_u= pd.DataFrame({'End Date and Time': df_check['End Date and Time']})

#Indicating holidays
holidays= ['22-08-2018','06-11-2018','25-12-2018']

I understand my code is a little messy, as I am new to Python and still trying to get things to work. I got stuck after trying to write a code to run through my dataframe to calculate no. of weekday hours and no. of weekend/holiday hours. I have searched and read up on how to detect if a day is weekday or weekend/holiday, but I have no idea on how to run it through my dataframe.

Hopefully someone is able to help.

Tested so far: Code1: Not working. AttributeError: 'DataFrame' object has no attribute 'calendar'. I might be typing the code wrongly.

df_wdwe1 = np.where(df_s.calendar.weekday(year, month, day)>=6, "Weekends","Weekdays")
Nukyi
  • 51
  • 5

1 Answers1

1

Well you could use the datetime library to transform the string into date value then is just a matter of confronting the data to see the min and max, when you get these just make a subtraction and you will get the time elapsed.

You can use min() and max() to get the min and max like in this example:

Find oldest/youngest datetime object in a list

Here is an example on how to use datetime :

Converting string into datetime

For identifying the day of the week based on the date provided you could you calendar lib more specifically the function calendar.weekday(year, month, day) which returns numbers from '0' to '7' where 0 is Monday and 7 is Sunday

As for a holiday I wouldn't know, I don't think there is something as precise as that in calendar, considering holidays are country based. You could check some custom libraries on github.

  • I think my main problem is identifying if the Start Date or End Date is a weekday or weekend/holiday. I have successfully did this through an excel file with multiple formulas. However, openpyxl does not allow me to copy the values of my resulted columns (which consist of formulas) to another workbook. – Nukyi Sep 06 '18 at 09:22
  • I have also converted them into date value. I would want to do a loop through around 7000 rows of data on excel to test if each individual row is more of a weekday or weekend. – Nukyi Sep 06 '18 at 09:25
  • I updated my answer adding how to identify the day of the week – Alexandru Martalogu Sep 06 '18 at 09:28
  • Following your edit, I have included the following: df_wdwe = np.where(df_s.calendar.weekday(year, month, day)>=6, "Weekends","Weekdays") However, I would like to include a custom holidays as such: holidays= ['15-06-2018', '09-08-2018','22-08-2018','06-11-2018','25-12-2018'] – Nukyi Sep 06 '18 at 09:33
  • I have tested df_wdwe = np.where(df_s.calendar.weekday(year, month, day)>=6, "Weekends","Weekdays") And it does not work – Nukyi Sep 06 '18 at 09:38
  • In that case try to extract only date and not time from the datetime value and compare it to the list of holiday. – Alexandru Martalogu Sep 06 '18 at 09:38
  • Well you have to extract the year, month and day from the datetime value first, you are not providing any input to the weekday function this way. – Alexandru Martalogu Sep 06 '18 at 09:41
  • I have changed the format of my datetime to date only. printing the dataframe out shows me the data only. But the same error occurs with AttributeError: 'Series' object has no attribute 'calendar'. – Nukyi Sep 07 '18 at 02:29
  • This is because you are trying to use calendar as part of pandas, I don't belive pandas has calendar integrated. `df_check['Start Date and Time'] = pd.to_datetime(df_check['Start Date and Time'])` you store the data to the df_check, from df_check extract year, month and day into 3 variables, after that use `day_of_week = calendar.weekday(year, month, day)` – Alexandru Martalogu Sep 07 '18 at 07:30
  • Is it possible to put the day_of_week back into a dataframe? – Nukyi Sep 07 '18 at 09:34