0

I have a pandas dataframe, it has a column with date (datetime) and a column with day of the week (Mon-Sun). I would like to create a new column that would show whether it's a weekday, weekend or a holiday. Including a snapshot of dataframe for reference.

This is the function I wrote, I'm listing the holidays one by one since there are only 6.

def day_type (day_month, day_week):
    if (day_month == '2013-01-01' or day_month == '2013-05-27' or day_month == '2013-07-04' \
        or day_month == '2013-09-02' or day_month == '2013-11-28' or day_month == '2013-12-25'):
        return "H"
    elif day_week == "Saturday" or day_week =="Sunday":
        return "WE"
    else:
        return "WD"

I am then executing the function with apply where LoadData['Date'] is my day_month and LoadData['DoW'] is my day_week.

LoadData["DayType"] = LoadData[['Date','DoW']].apply(day_type)

I get the following error:

TypeError: day_type() missing 1 required positional argument: 'day_week'

I have tried using lambda as well to apply the function but I got an error as well:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I must be missing something; I've tried this in many different ways and still not getting anywhere.

  • `LoadData.apply(lambda x: day_type(x["Date"],x["DoW"]),axis=1)`? – not_speshal Aug 23 '23 at 14:35
  • 1
    Welcome to SO! The datetime module has all that. **Weekday/weekend:** https://stackoverflow.com/q/9847213/12846804. **Holiday:** https://stackoverflow.com/q/2394235/12846804. You're escaping the duplicate flag because you're sort of asking 2 questions in one :) – OCa Aug 23 '23 at 14:37
  • Minimal reproducible example: Share something like df.head().to_dict() for specific solutions. – OCa Aug 23 '23 at 14:48
  • @not_speshal Tried that and got value error – user22435779 Aug 24 '23 at 15:09
  • @OCa I've been there. I already have day of the week identified and The USFederalHolidayCalendar() that I imported into Python lists 10 Federal holidays and I only need the 6, it would require an additional step to pick and choose which holidays I need for this purpose. – user22435779 Aug 24 '23 at 15:23

4 Answers4

0

I'm not sure what you're dataframe looks like, but try using assign

As an example on a dummy data (similar to what you want would be):

def temp_func(x,y):
    return x**2 + y**2

df = pd.DataFrame({'arr1': np.random.rand(10), 'arr2': np.random.rand(10)})
df = df.assign( arr3= lambda r: temp_func(r['arr1'], r['arr2']))
Suraj Shourie
  • 536
  • 2
  • 11
0

You can use panda's where function like so:

import pandas as pd
import datetime as dt

# construct a dummy dataframe
LoadData = pd.DataFrame(columns = ['Date', 'DoW'])
for num, date in enumerate(pd.date_range('20130101', '20130108')):
    LoadData.loc[num] = [date, date.strftime('%A')]

holiday_list = ['2013-01-01', '2013-05-27', '2013-07-04', '2013-09-02', '2013-11-28', '2013-12-25']
LoadData['DayType'] = 'WE'
LoadData['DayType'] = LoadData['DayType'].where(LoadData['DoW'].isin(['Saturday', 'Sunday']), 'WD')
LoadData['DayType'] = LoadData['DayType'].where(~LoadData['Date'].isin(holiday_list), 'H')

You will need to repeatedly assign values to DayType using this method but I think it's worth a try

delala
  • 1
  • 1
0

Alternative option:

LoadData = pd.DataFrame(columns = ['Date', 'DoW'])
for num, date in enumerate(pd.date_range('20130101', '20130108')):
    LoadData.loc[num] = [date, date.strftime('%A')]
holiday_list = ['2013-01-01', '2013-05-27', '2013-07-04', '2013-09-02', '2013-11-28', '2013-12-25']
conds = [LoadData.Date.isin(holiday_list), LoadData.DoW.isin(['Saturday', 'Sunday'])]
choice = ['H', 'WD']
LoadData['flag'] = np.select(conds, choice, default = 'WE')
print(LoadData)

 Date        DoW flag
0 2013-01-01    Tuesday    H
1 2013-01-02  Wednesday   WE
2 2013-01-03   Thursday   WE
3 2013-01-04     Friday   WE
4 2013-01-05   Saturday   WD
5 2013-01-06     Sunday   WD
6 2013-01-07     Monday   WE
7 2013-01-08    Tuesday   WE
ragas
  • 848
  • 2
  • 7
0

The way I like to approach this is to use a function that takes a row (which is a pandas series) of the dataframe as the single input, and then apply that. This is how you use lambda to create such a function out of the one you already have:

LoadData['DayType'] = LoadData.apply(lambda row: day_type(row['Date'], row['DoW']), 1)

That second parameter axis = 1 for .apply tells it to apply the function to each row, the default behavior (axis = 0) is columns.

Rather than using lambda, you could also re-write your day_type function to handle this itself:

def day_type(row):
    day_month = row['Date']
    day_week = row['DoW']
    if (day_month == '2013-01-01' or day_month == '2013-05-27' or day_month == '2013-07-04' \
        or day_month == '2013-09-02' or day_month == '2013-11-28' or day_month == '2013-12-25'):
        return "H"
    elif day_week == "Saturday" or day_week =="Sunday":
        return "WE"
    else:
        return "WD"

Then

LoadData['DayType'] = LoadData.apply(day_type, 1)
Jacob H
  • 345
  • 1
  • 11
  • This is the function I was looking for, however when I ran this code the new column only retuned, WE and WD, but not H for holiday – user22435779 Aug 24 '23 at 15:03