1

I'm trying to get dummy variables for holidays in a dataset. I have a couple of dateranges (pd.daterange()) with holidays and a dataframe to which I would like to append a dummy to indicate whether the datetime of that row is in a certain daterange of the specified holidays.

Small example:

ChristmasBreak = list(pd.date_range('2014-12-20','2015-01-04').date)

dates = pd.date_range('2015-01-03', '2015-01-06, freq='H')
d = {'Date': dates, 'Number': np.rand(len(dates))}

df = pd.DataFrame(data=d)
df.set_index('Date', inplace=True)

for i, row in df.iterrows():
    if i in ChristmasBreak:
        df[i,'Christmas] = 1

The if loop is never entered, so matching the dates won't work. Is there any way to do this? Alternative methods to come to dummies for this case are welcome as well!

Jeroen
  • 857
  • 10
  • 18

2 Answers2

2

First dont use iterrows, because really slow.

Better is use dt.date with Series,isin, last convert boolean mask to integer - Trues are 1:

df = pd.DataFrame(data=d)

df['Christmas'] = df['Date'].dt.date.isin(ChristmasBreak).astype(int)

Or use between:

df['Christmas'] = df['Date'].between('2014-12-20', '2015-01-04').astype(int)

If want compare with DatetimeIndex:

df = pd.DataFrame(data=d)
df.set_index('Date', inplace=True)

df['Christmas'] = df.index.date.isin(ChristmasBreak).astype(int)

df['Christmas'] = ((df.index > '2014-12-20') & (df.index < '2015-01-04')).astype(int)

Sample:

ChristmasBreak = pd.date_range('2014-12-20','2015-01-04').date

dates = pd.date_range('2014-12-19 20:00', '2014-12-20 05:00', freq='H')
d = {'Date': dates, 'Number': np.random.randint(10, size=len(dates))}

df = pd.DataFrame(data=d)

df['Christmas'] = df['Date'].dt.date.isin(ChristmasBreak).astype(int)
print (df)
                 Date  Number  Christmas
0 2014-12-19 20:00:00       6          0
1 2014-12-19 21:00:00       7          0
2 2014-12-19 22:00:00       0          0
3 2014-12-19 23:00:00       9          0
4 2014-12-20 00:00:00       1          1
5 2014-12-20 01:00:00       3          1
6 2014-12-20 02:00:00       1          1
7 2014-12-20 03:00:00       8          1
8 2014-12-20 04:00:00       2          1
9 2014-12-20 05:00:00       1          1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

This should do what you want:

df['Christmas'] = df.index.isin(ChristmasBreak).astype(int)
zipa
  • 27,316
  • 6
  • 40
  • 58