A) You can use date_range() to create a column with dates:
df = pd.DataFrame()
df['Date'] = pd.date_range(start_date, end_date)
Next, you can create a Holiday column with zeros in all cells:
df['Holiday'] = 0
Then you can replace some of the values:
for item in holiday_list:
    item = datetime.datetime.strptime(item, '%Y-%m-%d')
    df.loc[df['Date'] == item, 'Holiday'] = 1
But this part can be simpler using isin():
mask = df['Date'].dt.strftime('%Y-%m-%d').isin(holiday_list)
df.loc[mask, 'Holiday'] = 1
or using numpy.where()
import numpy as np
mask = df['Date'].dt.strftime('%Y-%m-%d').isin(holiday_list)
df['Holiday'] = np.where(mask, 1, 0)
or simply keep it as True/False instead of 1/0:
df['Holiday'] = df['Date'].dt.strftime('%Y-%m-%d').isin(holiday_list)
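As a possible variation on the isin() approach above, the holiday strings can be parsed once with pd.to_datetime() and compared as dates, instead of formatting every date as text. This is only a minimal sketch; the shortened holiday_list and the 7-day range are assumptions made to keep the example small.

```python
import pandas as pd

# Shortened sample data (assumption, not the full list from the answer)
holiday_list = ['2020-01-01', '2020-01-05', '2020-01-12']

df = pd.DataFrame({'Date': pd.date_range('2020-01-01', '2020-01-07')})

# Parse the strings once, so the comparison happens on datetime values
holidays = pd.to_datetime(holiday_list)

# isin() returns True/False; astype(int) converts that to 1/0
df['Holiday'] = df['Date'].isin(holidays).astype(int)

print(df)
```

Drop .astype(int) if you prefer to keep True/False.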
import pandas as pd
import datetime
holiday_list = ['2020-01-01','2020-01-05','2020-01-12','2020-01-19','2020-01-26']
start_date = datetime.datetime(year=2020, month=1, day=1)
end_date = datetime.datetime(year=2020,month=1, day=28)
df = pd.DataFrame()
df['Date'] = pd.date_range(start_date, end_date)
df['Holiday'] = 0
mask = df['Date'].dt.strftime('%Y-%m-%d').isin(holiday_list)
df.loc[mask, 'Holiday'] = 1
print(df)
B) You can use slicing [start:start+size] to split a list:
numbers = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]
size = len(numbers)//3
print(numbers[size*0:size*1], numbers[size*1:size*2], numbers[size*2:size*3])
or
print(numbers[:size], numbers[size:size*2], numbers[size*2:])
In a similar way you can split the dataframe (after filtering out "Holiday") to work with 8-day blocks using [start:start+8], but I will use that in (C).
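The same slicing idea can be wrapped in a small helper so it works for any number of chunks. This is just a sketch; split_into_chunks is a hypothetical name, not something from pandas or the standard library.

```python
def split_into_chunks(items, size):
    """Return consecutive slices items[start:start+size] (hypothetical helper)."""
    return [items[start:start + size] for start in range(0, len(items), size)]


numbers = list(range(1, 19))   # 1..18, same values as in the answer
size = len(numbers) // 3       # 6 items per chunk

chunks = split_into_chunks(numbers, size)
print(chunks)
```

If len(numbers) is not divisible by size, the last chunk is simply shorter.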
C) You can create a Values column with NW in all cells:
df['Values'] = 'NW'
Next, you can use the previous mask to assign "Holiday":
mask = df['Date'].dt.strftime('%Y-%m-%d').isin(holiday_list)
df.loc[mask, 'Values'] = 'Holiday'
Using ~ you can negate the mask to reverse the selection, i.e. to select the cells without "Holiday":
selected = df.loc[~mask, 'Values'].copy()
and now I can try to assign the numbers:
for a, b in zip(range(0, len(selected), 8), range(0, len(numbers), size)):
    selected.iloc[a:a+size] = numbers[b:b+size]
df.loc[~mask, 'Values'] = selected
But maybe it can be done in a simpler way, perhaps with groupby() or rolling()?
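One possibly simpler variant (a sketch only, and it uses neither groupby() nor rolling()) is to collect the row positions of the non-holiday cells with numpy.flatnonzero() and write each chunk of numbers directly into the dataframe by position. The name block and its value 8 are assumptions taken from the step used in the loop above.

```python
import numpy as np
import pandas as pd

holiday_list = ['2020-01-01', '2020-01-05', '2020-01-12', '2020-01-19', '2020-01-26']
numbers = list(range(1, 19))
size = len(numbers) // 3   # 6 numbers per block
block = 8                  # non-holiday rows per block (assumed from the loop above)

df = pd.DataFrame({'Date': pd.date_range('2020-01-01', '2020-01-28')})
mask = df['Date'].isin(pd.to_datetime(holiday_list))

df['Values'] = 'NW'
df.loc[mask, 'Values'] = 'Holiday'

# Positions (0-based row numbers) of all rows that are not holidays
positions = np.flatnonzero(~mask.to_numpy())
col = df.columns.get_loc('Values')

for i, start in enumerate(range(0, len(numbers), size)):
    # First `size` non-holiday rows of the i-th block of `block` rows
    rows = positions[i * block : i * block + size]
    if len(rows) == 0:
        break
    # Write straight into the original dataframe, so no copy is involved
    df.iloc[rows, col] = numbers[start:start + len(rows)]

print(df)
```

Because the assignment goes through df.iloc on the original dataframe, there is no intermediate selected series to write back.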
import pandas as pd
import datetime
holiday_list = ['2020-01-01','2020-01-05','2020-01-12','2020-01-19','2020-01-26']
start_date = datetime.datetime(year=2020, month=1, day=1)
end_date = datetime.datetime(year=2020,month=1, day=28)
df = pd.DataFrame()
# ---
df['Date'] = pd.date_range(start_date, end_date)
mask = df['Date'].dt.strftime('%Y-%m-%d').isin(holiday_list)
df['Holiday'] = 0
df.loc[mask, 'Holiday'] = 1
# ---
df['Values'] = 'NW'
df.loc[mask, 'Values'] = 'Holiday'
numbers = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]
size = len(numbers)//3
selected = df.loc[~mask, 'Values'].copy()
for a, b in zip(range(0, len(selected), 8), range(0, len(numbers), size)):
    selected.iloc[a:a+size] = numbers[b:b+size]
df.loc[~mask, 'Values'] = selected
print(df)
EDIT:
I wrote the code below.
The main problem was that pandas sometimes creates a copy of the data and changes the values in that copy instead of in the original dataframe, so I use masks instead of slicing.
With chained indexing such as df['Values'][mask] = ... it may still display a warning that it changes values in a copy of the data (not in the original dataframe), but in the end it gave me the correct result.
Assigning in one step with df.loc[mask, 'Values'] = ..., as described in "Returning a view versus a copy" in the pandas documentation, should remove this warning.
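To illustrate the difference, here is a small sketch with made-up data (the 5-row dataframe and the mask are assumptions for the example). A single .loc call selects rows and column together, so pandas writes into the original dataframe rather than into a possible temporary copy.

```python
import pandas as pd

df = pd.DataFrame({'Values': ['NW'] * 5})
mask = pd.Series([True, False, True, False, False])

# Chained indexing does two separate steps (select column, then select rows),
# so the write may land in a temporary copy and can trigger a warning:
# df['Values'][mask] = 'Holiday'

# A single .loc assignment selects rows and column in one step
df.loc[mask, 'Values'] = 'Holiday'

print(df['Values'].tolist())
```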
import pandas as pd
import datetime
holiday_list = [
    '2020-01-01','2020-01-05',
    #'2020-01-10','2020-01-11', # add more to test when there are fewer than 7 NW
    '2020-01-12','2020-01-19','2020-01-26'
]
start_date = datetime.datetime(year=2020, month=1, day=1)
end_date = datetime.datetime(year=2020,month=1, day=28)
df = pd.DataFrame()
# ---
df['Date'] = pd.date_range(start_date, end_date)
mask = df['Date'].dt.strftime('%Y-%m-%d').isin(holiday_list)
df['Holiday'] = 0
df.loc[mask, 'Holiday'] = 1
# ---
df['Values'] = 'NW'
df.loc[mask, 'Values'] = 'Holiday'
numbers = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]
size = len(numbers)//3
start = 0
for b in range(0, len(numbers), size):
    # find first and last NW to replace (needs `start` to keep a few NW at the end of the previous 8-day gap)
    mask = (df['Values'] == 'NW') & (df.index >= start)
    # change size if there are fewer than 7 `NW`
    print('NW:', sum(mask)) # sum() counts all `True` in mask
    if sum(mask) <= size:
        left = size - sum(mask)
        size = sum(mask)
        print('shorter:', size, left)
    # first and last NW to replace
    start = df[ mask ].index[0]
    end = df[ mask ].index[size-1]
    print('start, end:', start, end)
    # use new mask to select and replace values
    # (using slicing [0:6] doesn't work because it creates a copy of the data
    # and it doesn't replace values in the original dataframe)
    mask = mask & (df.index >= start) & (df.index <= end)
    df.loc[mask, 'Values'] = numbers[b:b+size]
    # create an 8-day gap
    start += 8+1
print(df)