I have a dataframe containing strings with date ranges, looking something like this:
       winter           easter           pentecost      summer
    1  01.02. - 06.02.  31.03. - 10.04.  14.05.+25.05.  07.07. - 21.08.
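For anyone who wants to reproduce this, the frame can be rebuilt roughly as follows. This is only a sketch: I'm assuming a single row with index 1 and plain string columns, which is what the printout suggests.

```python
import pandas as pd

# Reconstruction of the example row; the year-less "dd.mm." strings are kept as-is.
# Assumption: one row, index label 1, every column holds plain strings.
holidays = pd.DataFrame(
    {
        "winter": ["01.02. - 06.02."],
        "easter": ["31.03. - 10.04."],
        "pentecost": ["14.05.+25.05."],
        "summer": ["07.07. - 21.08."],
    },
    index=[1],
)
```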
Now I want to generate a list of all dates that fall within those ranges. Is there a more pythonic solution than doing the following for each row:
    from datetime import date, datetime

    import pandas as pd


    def add_years(d, years):
        """
        credits: https://stackoverflow.com/a/15743908/12934163
        """
        try:
            return d.replace(year=d.year + years)
        except ValueError:
            # 29 February has no counterpart in a non-leap year
            return d + (date(d.year + years, 1, 1) - date(d.year, 1, 1))


    holidays_list = []
    for col in holidays.columns:
        if holidays[col].str.contains(r'\+', na=True).values[0]:
            # '+' separates single days
            days_list = holidays[col].values[0].split('+')
            date_strings = [s + '2010' for s in days_list]
            holidays_list.extend([datetime.strptime(s, "%d.%m.%Y").date() for s in date_strings])
        else:
            # '-' separates the start and end of a range
            days_list = holidays[col].str.split('-', n=1).tolist()
            days_list = [x.strip(' ') for x in days_list[0]]
            date_strings = [s + '2010' for s in days_list]
            date_dates = [datetime.strptime(s, "%d.%m.%Y").date() for s in date_strings]
            if date_dates[0] > date_dates[1]:
                # the range wraps into the next year
                date_dates[1] = add_years(date_dates[1], 1)
            # .date yields datetime.date objects, matching the '+' branch
            dates_between = list(pd.date_range(date_dates[0], date_dates[1], freq='D').date)
            holidays_list.extend(dates_between)
and appending the values of each column to one list? As you can see, some columns contain a + instead of a -, meaning that it's not a range but rather two single days. Also, sometimes a range spans the turn of the year, say 23.12. - 01.01.
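To make the two cases concrete, here is a minimal standard-library sketch of the behaviour I'm after for a single cell. `parse_day` and `expand_cell` are hypothetical helper names, the default year 2010 is just the placeholder year from my loop, and I'm assuming every day is written as `dd.mm.` with a trailing dot.

```python
from datetime import date, datetime, timedelta


def parse_day(text, year):
    """Turn a year-less 'dd.mm.' string into a date in the given year."""
    return datetime.strptime(f"{text.strip()}{year}", "%d.%m.%Y").date()


def expand_cell(cell, year=2010):
    """Expand one cell into a list of datetime.date objects."""
    if "+" in cell:
        # '+' separates individual days, not the two ends of a range.
        return [parse_day(part, year) for part in cell.split("+")]

    first, last = cell.split("-", 1)
    start = parse_day(first, year)
    end = parse_day(last, year)
    if end < start:
        # The range wraps past New Year, so the end belongs to the next year.
        end = parse_day(last, year + 1)

    n_days = (end - start).days + 1  # inclusive of both endpoints
    return [start + timedelta(days=i) for i in range(n_days)]
```

With that, something like `[d for cell in holidays.iloc[0] for d in expand_cell(cell)]` should collect one flat list for a row, assuming no cell is missing.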