The check next_day not in mcal.get_calendar('NYSE').valid_days(start_date='2000-12-20', end_date='2020-01-10')
is very time-consuming: as written, it rebuilds the calendar and then searches several thousand trading days. Because you do this for every single operation, I think it is the main source of inefficiency.
You can speed up the check by calling mcal.get_calendar('NYSE').valid_days(start_date='2000-12-20', end_date='2020-01-10') once
and converting the result into a set, which makes each look-up O(1) on average instead of O(N).
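A minimal sketch of the set-based look-up, using only the standard library. The calendar here is a stand-in (every weekday in the range, holidays ignored), and the names valid_day_set and is_trading_day are made up for illustration. With the real calendar, the elements would be the timestamps returned by valid_days(...), and the value being tested would need the same timezone handling.

```python
from datetime import date, timedelta

# Stand-in calendar (assumption): every weekday in the range, holidays ignored.
# In the real check this list would come from valid_days(...).
start, end = date(2000, 12, 20), date(2020, 1, 10)
all_days = (start + timedelta(days=i) for i in range((end - start).days + 1))
valid_days = [d for d in all_days if d.weekday() < 5]

# Build the set once, outside any loop, so each membership test is a hash look-up.
valid_day_set = set(valid_days)


def is_trading_day(day):
    """Return True if `day` is in the precomputed calendar."""
    return day in valid_day_set


print(is_trading_day(date(2016, 11, 28)))  # a Monday
print(is_trading_day(date(2016, 11, 27)))  # a Sunday
```

The important part is that the set is built a single time and reused; rebuilding it inside the loop would throw away the gain.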
But I would choose another strategy:
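- Create a table that maps each trading day to its next and/or previous trading day
- Merge that table with the unique dates from your data (non-trading days will have missing values)
- Impute the missing values by forward- or backward-filling from the neighbouring trading days
- Merge the resulting table with the original data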
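To make the steps concrete, here is a small plain-Python sketch on a toy calendar. The five trading days are hand-written for illustration (they are the late-November 2016 dates that appear in the output further down), and the variable names are hypothetical. It only covers the "next trading day" case.

```python
from datetime import date

# Toy calendar (assumption): a hand-written stand-in for the real NYSE calendar.
trading_days = [date(2016, 11, 23), date(2016, 11, 25), date(2016, 11, 28),
                date(2016, 11, 29), date(2016, 11, 30)]
data_dates = [date(2016, 11, 27), date(2016, 11, 28), date(2016, 11, 27)]

# Step 1: map each trading day to the following trading day.
next_day = dict(zip(trading_days, trading_days[1:]))

# Step 2: combine the calendar with the unique dates from the data.
all_dates = sorted(set(trading_days) | set(data_dates))

# Step 3: forward-fill, so a date without an entry inherits the value
# of the closest earlier date that has one.
filled = {}
last_seen = None
for d in all_dates:
    last_seen = next_day.get(d, last_seen)
    filled[d] = last_seen

# Step 4: attach the result to the original (possibly repeated) dates.
result = [(d, filled[d]) for d in data_dates]
for d, nxt in result:
    print(d, nxt)
```

The look-up table is built once per unique date, so repeated dates in the data cost nothing extra.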
Edit: the function now allows an arbitrary number of lags and leads.
import pandas as pd
import pandas_market_calendars as mcal
def get_next_trading_day(df1, n):
    # trading calendar; the date range must cover every date in df1 plus |n| extra trading days
    trading_days = pd.DataFrame({"date": mcal.get_calendar('NYSE').valid_days(start_date='2016-11-10', end_date='2016-12-01')})
    # remove the timezone so the dates match the tz-naive dates in df1
    trading_days['date'] = trading_days['date'].dt.tz_convert(None)
    trading_days = trading_days[~trading_days.date.dt.weekday.isin([5,6])]
    # n > 0 looks n trading days ahead, n < 0 looks |n| trading days back
    trading_days['next_trading_day'] = trading_days.date.shift(-n)
    # extract unique date from df1
    df2 = pd.DataFrame({"date": pd.unique(df1['date'])})
    # merge with the trading days data (non-trading day will have NA fields)
    df2 = df2.merge(trading_days, on='date', how='outer')
    # impute NA values: forward-fill for leads, back-fill for lags
    df2.sort_values(by='date', inplace=True)
    df2['next_trading_day'] = df2['next_trading_day'].ffill() if n > 0 else df2['next_trading_day'].bfill()
    return df1.merge(df2, on='date', how='left')
dict1 = [
    {'date': '2016-11-27'},
    {'date': '2016-11-28'},
    {'date': '2016-11-27'},
]
df1 = pd.DataFrame(dict1)
df1['date'] = pd.to_datetime(df1['date'])
print("Next trading day")
print(get_next_trading_day(df1, 1))
print()
print("Previous trading day")
print(get_next_trading_day(df1, -1))
print()
print("Next next trading day")
print(get_next_trading_day(df1, 2))
print()
print("Previous previous trading day")
print(get_next_trading_day(df1, -2))
print()
Output
Next trading day
        date next_trading_day
0 2016-11-27       2016-11-28
1 2016-11-28       2016-11-29
2 2016-11-27       2016-11-28

Previous trading day
        date next_trading_day
0 2016-11-27       2016-11-25
1 2016-11-28       2016-11-25
2 2016-11-27       2016-11-25

Next next trading day
        date next_trading_day
0 2016-11-27       2016-11-29
1 2016-11-28       2016-11-30
2 2016-11-27       2016-11-29

Previous previous trading day
        date next_trading_day
0 2016-11-27       2016-11-23
1 2016-11-28       2016-11-23
2 2016-11-27       2016-11-23