2

I have created a function that generates the next/previous n trading days but is too slow for my purposes. Can anyone suggest a way to speed up the function?

def next_trading_day(start_day, num_trading_days, direction):
    '''returns the next/previous trading day. Business_days determines how many days
    back or into the future, direction determines whether back (-1) or forward (1)'''
    for i in range(0, num_trading_days, direction):
        next_day = start_day +datetime.timedelta(days=direction)
        while next_day.weekday() in [5,6] or next_day not in mcal.get_calendar('NYSE').valid_days(start_date='2000-12-20', end_date='2020-01-10'):
            next_day +=  datetime.timedelta(days=direction)
        start_day = next_day
    return start_day

This is how I use the function:

import pandas as pd
dict1 = [
        {'date': '2016-11-27'},
        {'date': '2016-11-28'},
{'date': '2016-11-27'},
]
df1= pd.DataFrame(dict1)
df1['date']      = pd.to_datetime(df1['date'])

df['Date-1']=df['date'].dt.date.apply(next_business_day, args=[-1,-1,])
Tartaglia
  • 949
  • 14
  • 20
  • 1
    so this returns 2016-11-25 for each date. What is the function doing exactly, maybe it can be refactored. it seems like a lot of computing time to get that day which is a monday. – oppressionslayer Dec 18 '19 at 05:37
  • Made a mistake in there somewhere, gonna fix that quick. It is supposed to return the previous trading day in this instance. – Tartaglia Dec 18 '19 at 06:02
  • A very efficient way to do this would be to create a reference DataFrame of business dates excluding the Holidays, should be simple with calendars and pandas.bdate_range. Then you would use `pd.merge_asof` to bring the closest date in the future or past depending upon the direciton you specify and use `allow_exact_matches=False`: https://stackoverflow.com/questions/58330699/pandas-pl-rollup-to-the-next-business-day/58330956#58330956 is a solution to a similar problem – ALollz Dec 18 '19 at 06:06
  • The function works correctly actually, those days are all weekends that's why it so happens that the previous trading day for all of them happens to be 11/25/2016 – Tartaglia Dec 18 '19 at 06:12
  • Thanks ALollz. However, I would like to be able to go n days in both directions, not just 1 day. Your approach seems to work only for the next day, right? – Tartaglia Dec 18 '19 at 06:19

3 Answers3

1

This check next_day not in mcal.get_calendar('NYSE').valid_days(start_date='2000-12-20', end_date='2020-01-10') is very time-consuming since it will need to look-up from an array of over 7000 days. And you need to do this for every single operation, thus I think this is the main source of inefficiency.

You can speed up this check by convert mcal.get_calendar('NYSE').valid_days(start_date='2000-12-20', end_date='2020-01-10') into set, which will lower the look-up from O(N) downto O(log N).

But I would choose another strategy:

  1. Create a table that matches each trading day to its next and/or last trading day
  2. Merge the above table with the dates from your data
  3. Impute missing values
  4. Merge the newly-created table with the original data

Edited: allowing for an arbitrary number of lags and leads

import pandas as pd
import pandas_market_calendars as mcal

def get_next_trading_day(df1, n):
    trading_days = pd.DataFrame({"date": mcal.get_calendar('NYSE').valid_days(start_date='2016-11-10', end_date='2016-12-01')})
    trading_days['date'] = trading_days['date'].dt.tz_convert(None)
    trading_days = trading_days[~trading_days.date.dt.weekday.isin([5,6])]
    trading_days['next_trading_day'] = trading_days.date.shift(-n)
    # extract unique  date from df1
    df2 = pd.DataFrame({"date": pd.unique(df1['date'])})

    # merge with the trading days data (non-trading day will have NA fields)
    df2 = df2.merge(trading_days, on='date', how='outer')

    # impute NA values
    df2.sort_values(by='date', inplace=True)

    df2['next_trading_day'].fillna(method=  'ffill' if n>0 else 'bfill', inplace=True)

    return df1.merge(df2, on='date', how='left')

dict1 = [
        {'date': '2016-11-27'},
        {'date': '2016-11-28'},
        {'date': '2016-11-27'},
        ]
df1= pd.DataFrame(dict1)
df1['date']      = pd.to_datetime(df1['date'])


print("Next trading day")
print(get_next_trading_day(df1, 1))
print()

print("Previous trading day")
print(get_next_trading_day(df1, -1))
print()

print("Next next trading day")
print(get_next_trading_day(df1, 2))
print()

print("Previous previous trading day")
print(get_next_trading_day(df1, -2))
print()

Output

Next trading day
        date next_trading_day
0 2016-11-27       2016-11-28
1 2016-11-28       2016-11-29
2 2016-11-27       2016-11-28

Previous trading day
        date next_trading_day
0 2016-11-27       2016-11-25
1 2016-11-28       2016-11-25
2 2016-11-27       2016-11-25

Next next trading day
        date next_trading_day
0 2016-11-27       2016-11-29
1 2016-11-28       2016-11-30
2 2016-11-27       2016-11-29

Previous previous trading day
        date next_trading_day
0 2016-11-27       2016-11-23
1 2016-11-28       2016-11-23
2 2016-11-27       2016-11-23
9mat
  • 1,194
  • 9
  • 13
  • Thank you for this suggestion. Would this also work if I wanted to go back/forward n days instead of just 1 day? – Tartaglia Dec 18 '19 at 06:42
  • I should have specified that, I have updated my original post to n days – Tartaglia Dec 18 '19 at 07:14
  • Yes, you just need to change the number of days to the `shift` function, `shift(n)` will give you the `n`-day lag, and `shift(-n)` will give you the `n`-day forward: `trading_days['last_trading_day'] = trading_days["date"].shift(n)` – 9mat Dec 18 '19 at 07:15
  • Also, could I just convert the calendar from trading days to holidays instead, since that is a shorter list?? Unfortunately I do not know how to do that? – Tartaglia Dec 18 '19 at 07:17
  • I have modified the code to allow to go back/forward by an arbitrary number of days – 9mat Dec 18 '19 at 07:45
  • This is very nice and elegant, highly appreciate it!! Also, I followed your advice and turned it into a set, and then also declared it outside of the function instead of doing it inside and the increase the speed a lot as well. Thank you so much!! – Tartaglia Dec 18 '19 at 16:02
1

We can make use of pd.tseries.offsets.CustomBusinessDay, which does the addition/subtraction without much issue. You will get a PerformanceWarning though.

import pandas_market_calendars as mcal
import pandas as pd

dict1 = [{'date': '2016-11-27'},
         {'date': '2016-11-28'},
         {'date': '2016-11-23'}]  # Changed this row from your example
df1= pd.DataFrame(dict1)

Find the previous Business day

cal = mcal.get_calendar('NYSE').valid_days(start_date='2000-12-20', end_date='2020-01-10')
df1['date'] - pd.tseries.offsets.CustomBusinessDay(n=1, calendar=cal) 
#0   2016-11-25
#1   2016-11-25
#2   2016-11-22
#Name: date, dtype: datetime64[ns]

If you want a simple function:

def next_bus_day(s, N, direction, cal):
    """
    direction +1 if future, -1 if past.
    N=0 will move non-business days to the next business day, but keeps 
        business days unchanged. Can check `if N == 0` and 
        `return s` if that is undesired behavior.
    """
    return s + direction*pd.tseries.offsets.CustomBusinessDay(n=N, calendar=cal) 

next_bus_day(df1['date'], N=12, direction=1,
             cal=mcal.get_calendar('NYSE').valid_days(start_date='2000-12-20', end_date='2020-01-10'))
#0   2016-12-13
#1   2016-12-14
#2   2016-12-09
#Name: date, dtype: datetime64[ns]
ALollz
  • 57,915
  • 7
  • 66
  • 89
  • 1
    This is exactly the solution I was hoping for, I knew there was a solution to do this with CustomBusinessDay. Thank you so much!! – Tartaglia Dec 18 '19 at 18:44
1

Like Tartaglia, I also needed a way to do it via CustomBusinessDay. However the solution given by ALollz didn't work for me; Pandas does not throw an error, but using the mcal.get_calendar('NYSE').valid_days() method ends up returning just regular business days, i.e. weekdays, which is what CustomBusinessDay returns by default.

I finally found a solution here, by the creator of pandas_market_calendar (alias mcal). Here's my simple implementation, no function needed:

import pandas_market_calendars as mcal
import pandas as pd

NYSE = mcal.get_calendar('NYSE')
CBD = NYSE.holidays()                     # returns a CustomBusinessDay object

dict1 = [
    {'date': '2021-01-01'},
    {'date': '2020-07-02'},
    {'date': '2020-12-28'},
]
df1 = pd.DataFrame(dict1)
df1['date'] = pd.to_datetime(df1['date'])

df1['date-1'] = df1['date'] - CBD

df1['date+3'] = df1['date'] + 3*CBD

Note that this doesn't have a vectorized implementation in Pandas, but is still pretty fast. Hopefully this helps!