
I've seen this question answered for a single date, but not for a set of different dates: I would like to create a column that counts the number of days elapsed since the last occurrence of an event in pandas. I have a dictionary containing similar dataframes with the following structure:

                Vol  Vol_lag  Fed_meeting
Date                                     
2005-06-02  72.9000  72.5000          0.0
2005-06-10  78.3000  72.9000          0.0
2005-06-16  76.0500  78.3000          0.0
2005-06-17  73.0500  76.0500          0.0
2005-06-24  75.7000  73.0500          0.0
...             ...      ...          ...
2022-01-03  80.3288  77.8832          0.0
2022-01-04  83.1597  80.3288          0.0
2022-01-05  80.5131  83.1597          0.0

This is obtained by iterating through the data frames in my dictionary, like so:

df = pd.read_excel(file, sheet_name=None, index_col="Date", parse_dates=True)
fed_df = pd.read_excel(fed_file, index_col="Date", parse_dates=True)

for key in df:
    df[key]["Vol_lag"] = df[key]["Vol"].shift(1)
    df[key] = pd.merge(df[key], fed_df, how='outer', left_index=True, right_index=True)
    df[key].fillna(0, inplace=True)

"Fed_meeting" is a column that contains a 1 if there's a Fed meeting on the day, 0 if not. I'd like to add a column "Days_elapsed" in each of the dataframes that counts the number of days elapsed since Fed_meeting was last equal to 1 (ie equal to 0 if today is Fed day, 1 if the meeting was yesterday, and so on). My data is imported such that the index of the dataframe already has a datetime format.

Edited to add: Unfortunately the interval between dates is not regular (sometimes there is a one-week gap between data points, sometimes only a day), so the code has to be based on the actual number of days elapsed, not the number of data points between two meetings.

Edit 2: the Fed_meeting column is already the product of a merge between my original dfs and a fed_df containing only Fed meeting dates.

Thanks a lot!

  • Does this answer your question? [Getting days since last occurence in Pandas DataFrame?](https://stackoverflow.com/questions/44420846/getting-days-since-last-occurence-in-pandas-dataframe) – Matthew Borish Jan 10 '22 at 18:01

1 Answer


You can use an asof merge to bring in the date of the closest Fed meeting in the past, and then calculate the day difference between that date and each row's date. An asof merge guarantees the result has the same length as the left DataFrame.

Starting Data

# Sample data, modified so that some rows actually have Fed_meeting == 1
print(df)

                Vol  Vol_lag  Fed_meeting
Date                                     
2005-06-02  72.9000  72.5000          0.0
2005-06-10  78.3000  72.9000          0.0
2005-06-16  76.0500  78.3000          1.0
2005-06-17  73.0500  76.0500          0.0
2005-06-24  75.7000  73.0500          1.0
2022-01-03  80.3288  77.8832          0.0
2022-01-04  83.1597  80.3288          0.0
2022-01-05  80.5131  83.1597          1.0

Code

import pandas as pd

# Keep only the rows where a Fed meeting took place and store their dates in a column
meetings = df[df['Fed_meeting'].eq(1)].copy()
meetings['Prev_date'] = meetings.index

# For each row, pull in the date of the most recent past meeting (backward search)
df = pd.merge_asof(df, meetings['Prev_date'],
                   left_index=True, right_index=True,
                   direction='backward')

# Elapsed time since that meeting, as a Timedelta (NaT before the first meeting)
df['Date_diff'] = df.index - df['Prev_date']

print(df)
                Vol  Vol_lag  Fed_meeting  Prev_date Date_diff
Date                                                          
2005-06-02  72.9000  72.5000          0.0        NaT       NaT
2005-06-10  78.3000  72.9000          0.0        NaT       NaT
2005-06-16  76.0500  78.3000          1.0 2005-06-16    0 days
2005-06-17  73.0500  76.0500          0.0 2005-06-16    1 days
2005-06-24  75.7000  73.0500          1.0 2005-06-24    0 days
2022-01-03  80.3288  77.8832          0.0 2005-06-24 6037 days
2022-01-04  83.1597  80.3288          0.0 2005-06-24 6038 days
2022-01-05  80.5131  83.1597          1.0 2022-01-05    0 days
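
If you want the integer day count described in the question (0 on a meeting day, 1 the day after, and so on) rather than a Timedelta, one option is to convert the difference with .dt.days; a minimal sketch, where Days_elapsed is simply the column name used in the question:

# Integer number of days since the last meeting; rows before the first
# recorded meeting stay NaN, since there is no prior meeting to count from
df['Days_elapsed'] = df['Date_diff'].dt.days
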
ALollz
  • Thanks! I forgot to add that the Fed_meeting column comes from a merge between my original dfs and fed_df containing only Fed meeting dates, and that I am iterating through a dictionary of data frames of the same structure, like so: `for key in df: df[key]["Vol_lag"] = df[key]["Vol"].shift(1) df[key] = pd.merge(df[key], fed_df, how='outer', left_index=True, right_index=True)` Do you think there's a more efficient way of doing this? – Julie Andrieu Jan 10 '22 at 18:30
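
Not sure it is faster, but one way to fold the asof merge into that loop is to build the Prev_date helper once from fed_df and reuse it for every dataframe. A rough sketch, assuming fed_df contains one row per Fed meeting date with a datetime index (as described in Edit 2), and that Prev_date/Days_elapsed are just the names used above:

# fed_df is assumed to contain one row per Fed meeting date, indexed by date
meetings = fed_df.sort_index().copy()
meetings['Prev_date'] = meetings.index

for key in df:
    df[key]['Vol_lag'] = df[key]['Vol'].shift(1)
    df[key] = pd.merge(df[key], fed_df, how='outer',
                       left_index=True, right_index=True)
    df[key].fillna(0, inplace=True)
    # merge_asof needs both sides sorted by the merge key (the date index)
    df[key] = df[key].sort_index()
    df[key] = pd.merge_asof(df[key], meetings['Prev_date'],
                            left_index=True, right_index=True,
                            direction='backward')
    df[key]['Date_diff'] = df[key].index - df[key]['Prev_date']
    df[key]['Days_elapsed'] = df[key]['Date_diff'].dt.days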