
I have a list of dates in a dataframe, and another dataframe containing percentage changes throughout a day.

Sample dataframe with the dates (df_date; the code below names it df_test):

import pandas as pd

df_test = pd.DataFrame({
    'Specific_date': {0: '2016-01-10', 1: '2016-01-12', 2: '2016-01-13', 3: '2016-01-19'}})

df_test['Specific_date'] = pd.to_datetime(df_test['Specific_date'])

Percentage change dataframe (df_percent):

Hour        9am     10am    11am    12pm    1pm     2pm     3pm     4pm
Date                                
2016-01-05  20.6475 20.5900 20.4225 20.6275 20.1600 19.6500 19.6250 19.4100
2016-01-06  21.3550 20.8675 20.6100 20.6525 20.8900 21.0125 21.0600 20.5125
2016-01-07  23.0075 22.7975 23.0050 23.5975 24.4675 25.2450 25.1600 24.9575
2016-01-08  22.9125 23.2400 23.8575 23.9475 24.0425 24.4000 25.7950 26.7625
2016-01-11  25.7500 25.9100 25.8800 25.9325 26.7650 26.4025 24.9425 24.2725
2016-01-12  22.5500 22.6900 23.2700 23.2550 23.1425 22.8175 22.2925 22.4175
2016-01-13  21.8175 22.6200 22.5225 23.2675 23.9650 25.0500 24.9575 25.1100
2016-01-14  25.4600 25.0050 24.2875 24.2050 24.2850 23.7800 23.6775 23.9575
2016-01-15  28.3200 28.5925 27.8400 28.8900 29.2925 28.4225 27.6525 27.1525
2016-01-19  26.1625 26.3400 26.0725 26.2550 26.3275 26.9225 26.5725 26.0075
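To make the question reproducible, here is a minimal sketch that rebuilds df_percent from the printed table. It assumes Date is a DatetimeIndex and the hours are plain string column labels; according to the comments further down, the real dataframe may be laid out differently.

```python
import pandas as pd

# Values copied from the table above: Date is the row index, Hour the column axis
dates = pd.to_datetime([
    '2016-01-05', '2016-01-06', '2016-01-07', '2016-01-08', '2016-01-11',
    '2016-01-12', '2016-01-13', '2016-01-14', '2016-01-15', '2016-01-19',
])
hours = ['9am', '10am', '11am', '12pm', '1pm', '2pm', '3pm', '4pm']
values = [
    [20.6475, 20.5900, 20.4225, 20.6275, 20.1600, 19.6500, 19.6250, 19.4100],
    [21.3550, 20.8675, 20.6100, 20.6525, 20.8900, 21.0125, 21.0600, 20.5125],
    [23.0075, 22.7975, 23.0050, 23.5975, 24.4675, 25.2450, 25.1600, 24.9575],
    [22.9125, 23.2400, 23.8575, 23.9475, 24.0425, 24.4000, 25.7950, 26.7625],
    [25.7500, 25.9100, 25.8800, 25.9325, 26.7650, 26.4025, 24.9425, 24.2725],
    [22.5500, 22.6900, 23.2700, 23.2550, 23.1425, 22.8175, 22.2925, 22.4175],
    [21.8175, 22.6200, 22.5225, 23.2675, 23.9650, 25.0500, 24.9575, 25.1100],
    [25.4600, 25.0050, 24.2875, 24.2050, 24.2850, 23.7800, 23.6775, 23.9575],
    [28.3200, 28.5925, 27.8400, 28.8900, 29.2925, 28.4225, 27.6525, 27.1525],
    [26.1625, 26.3400, 26.0725, 26.2550, 26.3275, 26.9225, 26.5725, 26.0075],
]
df_percent = pd.DataFrame(
    values,
    index=pd.DatetimeIndex(dates, name='Date'),
    columns=pd.Index(hours, name='Hour'),
)
print(df_percent)
```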

I am trying to use the dates in df_test to select the matching rows from df_percent (the full data runs daily from 2016 to 2020).

Logic: for each date in df_test (T=0), I want the row of df_percent at that date, plus the rows 3, 2 and 1 days before (T-3, T-2, T-1) and 1, 2 and 3 days after (T+1, T+2, T+3). Each of these rows should be appended to its own new dataframe, and then the same is repeated for the next date in df_test.
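For a single date, the window described above is just seven consecutive rows of df_percent, as long as "days" means trading days (rows) rather than calendar days. A minimal sketch for T = 2016-01-12, assuming df_percent has a sorted DatetimeIndex of trading days; only the 9am column is used, and the names s, t0, i and window are illustrative:

```python
import pandas as pd

# 9am column of df_percent from the table above (sorted trading days)
s = pd.Series(
    [20.6475, 21.3550, 23.0075, 22.9125, 25.7500,
     22.5500, 21.8175, 25.4600, 28.3200, 26.1625],
    index=pd.to_datetime(['2016-01-05', '2016-01-06', '2016-01-07', '2016-01-08',
                          '2016-01-11', '2016-01-12', '2016-01-13', '2016-01-14',
                          '2016-01-15', '2016-01-19']),
    name='9am',
)

t0 = pd.Timestamp('2016-01-12')
i = s.index.get_loc(t0)                # row position of T=0
window = s.iloc[max(i - 3, 0):i + 4]   # rows T-3 ... T+3, clipped at the edges of the data
print(window)
```

Here T-3 comes out as 2016-01-07, which is also the Tm3 row the expected output below lists for 2016-01-12.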

I had thought about creating a new dataframe for each T; if I did that, the logic would look roughly like this:

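from pandas.tseries.offsets import BDay

# Assumes df_percent has a DatetimeIndex named Date, as printed above.
# rows[k] collects the rows for offset T+k, for k = -3 ... 3
rows = {k: [] for k in range(-3, 4)}
for date in df_test['Specific_date']:
    if date in df_percent.index:  # only dates that exist in df_percent
        for k in rows:
            shifted = date + BDay(k)
            if shifted in df_percent.index:
                rows[k].append(df_percent.loc[shifted])

# One dataframe per offset, indexed by the shifted date
Tm3, Tm2, Tm1, T0, Tp1, Tp2, Tp3 = (pd.DataFrame(rows[k]) for k in range(-3, 4))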

I don't believe this is the right approach, or my logic is wrong, but I can't get anything productive out of what I have right now, which is a Frankenstein version of the above.

Expected output for the Tm3 dataframe, given the dates in df_test:

Hour        9am     10am    11am    12pm    1pm     2pm     3pm     4pm
Date
2016-01-06  21.3550 20.8675 20.6100 20.6525 20.8900 21.0125 21.0600 20.5125
2016-01-07  23.0075 22.7975 23.0050 23.5975 24.4675 25.2450 25.1600 24.9575
2016-01-08  22.9125 23.2400 23.8575 23.9475 24.0425 24.4000 25.7950 26.7625
2016-01-13  21.8175 22.6200 22.5225 23.2675 23.9650 25.0500 24.9575 25.1100

Any help would be appreciated :)

EDITED: Error I was receiving from code implementation

EDIT #2: Second edit of errors

EDIT #3: Dataframe that might be causing issues.

EDIT #4: Dataframe incorrectly displaying

Asked by birdman, edited by rpanai

1 Answer


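Your data looks like financial market data. Financial markets are not open on every business day (there are exchange holidays), so you cannot use BDay, which only skips weekends. Instead, it's better to number the trading days in your df_percent sequentially, so that day 0 is 2016-01-05, day 1 is 2016-01-06, and so on. This way you can easily reference n trading days before or after. In the code below, df_melt_test_percent is your percentage dataframe (df_percent) and df_FOMC_dates holds your dates in a column named FOMC_dates.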
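A small check of why BDay goes wrong here, using the trading days from the question's sample: 2016-01-18 has no row in df_percent (it was a US market holiday), but BDay still counts it. The names trading_days, by_bday and by_position are illustrative.

```python
import pandas as pd
from pandas.tseries.offsets import BDay

# Trading days in the question's sample; 2016-01-18 has no row
trading_days = pd.to_datetime([
    '2016-01-05', '2016-01-06', '2016-01-07', '2016-01-08', '2016-01-11',
    '2016-01-12', '2016-01-13', '2016-01-14', '2016-01-15', '2016-01-19',
])
t0 = pd.Timestamp('2016-01-19')

# BDay skips weekends only, so 2016-01-18 still counts as one of the 3 days
by_bday = t0 - BDay(3)

# Counting positions in the data skips every missing day, holidays included
by_position = trading_days[trading_days.get_loc(t0) - 3]

print(by_bday.date(), by_position.date())  # 2016-01-14 2016-01-13
```

The positional result, 2016-01-13, is the T-3 row the question expects for 2016-01-19.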

import pandas as pd

# Assign a sequential number to each trading day
df_melt_test_percent = df_melt_test_percent.sort_index().assign(DayNumber=lambda x: range(len(x)))

# Look up the DayNumber of each FOMC date (dates with no trading data are dropped)
tmp = pd.merge(
    df_FOMC_dates, df_melt_test_percent[['DayNumber']],
    left_on='FOMC_dates', right_index=True
)

# For each row, list the offsets from 3 trading days before to 3 after
tmp['delta'] = tmp.apply(lambda _: range(-3, 4), axis=1)

# One row per (FOMC date, offset), then shift the day number by the offset
tmp = tmp.explode('delta')
tmp['DayNumber'] += tmp['delta']

# Assemble the result
result = pd.merge(tmp, df_melt_test_percent, on='DayNumber')

Result:

FOMC_dates DayNumber delta     9am    10am    11am    12pm     1pm     2pm     3pm     4pm
2016-01-12         2    -3 23.0075 22.7975 23.0050 23.5975 24.4675 25.2450 25.1600 24.9575
2016-01-12         3    -2 22.9125 23.2400 23.8575 23.9475 24.0425 24.4000 25.7950 26.7625
2016-01-13         3    -3 22.9125 23.2400 23.8575 23.9475 24.0425 24.4000 25.7950 26.7625
2016-01-12         4    -1 25.7500 25.9100 25.8800 25.9325 26.7650 26.4025 24.9425 24.2725
2016-01-13         4    -2 25.7500 25.9100 25.8800 25.9325 26.7650 26.4025 24.9425 24.2725
2016-01-12         5     0 22.5500 22.6900 23.2700 23.2550 23.1425 22.8175 22.2925 22.4175
2016-01-13         5    -1 22.5500 22.6900 23.2700 23.2550 23.1425 22.8175 22.2925 22.4175
2016-01-12         6     1 21.8175 22.6200 22.5225 23.2675 23.9650 25.0500 24.9575 25.1100
2016-01-13         6     0 21.8175 22.6200 22.5225 23.2675 23.9650 25.0500 24.9575 25.1100
2016-01-19         6    -3 21.8175 22.6200 22.5225 23.2675 23.9650 25.0500 24.9575 25.1100
2016-01-12         7     2 25.4600 25.0050 24.2875 24.2050 24.2850 23.7800 23.6775 23.9575
2016-01-13         7     1 25.4600 25.0050 24.2875 24.2050 24.2850 23.7800 23.6775 23.9575
2016-01-19         7    -2 25.4600 25.0050 24.2875 24.2050 24.2850 23.7800 23.6775 23.9575
2016-01-12         8     3 28.3200 28.5925 27.8400 28.8900 29.2925 28.4225 27.6525 27.1525
2016-01-13         8     2 28.3200 28.5925 27.8400 28.8900 29.2925 28.4225 27.6525 27.1525
2016-01-19         8    -1 28.3200 28.5925 27.8400 28.8900 29.2925 28.4225 27.6525 27.1525
2016-01-13         9     3 26.1625 26.3400 26.0725 26.2550 26.3275 26.9225 26.5725 26.0075
2016-01-19         9     0 26.1625 26.3400 26.0725 26.2550 26.3275 26.9225 26.5725 26.0075

Rows with delta = 0 are your original FOMC_dates. You can drop the columns you don't want and pivot it to your preferred shape.
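A self-contained sketch of the whole pipeline on the question's sample data (9am column only), including the last step of splitting the result into one frame per offset or pivoting it. Things to flag: it replaces the apply/explode step with a cross join against a small table of offsets (needs pandas 1.2 or later), which keeps delta an integer column; the frames dict, wide and the Tm3/T0/Tp3 names are illustrative; and each frame is indexed by the FOMC date rather than by the shifted trading day shown in the question's expected output (keeping the Date index as a column, e.g. via reset_index(), before the final merge would preserve that date). 2016-01-10 is a Sunday with no row in the data, so the first merge drops it.

```python
import pandas as pd

# Question's sample data, 9am column only; names follow the answer
df_melt_test_percent = pd.DataFrame(
    {'9am': [20.6475, 21.3550, 23.0075, 22.9125, 25.7500,
             22.5500, 21.8175, 25.4600, 28.3200, 26.1625]},
    index=pd.to_datetime(['2016-01-05', '2016-01-06', '2016-01-07', '2016-01-08',
                          '2016-01-11', '2016-01-12', '2016-01-13', '2016-01-14',
                          '2016-01-15', '2016-01-19']),
)
df_FOMC_dates = pd.DataFrame({'FOMC_dates': pd.to_datetime(
    ['2016-01-10', '2016-01-12', '2016-01-13', '2016-01-19'])})

# Same first steps as the answer: number the trading days, look up each FOMC date
df_melt_test_percent = df_melt_test_percent.sort_index().assign(
    DayNumber=lambda x: range(len(x)))
tmp = pd.merge(df_FOMC_dates, df_melt_test_percent[['DayNumber']],
               left_on='FOMC_dates', right_index=True)

# Cross join with the offsets -3 ... 3 (instead of apply + explode)
tmp = tmp.merge(pd.DataFrame({'delta': range(-3, 4)}), how='cross')
tmp['DayNumber'] += tmp['delta']
result = pd.merge(tmp, df_melt_test_percent, on='DayNumber')

# One frame per offset, indexed by the FOMC date it belongs to
frames = {
    delta: group.drop(columns=['DayNumber', 'delta']).set_index('FOMC_dates')
    for delta, group in result.groupby('delta')
}
Tm3, T0, Tp3 = frames[-3], frames[0], frames[3]

# Or one wide table: a row per FOMC date, a column per offset
wide = result.pivot(index='FOMC_dates', columns='delta', values='9am')
print(wide)
```

Offsets that run past the end of the data (2016-01-19 + 1 ... + 3) simply drop out of the inner merge and show up as NaN in the pivoted table.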

Answered by Code Different
  • Hi, first off, thank you so much for the explanation! I'm having a tough time trying to implement it, though, and receiving an error of: `ValueError: cannot insert level_0, already exists` from the `reset_index()` function. I tried `drop = True` and `droplevel(0)` but it didn't change any output. – birdman Aug 27 '21 at 03:06
  • Reinitialize the percentage data frame and change to `reset_index(drop=True)` – Code Different Aug 27 '21 at 03:12
  • Error posted in OP – birdman Aug 27 '21 at 03:20
  • I might have found the problem. Date columns in `df_percent` are not actual column names since they're cast as `pd.to_datetime` – birdman Aug 27 '21 at 03:21
  • Posted a sample of the dataframe. Not sure how to convert those columns back, though, since they're just labels. – birdman Aug 27 '21 at 03:26
  • I’m charging you extra rep points for making me get out of bed. But now that I’m seeing the real data frame, I’ll update my code – Code Different Aug 27 '21 at 03:28
  • Edited my answer and also made it clearer. Reinitialize `df_melt_test_percent` – Code Different Aug 27 '21 at 03:49
  • Maybe the best explanation I've ever received for code help; sincerely, thank you. – birdman Aug 27 '21 at 04:12
  • I hate to bother you again since you were so nice about it last night, but now I'm having a problem with displaying the dataframe. I have edited the OP – birdman Aug 27 '21 at 13:18
  • Actually it's a bug in the code. I've updated the answer – Code Different Aug 27 '21 at 13:29
  • Receiving error of `ValueError: You are trying to merge on object and int32 columns. If you wish to proceed you should use pd.concat` from `tmp = pd.merge` line. – birdman Aug 27 '21 at 13:44
  • Hm... I'm turning this into a mess. Can you make a new question and I will try to help you there – Code Different Aug 27 '21 at 13:48
  • https://stackoverflow.com/questions/68954776/trying-to-merge-2-dataframes-but-receiving-value-error-of-merging-object-and-int Link to new thread. – birdman Aug 27 '21 at 14:04