Assuming I have days and symbols I would like to trade (inside of a DF):
level_0 index date symbol ... change_1_day change_10_day volume_10_day volume_1_day
0 22177 22177 2022-12-20 ICCM ... 177.599829 None None 30005.0
1 30404 30404 2022-12-22 APE ... 75.182482 None None 2224.0
2 46210 46210 2022-12-21 SINT ... 57.161981 None None 857345.0
3 47737 47737 2022-12-23 XPON ... 139.185751 None None 284.0
[4 rows x 15 columns]
And then a huge DF of minute data, that contains many days and symbols:
index timestamp open high ... EMA89 EMA216 EMA267 EMA200
0 0 2022-09-22 08:00:00+00:00 3.8400 3.9700 ... NaN NaN NaN NaN
1 1 2022-09-22 08:05:00+00:00 3.9100 3.9600 ... NaN NaN NaN NaN
2 2 2022-09-22 08:10:00+00:00 3.9300 3.9500 ... NaN NaN NaN NaN
3 3 2022-09-22 08:15:00+00:00 3.9300 3.9500 ... NaN NaN NaN NaN
4 4 2022-09-22 08:20:00+00:00 3.9500 3.9800 ... NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ...
21288 21288 2022-12-24 00:35:00+00:00 2.2400 2.2400 ... 2.301111 2.256522 2.195352 2.273983
21289 21289 2022-12-24 00:40:00+00:00 2.2488 2.2488 ... 2.299948 2.256451 2.195750 2.273732
21290 21290 2022-12-24 00:45:00+00:00 2.2500 2.2500 ... 2.298172 2.256115 2.195931 2.273198
21291 21291 2022-12-24 00:50:00+00:00 2.2500 2.2500 ... 2.296879 2.255967 2.196260 2.272867
21292 21292 2022-12-24 00:55:00+00:00 2.2395 2.2395 ... 2.294059 2.255174 2.196064 2.271844
[21293 rows x 17 columns]
How can I filter the minute dataframe so that it only contains the minute OHLC data for the days and symbols in the daily dataframe?
i.e I want to remove all rows inside of the minute data which do not match a symbol and date inside the daily DF.
How can I achieve this?
UPDATE
I have tried what Chris suggested in the comments, but it doesn't change the dataframe:
df.merge(df_daily[['symbol','date']], on=['symbol','date'], how='inner')
It still returns the same number of rows:
[21293 rows x 17 columns]
UPDATE 2
Does the same:
df3 = df_daily.merge(df, on=['symbol','date'], how='inner')
UPDATE 3
Following the approved answer here I have tried:
keys = list(df_daily.columns.values)
i1 = df.set_index(keys).index
i2 = df_daily.set_index(keys).index
df[~i1.isin(i2)]
Which returns the error:
raise KeyError(f"None of {missing} are in the columns")
KeyError: "None of ['level_0', 'previous_close', 'change_1_day', 'change_10_day', 'volume_1_day'] are in the columns"