0

Assuming I have days and symbols I would like to trade (inside of a DF):

   level_0  index        date symbol  ...  change_1_day  change_10_day  volume_10_day  volume_1_day
0    22177  22177  2022-12-20   ICCM  ...    177.599829           None           None       30005.0
1    30404  30404  2022-12-22    APE  ...     75.182482           None           None        2224.0
2    46210  46210  2022-12-21   SINT  ...     57.161981           None           None      857345.0
3    47737  47737  2022-12-23   XPON  ...    139.185751           None           None         284.0

[4 rows x 15 columns]

And then a huge DF of minute data, that contains many days and symbols:

       index                  timestamp    open    high  ...     EMA89    EMA216    EMA267    EMA200
0          0  2022-09-22 08:00:00+00:00  3.8400  3.9700  ...       NaN       NaN       NaN       NaN
1          1  2022-09-22 08:05:00+00:00  3.9100  3.9600  ...       NaN       NaN       NaN       NaN
2          2  2022-09-22 08:10:00+00:00  3.9300  3.9500  ...       NaN       NaN       NaN       NaN
3          3  2022-09-22 08:15:00+00:00  3.9300  3.9500  ...       NaN       NaN       NaN       NaN
4          4  2022-09-22 08:20:00+00:00  3.9500  3.9800  ...       NaN       NaN       NaN       NaN
...      ...                        ...     ...     ...  ...       ...       ...       ...       ...
21288  21288  2022-12-24 00:35:00+00:00  2.2400  2.2400  ...  2.301111  2.256522  2.195352  2.273983
21289  21289  2022-12-24 00:40:00+00:00  2.2488  2.2488  ...  2.299948  2.256451  2.195750  2.273732
21290  21290  2022-12-24 00:45:00+00:00  2.2500  2.2500  ...  2.298172  2.256115  2.195931  2.273198
21291  21291  2022-12-24 00:50:00+00:00  2.2500  2.2500  ...  2.296879  2.255967  2.196260  2.272867
21292  21292  2022-12-24 00:55:00+00:00  2.2395  2.2395  ...  2.294059  2.255174  2.196064  2.271844

[21293 rows x 17 columns]

How can I filter the minute dataframe so that it only contains the minute OHLC data for the days and symbols in the daily dataframe?

i.e I want to remove all rows inside of the minute data which do not match a symbol and date inside the daily DF.

How can I achieve this?

UPDATE

I have tried what Chris suggested in the comments, but it doesn't change the dataframe:

df.merge(df_daily[['symbol','date']], on=['symbol','date'], how='inner')

It still returns the same number of rows:

[21293 rows x 17 columns]

UPDATE 2

Does the same:

df3 = df_daily.merge(df, on=['symbol','date'], how='inner')

UPDATE 3

Following the approved answer here I have tried:

keys = list(df_daily.columns.values)
i1 = df.set_index(keys).index
i2 = df_daily.set_index(keys).index
df[~i1.isin(i2)]

Which returns the error:

    raise KeyError(f"None of {missing} are in the columns")
KeyError: "None of ['level_0', 'previous_close', 'change_1_day', 'change_10_day', 'volume_1_day'] are in the columns"
a7dc
  • 3,323
  • 7
  • 32
  • 50

0 Answers0