0

If I have a dataframe of daily data which contain symbols and different dates:

     level_0   index        date symbol    open  ...  volume_10_day  is_downtrending  is_downtrending_lookback  consolidating_10  consolidating_10_lookback
0       3608    3608  2022-10-26   CIFR  0.8600  ...         3883.2                0                         0                 0                          1
1      11367   11367  2022-09-12   CLVS  1.2800  ...        24749.8                0                         0                 0                          1
2      13031   13031  2022-10-06    CGC  3.0700  ...      3807474.9                0                         0                 0                          1
3      13044   13044  2022-10-25    CGC  2.4000  ...      4213340.1                0                         0                 0                          1
4      13864   13864  2022-09-02   CMCM  4.9100  ...         3560.0                0                         0                 0                          1
..       ...     ...         ...    ...     ...  ...            ...              ...                       ...               ...                        ...
353   684622  684622  2022-10-24   SOBR  3.2500  ...        65830.2                0                         0                 0                          1
354   685045  685045  2022-08-29   SNTG  2.6500  ...        12765.3                0                         1                 0                          1
355   685093  685093  2022-11-04   SNTG  4.6889  ...     17969582.7                0                         0                 0                          0
356   686851  686851  2022-10-11    WNW  0.8700  ...         5172.1                0                         0                 0                          1
357   688103  688103  2022-10-11    BHG  0.8750  ...         1489.5                0                         1                 0                          1

[358 rows x 18 columns]

Sometimes, there are multiplies of the same days but with different symbols. For example, on 2022-10-11 there are two symbols which occur: WNW, BHG.

356   686851  686851  2022-10-11    WNW  0.8700  ...         5172.1                0                         0                 0                          1
357   688103  688103  2022-10-11    BHG  0.8750  ...         1489.5                0                         1                 0                          1

When this happens, I only want the first instance to be returned (all other symbols occurring on the same day should be removed), something like:

     level_0   index        date symbol    open  ...  volume_10_day  is_downtrending  is_downtrending_lookback  consolidating_10  consolidating_10_lookback
0       3608    3608  2022-10-26   CIFR  0.8600  ...         3883.2                0                         0                 0                          1
1      11367   11367  2022-09-12   CLVS  1.2800  ...        24749.8                0                         0                 0                          1
2      13031   13031  2022-10-06    CGC  3.0700  ...      3807474.9                0                         0                 0                          1
3      13044   13044  2022-10-25    CGC  2.4000  ...      4213340.1                0                         0                 0                          1
4      13864   13864  2022-09-02   CMCM  4.9100  ...         3560.0                0                         0                 0                          1
..       ...     ...         ...    ...     ...  ...            ...              ...                       ...               ...                        ...
353   684622  684622  2022-10-24   SOBR  3.2500  ...        65830.2                0                         0                 0                          1
354   685045  685045  2022-08-29   SNTG  2.6500  ...        12765.3                0                         1                 0                          1
355   685093  685093  2022-11-04   SNTG  4.6889  ...     17969582.7                0                         0                 0                          0
356   686851  686851  2022-10-11    WNW  0.8700  ...         5172.1                0                         0                 0                          1

[357 rows x 18 columns]

Where in the duplicate of WNW, BHG, only the first one (WNW) is returned.

How can I do this? Something like:

df_filtered.drop_duplicates(subset=['date', 'symbol'], inplace=True)

Any help is much appreciated

a7dc
  • 3,323
  • 7
  • 32
  • 50
  • 1
    So the symbol (WNW or BHG) is actually irrelevant? Any you just want to keep the first row of any date? That would be `df_filtered.drop_duplicates(subset=['date'], keep='first', inplace=True)`. – Frodnar Nov 24 '22 at 04:09
  • 1
    Does this answer your question? [Drop all duplicate rows across multiple columns in Python Pandas](https://stackoverflow.com/questions/23667369/drop-all-duplicate-rows-across-multiple-columns-in-python-pandas) – Frodnar Nov 24 '22 at 04:12
  • No sadly it doesn't – a7dc Nov 24 '22 at 04:43
  • Sorry I didn't see the first comment - yes that works. Was getting confused. Thanks! – a7dc Nov 24 '22 at 17:24

1 Answers1

1

Per the discussion in the comments, this solution works:

df_filtered.drop_duplicates(subset=['date'], keep='first', inplace=True)
Frodnar
  • 2,129
  • 2
  • 6
  • 20