
                 date  price_bl  price_ss  price_bs  price_sl
0 2022-03-09 03:00:00   41198.5       NaN       NaN       NaN
0 2022-03-10 01:00:00       NaN       NaN       NaN   40931.0
0 2022-03-10 01:00:00       NaN       NaN   40931.0       NaN
1 2022-03-16 02:00:00   40867.8       NaN       NaN       NaN
0 2022-03-16 02:00:00       NaN   40867.8       NaN       NaN

# after drop_duplicates...

                 date  price_bl  price_ss  price_bs  price_sl
0 2022-03-09 03:00:00   41198.5       NaN       NaN       NaN
0 2022-03-10 01:00:00       NaN       NaN       NaN   40931.0
1 2022-03-16 02:00:00   40867.8       NaN       NaN       NaN
1 2022-03-16 03:00:00       NaN       NaN       NaN   39137.9
1 2022-03-16 08:00:00       NaN   40289.0       NaN       NaN

As you can see, price_bs at 2022-03-10 01:00:00 has two values, 40931.0 and NaN, and after drop_duplicates only the NaN remains. I want it to keep the normal value rather than the NaN whenever a date has both. What should I do? (Not only for the price_bs column, but for the other three price_* columns as well.)

peilin
    Seems like every row has only one price. Maybe you ought to arrange it in a single price column and one column indicating whether it was bl, ss, bs, or sl – Andrew Allaire Mar 21 '23 at 19:03
  • Have you considered whether some coalesce strategy can help here? (links: [1](https://stackoverflow.com/q/38152389), [2](https://www.statology.org/pandas-coalesce/), [3](https://stackoverflow.com/q/43177685)) – Savir Mar 21 '23 at 19:09
  • Should some of your duplicate dates be represented as unique moments? Perhaps by adding another index level akin to a, b, c... for a single timestamp. Otherwise it's not clear why you have two duplicate dates but still want them represented separately. – nlivingstone Mar 21 '23 at 19:09
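The single-price-column layout suggested in the first comment can be sketched like this (hypothetical data mirroring the wide frame above; the column names are assumptions):

```python
import pandas as pd
import numpy as np

# Hypothetical frame mirroring the question's wide layout.
df = pd.DataFrame({
    'date': pd.to_datetime(['2022-03-09 03:00:00', '2022-03-10 01:00:00']),
    'price_bl': [41198.5, np.nan],
    'price_ss': [np.nan, np.nan],
    'price_bs': [np.nan, 40931.0],
    'price_sl': [np.nan, np.nan],
})

# Melt into long format: one 'price' column plus a 'kind' column
# (bl/ss/bs/sl); rows that were NaN in the wide layout drop out naturally.
long = (df.melt(id_vars='date', var_name='kind', value_name='price')
          .dropna(subset=['price']))
long['kind'] = long['kind'].str.replace('price_', '', regex=False)
print(long)
```

With this shape there are no duplicate-date rows to reconcile in the first place; each row is one (date, kind, price) observation.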

1 Answer

You can try combining rows via groupby and an aggregation function such as max or first:

df = df.groupby('date', as_index=False).max()
# or
df = df.groupby('date', as_index=False).first()  # uses the first non-null value in each column per group
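A minimal reproduction of the groupby approach on data shaped like the question's (the values are copied from the frames above):

```python
import pandas as pd
import numpy as np

# Same timestamp appears twice, once with the price filled and once with NaN,
# as in the question's frame before drop_duplicates.
df = pd.DataFrame({
    'date': pd.to_datetime(['2022-03-10 01:00:00', '2022-03-10 01:00:00',
                            '2022-03-16 02:00:00', '2022-03-16 02:00:00']),
    'price_bl': [np.nan, np.nan, 40867.8, np.nan],
    'price_ss': [np.nan, np.nan, np.nan, 40867.8],
    'price_bs': [40931.0, np.nan, np.nan, np.nan],
    'price_sl': [np.nan, 40931.0, np.nan, np.nan],
})

# first() keeps the first non-null value per column within each group, so
# duplicate timestamps collapse into one row without losing any prices.
out = df.groupby('date', as_index=False).first()
print(out)
```

Each date now appears once, with 40931.0 surviving in both price_bs and price_sl instead of being dropped as a NaN duplicate.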
Guru Stron