Given a test dataset as follows:
id city district quantity price
0 1 bj hd 12.0 23.0
1 2 bj cy 23.0 45.0
2 3 bj hd NaN NaN
3 4 bj cy NaN NaN
4 5 sh hp 12.0 NaN
5 6 sh hp NaN NaN
6 7 sh pd NaN NaN
I would like to drop duplicated rows based on city
and district
, then drop rows if its quantity
is NaN
, but if city
and district
are not duplicated, then even if quantity
is NaN
, it's not necessary to drop rows.
Code based on link from here:
m1 = df['quantity'].notna()
m2 = ~df[['city', 'district']].duplicated()
df1 = df[m1 & m2]
print(df1)
Out:
id city district quantity price
0 1 bj hd 12.0 23.0
1 2 bj cy 23.0 45.0
4 5 sh hp 12.0 NaN
But I want to keep the last row since it's not duplicated with any rows. How could I do that?
id city district quantity price
0 1 bj hd 12.0 23.0
1 2 bj cy 23.0 45.0
2 5 sh hp 12.0 NaN
3 7 sh pd NaN NaN