I have a pandas df like the one below. Duplicate values appear in the x & y columns in consecutive runs: at indexes 0 & 1, at 2, 3 & 4, ... and at 500, 501 & 502. Then a second round starts with the same duplicate x & y values, at indexes 1000 & 1001, 1002 & 1003, ... 1200 & 1201, and so on, but with different values in the weight column.
index       x      y  weight
0      59.644  10.72   0.69
1      59.644  10.72   0.82
2      57.822  10.13   0.75
3      57.822  10.13   0.68
4      57.822  10.13   0.20
...
500    53.252  10.85   0.15
501    53.252  10.85   0.95
502    53.252  10.85   0.69
...
1000   59.644  10.72   0.85
1001   59.644  10.72   0.73
1002   57.822  10.13   0.92
1003   57.822  10.13   0.15
...
1200   53.252  10.85   0.78
1201   53.252  10.85   1.098
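
For reference, the sample rows above can be rebuilt like this (a minimal sketch containing only the rows shown; the real df has many more rows in between):

import pandas as pd

# Only the sample rows shown above; the real df continues between
# these blocks.
df_2 = pd.DataFrame(
    {
        'x': [59.644, 59.644, 57.822, 57.822, 57.822,
              53.252, 53.252, 53.252,
              59.644, 59.644, 57.822, 57.822,
              53.252, 53.252],
        'y': [10.72, 10.72, 10.13, 10.13, 10.13,
              10.85, 10.85, 10.85,
              10.72, 10.72, 10.13, 10.13,
              10.85, 10.85],
        'weight': [0.69, 0.82, 0.75, 0.68, 0.20,
                   0.15, 0.95, 0.69,
                   0.85, 0.73, 0.92, 0.15,
                   0.78, 1.098],
    },
    index=[0, 1, 2, 3, 4, 500, 501, 502, 1000, 1001, 1002, 1003, 1200, 1201],
)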
My requirement

I would like my df to:

1) drop the repeated/duplicate rows in x & y that have a weight value less than 0.60;

2) where duplicates in the x & y columns still remain after that, compare the weight values between the duplicate rows and remove the rows with the lesser weight.

3) If I use the code below, it removes all the duplicates between x & y:
df_2.groupby(['x', 'y'], as_index=False, sort=False)['weight'].max()
But I want to compare and remove the first run of duplicates, then the 2nd, then the 3rd, and so on, so that the duplicate x & y values still reappear after some rows. For a better understanding, please refer to the required df below.
How the df should look:
index       x      y  weight
1      59.644  10.72   0.82
2      57.822  10.13   0.75
...
501    53.252  10.85   0.95
...
1000   59.644  10.72   0.85
...
1002   57.822  10.13   0.92
...
1201   53.252  10.85   1.098
...
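
To make the issue concrete: on the sample rows (using the df_2 built above), the groupby collapses each x & y pair to a single row for the whole frame, so the entire second round from index 1000 onwards disappears:

collapsed = df_2.groupby(['x', 'y'], as_index=False, sort=False)['weight'].max()
print(collapsed)
#         x      y  weight
# 0  59.644  10.72   0.850
# 1  57.822  10.13   0.920
# 2  53.252  10.85   1.098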
I have tried using if statements (a condensed sketch is below), but the number of lines keeps growing. I believe there should be a more pythonic alternative (an in-built function, or numpy) that makes this easier. Any help would be appreciated.
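
This is roughly the if/loop version I tried (a condensed sketch, continuing from the df_2 above): it walks the frame row by row, keeps only the highest-weight row of each consecutive run of identical x & y values, and drops a run's winner if its weight is below 0.60.

# Walk the frame row by row, remember the best (highest-weight) row of
# the current run of identical (x, y) values, and flush it whenever the
# pair changes.
keep = []                          # indexes of the rows to keep
best_idx, best_row = None, None
for idx, row in df_2.iterrows():
    same_run = (
        best_row is not None
        and row['x'] == best_row['x']
        and row['y'] == best_row['y']
    )
    if same_run:
        if row['weight'] > best_row['weight']:
            best_idx, best_row = idx, row
    else:
        # the pair changed: keep the finished run's winner, but only if
        # its weight is at least 0.60 (rule 1)
        if best_row is not None and best_row['weight'] >= 0.60:
            keep.append(best_idx)
        best_idx, best_row = idx, row
if best_row is not None and best_row['weight'] >= 0.60:   # flush the last run
    keep.append(best_idx)

result = df_2.loc[keep]

On the sample rows this produces exactly the required df above (indexes 1, 2, 501, 1000, 1002, 1201), but it is verbose, and every extra condition makes it longer.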