I have a DataFrame:

id_1  |  value
------+-------
 1    |    6
 1    |    5
 1    |    2
 2    |    1
 3    |    2
 3    |    4
...   |   ...
 

I want to remove the duplicate `id_1` rows so that each id appears only once. For each id, the row with the highest value should be kept.

Expected output:

id_1  |  value
------+-------
 1    |    6
 2    |    1
 3    |    4
...   |   ...

This is straightforward to solve by iterating over the rows, but is there a way to do it without a for loop?

Kevin
  • Have you looked at `groupby`? `df.groupby('id_1').max()` – Psidom Oct 06 '21 at 21:08
  • `sort_values` + `drop_duplicates` is also an option: `df.sort_values('value', ascending=False).drop_duplicates('id_1')`, like [this answer](https://stackoverflow.com/a/40629420/15497888). The `groupby` + `max` option is shown in [the accepted answer](https://stackoverflow.com/a/15705958/15497888): `df.groupby('id_1', sort=False, as_index=False)['value'].max()` – Henry Ecker Oct 06 '21 at 21:11
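
For reference, here is a minimal sketch of the two approaches suggested in the comments, run against the sample rows from the question (the `...` rows are left out). The variable names are only illustrative. The third variant, using `idxmax`, is not from the comments; it is included as an assumption about what may be wanted if the DataFrame has more columns than the two shown.

```python
import pandas as pd

# Sample rows copied from the question; the elided "..." rows are omitted.
df = pd.DataFrame({
    "id_1": [1, 1, 1, 2, 3, 3],
    "value": [6, 5, 2, 1, 2, 4],
})

# Approach 1 (from the comments): per-group maximum of the value column.
by_groupby = df.groupby("id_1", sort=False, as_index=False)["value"].max()

# Approach 2 (from the comments): sort so the largest value comes first,
# then keep only the first row seen for each id.
# The final sort/reset is only there to make the output easy to compare.
by_sort = (
    df.sort_values("value", ascending=False)
      .drop_duplicates("id_1")
      .sort_values("id_1")
      .reset_index(drop=True)
)

# Approach 3 (assumption, not from the comments): look up the index label
# of the maximum value in each group and select those whole rows.
by_idxmax = df.loc[df.groupby("id_1")["value"].idxmax()].reset_index(drop=True)

print(by_groupby)
print(by_sort)
print(by_idxmax)
```

With only these two columns, all three give one row per `id_1`. If the DataFrame had additional columns, `groupby(...).max()` on the whole frame would take the maximum of each column independently, whereas the `sort_values` + `drop_duplicates` and `idxmax` variants keep the complete original row.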
