0

Trying to merge duplicate rows with identical columns besides value but want to keep all data that is not duplicated.

I thought a groupby function and resetting the index would allow me to achieve this goal but that obviously did not work.

Tried to run Microsoft Visual Basic for Applications to achieve my goal but it omitted non duplicate data as well.

Was hoping for some pandas or even excel tips or pandas/excel documentation that could assist me.

My Code:

grouped_df = result1.groupby(['ID','Name','Value'])

maximums = grouped_df.max('Price')


maximums = maximums.reset_index()

Dataset before:

ID Name Value
1 Apple 3
2 Banana 4
2 Banana 5
3 Orange 3
4 Pear 7
4 Pear 5

What I am getting with my code:

ID Name Value
2 Banana 5
4 Pear 7

What I wish to achieve:

ID Name Value
1 Apple 3
2 Banana 5
3 Orange 3
4 Pear 7
  • Something like `df = df.sort_values(['ID', 'Value']).drop_duplicates(['ID', 'Name'], keep='last', ignore_index=True)` (from the linked duplicate) – Henry Ecker Aug 20 '21 at 21:29
  • 1
    Or with [`SeriesGroupBy.max`](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.max.html) instead of passing 'Price' as an argument to `max` select from grouper like -> `df = df.groupby(['ID', 'Name'], as_index=False)['Value'].max()` (from the second linked duplicate) – Henry Ecker Aug 20 '21 at 21:37
  • Thank you, I found being more specific with my groupby function and using your sort_values function actually returned what I need twice. – i.d.s.chicago Aug 20 '21 at 21:51
  • You shouldn't include the column you're trying to find the max of in the groupby. As in the second comment groupby ID and Name and select Value. – Henry Ecker Aug 20 '21 at 21:52

0 Answers0