0

I have a dataframe as follows : data_frame:

column_A column_B, column_C
  2        3        5            row= 0
  2        3        6            row= 1
  3        4        2            row= 2
  3        4        9            row= 3

what am trying to do is to check the rows that have same values for their column_A and their column_B, and get the maximum value for each group obtained: To be more precise: in my exemple I would like to get the row 1 and row 3. So the expected output should be:

column_A column_B, column_C
  2        3        6            row= 1
  3        4        9            row= 3

I tried to apply groupby with max method to column_C: it seems to work but What i got as output is column_C. Here is my code:

test_df=pd.DataFrame([[2,3,5],[2,3,6],[3,4,2],[3,4,9]],columns=['column_A','column_B','column_C'])
result= test_df.groupby(['column_A','column_B'], sort=False)['column_C'].max()
print(result)

and here is the result

column_A  column_B
2         3           6
3         4           9
Name: column_C, dtype: int64

I know why I have a serie ( since I apply max method on the column_C ), but i couldn't figure out a way to get the corresponding rows instead of only getting column_C values.

mouni93
  • 159
  • 1
  • 2
  • 13

1 Answers1

2

Using sort_values +drop_duplicates

df.sort_values('column_C').drop_duplicates(['column_A','column_B'],keep='last')
Out[186]: 
   column_A  column_B  column_C
1         2         3         6
3         3         4         9
BENY
  • 317,841
  • 20
  • 164
  • 234