
I would like to combine groupby and min but keep the entire dataframe. With the approach below I end up with only two columns, col1 and col2:

For this df:

col1  col2    col3
1      1       'A'
1      0       'B'
2      2       'C'
2      3       'D'

df.groupby(df['col1'])[['col2']].min() returns:

col1  col2    
1      0       
2      2      

Instead, once the row with the minimal col2 in each group is identified, I want the matching col3 value from that same row, like this:

col1  col2    col3
1      0       'B'
2      2       'C'
Niccola Tartaglia

1 Answer


The easiest way to do this is in two steps. First, prepare a helper dataframe that contains the minimal col2 value for each col1 group. Second, do an inner merge of the initial dataframe with the helper one. You can think of it as an "inner join" that does not duplicate columns: a typical join on just the group key would keep both copies of col2 and need suffixes to tell the left and right sources apart, but here the rows are matched on both col1 and col2, so each column appears only once.
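To make the suffix remark concrete, here is a small sketch on the question's data (the helper frame name `mins` is only an illustrative choice). Joining on col1 alone keeps two copies of col2 and suffixes them, whereas letting pd.merge join on all shared columns, which is its default when `on` is omitted, produces no duplicates.

```python
import pandas as pd

# Same sample data as in the question.
df1 = pd.DataFrame({'col1': [1, 1, 2, 2],
                    'col2': [1, 0, 2, 3],
                    'col3': ['A', 'B', 'C', 'D']})

# Helper frame (illustrative name): minimal col2 per col1, with col1 as a column.
mins = df1.groupby('col1')[['col2']].min().reset_index()

# Join on col1 only: col2 exists on both sides but is not a key,
# so pandas keeps both copies and distinguishes them with suffixes.
by_col1 = pd.merge(df1, mins, on='col1')
print(list(by_col1.columns))  # ['col1', 'col2_x', 'col3', 'col2_y']

# Default join on every shared column (col1 and col2):
# only rows whose col2 equals the group minimum match, with no duplicated columns.
by_both = pd.merge(df1, mins)
print(list(by_both.columns))  # ['col1', 'col2', 'col3']
print(by_both['col3'].tolist())  # ['B', 'C']
```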

First we create our initial dataframe (with pandas imported as pd): df1 = pd.DataFrame(data={'col1':[1,1,2,2],'col2':[1,0,2,3],'col3':['A','B','C','D']})

Then we perform the groupby. We have to reset the index afterwards; otherwise col1 stays in the index instead of being a column, and pd.merge, which by default joins on the columns both frames share, would match on col2 alone: df2 = df1.groupby(df1['col1'])[['col2']].min().reset_index()
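A quick check of what the groupby step produces before and after reset_index(), on the question's sample data (a sketch; the intermediate name `grouped` is only for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 1, 2, 2],
                    'col2': [1, 0, 2, 3],
                    'col3': ['A', 'B', 'C', 'D']})

# Without reset_index(), col1 becomes the index of the result,
# so col2 is the only ordinary column.
grouped = df1.groupby(df1['col1'])[['col2']].min()
print(grouped.index.name, list(grouped.columns))  # col1 ['col2']

# reset_index() moves col1 back into a regular column,
# which lets pd.merge use it as a join key.
df2 = grouped.reset_index()
print(list(df2.columns))  # ['col1', 'col2']
print(df2['col1'].tolist(), df2['col2'].tolist())  # [1, 2] [0, 2]
```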

Our last step is the inner merge of the two, which returns col1, col2 and col3 for each group's minimal row: pd.merge(df1, df2, how='inner')
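One behavior worth knowing, shown here on hypothetical data with a tie: because the merge matches on values, every row that ties for the minimum in its group is kept. If exactly one row per group is wanted, a different technique, idxmin(), picks the first minimal row in row order; this is a substitute approach, not part of the answer above.

```python
import pandas as pd

# Hypothetical data: group col1 == 1 has two rows tied at the minimum (0).
df1 = pd.DataFrame({'col1': [1, 1, 1, 2, 2],
                    'col2': [1, 0, 0, 2, 3],
                    'col3': ['A', 'B', 'E', 'C', 'D']})

df2 = df1.groupby(df1['col1'])[['col2']].min().reset_index()

# The inner merge keeps every row whose (col1, col2) pair matches a group minimum,
# so both tied rows of group 1 survive.
merged = pd.merge(df1, df2, how='inner')
print(merged['col3'].tolist())  # ['B', 'E', 'C']

# Alternative technique: idxmin() returns the index label of the first
# minimal row per group, so ties are broken by row order.
first_only = df1.loc[df1.groupby('col1')['col2'].idxmin()]
print(first_only['col3'].tolist())  # ['B', 'C']
```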

Crad