Pandas - group by one column, sort by another, get value from the third column

Question

I would like to take pandas dataframe, group it by one column, sort it by another column and take first element from third column and populate original dataframe.

Here is my original df. I would to group by col_1, sort by col_2 (ascending) and take first element from col_3 and populate col_4 with results.

df_in = pd.DataFrame({'col_1':['A', 'A', 'A', 'B', 'B', 'B'], 'col_2': [5,9,2, 3,7,1],
                   'col_3': ['c','d','k','n','l','f']})

[ original_df[1]

Here is how output df should look like:

df_out = pd.DataFrame({'col_1':['A', 'A', 'A', 'B', 'B', 'B'], 'col_2': [5,9,2, 3,7,1],
                   'col_3': ['c','d','k','n','l','f'], 'col_4': ['k','k','k','f','f','f'], })

I can accomplish grouping and sorting with group and transform, but how to extract first element is not clear.

Sorry SO does not display images correctly ;-(

score 3 · Accepted Answer · answered May 31 '19 at 17:16

3

 df['col_4']=df.sort_values(['col_1','col_2']).groupby('col_1')['col_3'].transform(lambda x: x.iloc[0])

Output:

  col_1  col_2 col_3 col_4
0     A      5     c     k
1     A      9     d     k
2     A      2     k     k
3     B      3     n     f
4     B      7     l     f
5     B      1     f     f

answered May 31 '19 at 17:16

iamchoosinganame

1,090
6
15

1

Is it necessary `sort_values(['col_1','col_2'])`, I only need to sort by col_2. Maybe `sort_values(['col_2'])` is enought? – user1700890 May 31 '19 at 17:21

score 2 · Answer 2 · answered May 31 '19 at 17:04

Try, given your index is as shown,

df_in.sort_values('col_2',  inplace=True)
df_in['col_4'] = df_in.groupby('col_1').col_3.transform('first')

df_in.sort_index(inplace=True)

Output:

+----+---------+---------+---------+---------+
|    | col_1   |   col_2 | col_3   | col_4   |
|----+---------+---------+---------+---------|
|  0 | A       |       5 | c       | k       |
|  1 | A       |       9 | d       | k       |
|  2 | A       |       2 | k       | k       |
|  3 | B       |       3 | n       | f       |
|  4 | B       |       7 | l       | f       |
|  5 | B       |       1 | f       | f       |
+----+---------+---------+---------+---------+

@anky_91 similar to my answer, except that you don't need to sort on `col_1`. — Quang Hoang, May 31 '19 at 17:14

WebDev · Answer 3 · 2019-05-31T17:15:25.480

1

You can use

first_values = df_in.sort_values(['col_1','col_2']).groupby('col_1')['col_3'].first().rename('col_4')
df_in = df_in.join(first_values, on='col_1')

Output:

  col_1  col_2 col_3 col_4
0     A      5     c     k
1     A      9     d     k
2     A      2     k     k
3     B      3     n     f
4     B      7     l     f
5     B      1     f     f

edited May 31 '19 at 17:15

answered May 31 '19 at 17:08

WebDev

1,211
11
17

score 1 · Answer 4 · answered May 31 '19 at 17:09

Try some thing diff idxmin

s=df_in.groupby(['col_1']).col_2.transform('idxmin')
df_in['New']=df_in.col_3.reindex(s).values
df_in
Out[469]: 
  col_1  col_2 col_3 New
0     A      5     c   k
1     A      9     d   k
2     A      2     k   k
3     B      3     n   f
4     B      7     l   f
5     B      1     f   f

Pandas - group by one column, sort by another, get value from the third column

4 Answers4

Linked