Create two columns using pd.groupby

Question

My dataset looks like this:

import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2, 2, 3, 3],
                   "B": ["a", "b", "c", "c", "b", "b", "d", "a", "c"],
                   "C": ["x", "x", "y", "x", "x", "y", "z", "y", "z"]})

>>> df
   A  B  C
0  1  a  x
1  1  b  x
2  1  c  y
3  1  c  x
4  2  b  x
5  2  b  y
6  2  d  z
7  3  a  y
8  3  c  z

I want to perform a groupby using the values of the A column. Specifically, this is the desired output:

   A        B             C
0  1  a b c c  [x, x, y, x]
1  2    b b d     [x, y, z]
2  3      a c        [y, z]

In other words, I want to join all the values of the B column using a single space, and I want to create a list with all the values of the C column.

So far I have been able to create the two desired columns in this way:

B = df.groupby("A")["B"].apply(lambda x: " ".join(x))
C = df.groupby("A")["C"].apply(list)

I am trying to modify both columns of my dataframe in place with a single groupby operation. Is it possible?

jezrael · Accepted Answer · 2020-04-30T13:20:23.973

3

Use GroupBy.agg. Pass as_index=False to keep A as a column instead of converting it to the index. The lambda function can also be simplified to " ".join:

df1 = df.groupby("A", as_index=False).agg({"B": " ".join, "C": list})
print(df1)
   A        B             C
0  1  a b c c  [x, x, y, x]
1  2    b b d     [x, y, z]
2  3      a c        [y, z]
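
As a side note, the same aggregation can probably also be written with named aggregation (keyword arguments to agg, which I believe is available from pandas 0.25 onward). This is only a sketch of an alternative spelling, assuming the df from the question. It should give the same three-row frame as above:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2, 2, 3, 3],
                   "B": ["a", "b", "c", "c", "b", "b", "d", "a", "c"],
                   "C": ["x", "x", "y", "x", "x", "y", "z", "y", "z"]})

# Named aggregation: output_column=(input_column, aggregation_function)
df1 = df.groupby("A", as_index=False).agg(
    B=("B", " ".join),  # join the strings of each group with a single space
    C=("C", list),      # collect the values of each group into a list
)
print(df1)
```

The keyword form is mostly useful when the output columns should get different names than the input columns; for this question the dict version is just as good.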

edited Apr 30 '20 at 13:20

answered Apr 30 '20 at 13:18

jezrael

Wow! That's fast! Thank you! – Riccardo Bucco Apr 30 '20 at 13:19

score 2 · Answer 2 · answered Apr 30 '20 at 13:19

Yes, you can use groupby().agg:

df.groupby('A').agg({'B': " ".join, 'C': list})
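
One thing to be aware of: without as_index=False, the group key A ends up as the index of the result rather than as a regular column. A minimal sketch of the difference, assuming the df from the question (the names out and flat are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2, 2, 3, 3],
                   "B": ["a", "b", "c", "c", "b", "b", "d", "a", "c"],
                   "C": ["x", "x", "y", "x", "x", "y", "z", "y", "z"]})

# Default behaviour: the grouping key A becomes the index of the result
out = df.groupby("A").agg({"B": " ".join, "C": list})
print(out.index.name)     # name of the index, which is the group key
print(list(out.columns))  # only the aggregated columns remain as columns

# reset_index() moves A back into a regular column
flat = out.reset_index()
print(list(flat.columns))
```

If you want the layout from the question, with A as a column and a default integer index, either call reset_index() afterwards or pass as_index=False to groupby.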

Quang Hoang

Thanks a lot! It was easier than what I thought – Riccardo Bucco Apr 30 '20 at 13:20
