Python,Pandas,DataFrame, add new column doing SQL GROUP_CONCAT equivalent

Question

My question is very similar to the one asked but unanswered here Replicating GROUP_CONCAT for pandas.DataFrame

I have a Pandas DataFame which I want to group concat into a data frame

+------+---------+  
| team | user    |  
+------+---------+  
| A    | elmer   |  
| A    | daffy   |  
| A    | bugs    |  
| B    | dawg    |  
| A    | foghorn |  
+------+---------+

Becoming

+------+---------------------------------------+  
| team | group_concat(user)                    |  
+------+---------------------------------------+  
| A    | elmer,daffy,bugs,foghorn              |  
| B    | dawg                                  | 
+------+---------------------------------------+

As answeed in the original topic, it can be done via any of these:

df.groupby('team').apply(lambda x: ','.join(x.user))
df.groupby('team').apply(lambda x: list(x.user))
df.groupby('team').agg({'user' : lambda x: ', '.join(x)})

But the resulting object is not a Pandas Dataframe anymore. How can I get the GROUP_CONCAT results in the original Pandas DataFrame as a new column?

Cheers

sophocles · Answer 1 · 2021-01-09T11:31:27.523

Let's break down the below code:

Firstly, groupby team and, use apply on the user to join it's elements using a ,
Then, reset the index, and rename the resulting dataframe (axis=1, refers to columns and not rows)

res = (df.groupby('team')['user']
       .apply(lambda x: ','.join(str(i) for i in x))).reset_index().rename({'user':'group_concat(user)'},axis=1)

Output:

  team        group_concat(user)
0    A  elmer,daffy,bugs,foghorn
1    B                      dawg

score 0 · Answer 2 · answered Jan 09 '21 at 11:12

You can apply list and join after grouping by, then reset_index to get the dataframe.

output_df = df.groupby('team')['user'].apply(lambda x: ",".join(list(x))).reset_index()
output_df.rename(columns={'user': 'group_concat(user)'})

    team    group_concat(user)
0   A   elmer,daffy,bugs,foghorn
1   B   dawg

Python,Pandas,DataFrame, add new column doing SQL GROUP_CONCAT equivalent

2 Answers2