
I have the following DataFrame:

Fruit   Description
Apple   ["red", "big"]
Banana  ["yellow", "long"]
Banana  ["elongated, twisted"]
Peach   ["round"]
Apple   ["round", "greenish"]

I'm trying to group the descriptions by fruit, concatenating the lists, so that I obtain:

Fruit   Description
Apple   ["red", "big", "round", "greenish"]
Banana  ["yellow", "long", "elongated, twisted"]
Peach   ["round"]

I followed the solution from pandas groupby and join lists:

df = df.groupby('Fruit', as_index=False).agg(Description=('Description', 'sum'))

but instead I get the lists attached to each other:

Fruit   Description
Apple   ["red", "big"]["round", "greenish"]
Banana  ["yellow", "long"]["elongated, twisted"]
Peach   ["round"]
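A minimal reproduction of this symptom, assuming (as the answers point out) that each `Description` cell holds the bracketed text as a plain string rather than a Python list. For strings, `'sum'` means string concatenation, which produces exactly the glued-together output above. The explicit `dtype=object` is an assumption added so the column holds plain Python `str` values regardless of the pandas version's default string dtype:

```python
import pandas as pd

# Assumption: each Description cell is a string such as '["red", "big"]',
# not a Python list (object dtype, i.e. plain Python str values).
df = pd.DataFrame({
    "Fruit": ["Apple", "Banana", "Banana", "Peach", "Apple"],
    "Description": pd.Series(
        [
            '["red", "big"]',
            '["yellow", "long"]',
            '["elongated, twisted"]',
            '["round"]',
            '["round", "greenish"]',
        ],
        dtype=object,
    ),
})

# For strings, 'sum' means concatenation, so the bracketed texts are glued together
out = df.groupby("Fruit", as_index=False).agg(Description=("Description", "sum"))
print(out)
```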

Does anyone have a solution? Thanks!

Quang Hoang
Jauhnax

2 Answers


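That's because your Description column holds strings, not lists. You can strip the surrounding [] from each string, join the pieces for each fruit, and wrap the result in brackets again: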

# strip the outer brackets, join the pieces per fruit, then re-wrap in brackets
out = '[' + df['Description'].str[1:-1].groupby(df['Fruit']).agg(', '.join) + ']'
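A self-contained sketch of this string-level approach, assuming each `Description` value is a bracketed string whose items are separated by `, `. The result is still text, not a list. The `reset_index()` call is an addition (not part of the answer) that turns the `Fruit` index back into a column:

```python
import pandas as pd

# Assumption: Description values are strings like '["red", "big"]'
df = pd.DataFrame({
    "Fruit": ["Apple", "Banana", "Banana", "Peach", "Apple"],
    "Description": [
        '["red", "big"]',
        '["yellow", "long"]',
        '["elongated, twisted"]',
        '["round"]',
        '["round", "greenish"]',
    ],
})

# Drop the outer [ and ] from each string, join the inner parts per fruit,
# then put one pair of brackets back around each joined result
joined = '[' + df["Description"].str[1:-1].groupby(df["Fruit"]).agg(", ".join) + ']'
out = joined.reset_index()  # Fruit becomes a regular column again
print(out)
```

Because the output stays a string, this suits cases where the lists are only displayed or written back out as text. An empty list stored as `'[]'` would contribute an empty piece and leave a stray `, ` in the joined result.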
Quang Hoang

To keep your list format, I would suggest converting the strings to real lists before running your command:

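import json

# parse each '["...", "..."]' string into a real Python list
df['Description'] = df['Description'].apply(json.loads)
# with real lists, 'sum' concatenates them per fruit
df = df.groupby('Fruit', as_index=False).agg(Description=('Description', 'sum'))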

That way, the values in the Description column are actual lists rather than strings, so 'sum' concatenates the lists instead of the text.
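A self-contained sketch of the parse-then-concatenate approach. It parses with `json.loads` as in the answer, but aggregates with `itertools.chain.from_iterable` instead of `'sum'`. That swap is a substitution, not part of the answer: repeatedly adding lists copies the growing list at each step, while `chain` walks every sublist once. The sample data is assumed to be double-quoted, JSON-style strings:

```python
import itertools
import json

import pandas as pd

# Assumption: Description values are JSON-style strings with double quotes
df = pd.DataFrame({
    "Fruit": ["Apple", "Banana", "Banana", "Peach", "Apple"],
    "Description": [
        '["red", "big"]',
        '["yellow", "long"]',
        '["elongated, twisted"]',
        '["round"]',
        '["round", "greenish"]',
    ],
})

# Parse each string into a real Python list
df["Description"] = df["Description"].apply(json.loads)

# Flatten each group's lists into one list, keeping the original row order
out = df.groupby("Fruit", as_index=False).agg(
    Description=("Description", lambda s: list(itertools.chain.from_iterable(s)))
)
print(out)
```

If the stored strings use Python-style single quotes (for example `['red', 'big']`), `json.loads` raises an error; `ast.literal_eval` from the standard library parses that form instead.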

RobBlanchard