
I have a Pandas dataframe that looks like this:

+---+--------+-------------+------------------+
|   | ItemID | Description | Feedback         |
+---+--------+-------------+------------------+
| 0 | 8988   | Tall Chair  | I hated it       |
+---+--------+-------------+------------------+
| 1 | 8988   | Tall Chair  | Best chair ever  |
+---+--------+-------------+------------------+
| 2 | 6547   | Big Pillow  | Soft and amazing |
+---+--------+-------------+------------------+
| 3 | 6547   | Big Pillow  | Horrific color   |
+---+--------+-------------+------------------+

And I want to concatenate the values from the "Feedback" column into a new column, separated by commas, where the ItemID matches. Like so:

+---+--------+-------------+----------------------------------+
|   | ItemID | Description | NewColumn                        |
+---+--------+-------------+----------------------------------+
| 0 | 8988   | Tall Chair  | I hated it, Best chair ever      |
+---+--------+-------------+----------------------------------+
| 1 | 6547   | Big Pillow  | Soft and amazing, Horrific color |
+---+--------+-------------+----------------------------------+

I've tried several variations of pivot, merge, stacking, etc., and am stuck.
I think NewColumn would end up being an array, but I'm fairly new to Python, so I'm not certain.
Ultimately, I'm going to try to use this for text classification (for a new "Description", generate some "Feedback" labels; a multiclass problem).

JDurstberger
nacc

2 Answers


Call .groupby('ItemID') on your DataFrame, and then join the strings in the Feedback column:

df.groupby('ItemID')['Feedback'].apply(lambda x: ', '.join(x))

See Pandas groupby: How to get a union of strings.
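A self-contained sketch of the same idea, rebuilding the sample data from the question's table. Passing `', '.join` directly avoids the lambda. `sort=False` is an optional tweak (not part of the original answer) that keeps the groups in order of first appearance, so 8988 comes first as in the desired output; by default groupby sorts by ItemID. The result is a Series indexed by ItemID, and `joined` is just an illustrative name.

```python
import pandas as pd

# Sample data reconstructed from the question's table
df = pd.DataFrame({
    "ItemID": [8988, 8988, 6547, 6547],
    "Description": ["Tall Chair", "Tall Chair", "Big Pillow", "Big Pillow"],
    "Feedback": ["I hated it", "Best chair ever", "Soft and amazing", "Horrific color"],
})

# Join each item's feedback strings with ", ".
# sort=False keeps groups in first-appearance order (8988 before 6547).
joined = df.groupby("ItemID", sort=False)["Feedback"].apply(", ".join)
print(joined)
```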

dbc

I think you can group by the columns ItemID and Description, apply join, and finally call reset_index:

print(df.groupby(['ItemID', 'Description'])['Feedback'].apply(', '.join).reset_index(name='NewColumn'))
   ItemID Description                         NewColumn
0    6547  Big Pillow  Soft and amazing, Horrific color
1    8988  Tall Chair       I hated it, Best chair ever
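If Description is always identical within an ItemID (an assumption based on the sample data), one possible alternative, assuming pandas 0.25 or newer where named aggregation was added, is to group on ItemID alone and carry Description along with `first`. This is a sketch rather than the answer's method; `result` is an illustrative name, and `sort=False` again keeps the question's row order.

```python
import pandas as pd

# Sample data reconstructed from the question's table
df = pd.DataFrame({
    "ItemID": [8988, 8988, 6547, 6547],
    "Description": ["Tall Chair", "Tall Chair", "Big Pillow", "Big Pillow"],
    "Feedback": ["I hated it", "Best chair ever", "Soft and amazing", "Horrific color"],
})

# Named aggregation: each keyword becomes one output column.
# Assumes one Description per ItemID, so "first" is enough to keep it.
result = df.groupby("ItemID", as_index=False, sort=False).agg(
    Description=("Description", "first"),
    NewColumn=("Feedback", ", ".join),
)
print(result)
```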

If you don't need the Description column:

print(df.groupby(['ItemID'])['Feedback'].apply(', '.join).reset_index(name='NewColumn'))
   ItemID                         NewColumn
0    6547  Soft and amazing, Horrific color
1    8988       I hated it, Best chair ever
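Since the question mentions NewColumn possibly being an array for a text-classification task, here is a variant that collects each item's feedback into a Python list instead of one joined string. Keeping the labels as separate list elements may be closer to what a multi-label training setup expects; that is an assumption about the downstream use, and `labels` is an illustrative name.

```python
import pandas as pd

# Sample data reconstructed from the question's table
df = pd.DataFrame({
    "ItemID": [8988, 8988, 6547, 6547],
    "Description": ["Tall Chair", "Tall Chair", "Big Pillow", "Big Pillow"],
    "Feedback": ["I hated it", "Best chair ever", "Soft and amazing", "Horrific color"],
})

# agg(list) gathers each group's Feedback values into a list;
# reset_index turns the grouped Series back into a DataFrame.
labels = (
    df.groupby(["ItemID", "Description"], sort=False)["Feedback"]
    .agg(list)
    .reset_index(name="NewColumn")
)
print(labels)
```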
jezrael