4

How do I turn the below input data (Pandas dataframe fed from Excel file):

ID      Category                    Speaker     Price
334014  Real Estate Perspectives    Tom Smith   100
334014  E&E                         Tom Smith   200
334014  Real Estate Perspectives    Janet Brown 100
334014  E&E                         Janet Brown 200

into this:

ID      Category                    Speaker                 Price
334014  Real Estate Perspectives    Tom Smith, Janet Brown  100
334014  E&E                         Tom Smith, Janet Brown  200   

So basiscally I want to group by Category, concatenate the Speakers, but not aggregate Price.

I tried different approaches with Pandas dataframe.groupby() and .agg(), but to no avail. Maybe there is simpler pure Python solution?

barciewicz
  • 3,511
  • 6
  • 32
  • 72
  • @harvpan - So you are downvoter, hmmmmm.... – jezrael Aug 01 '18 at 13:14
  • @jezrael, sorry but I guess you should know if the question is dupe or not. I have been downvoted many times for answering dupe. Nothing personal. – harvpan Aug 01 '18 at 13:15
  • 1
    Might be a dupe, not a dupe of that question. Doesn't cover how to get his desired price column. Would probably have to be a combination dupe – user3483203 Aug 01 '18 at 13:19
  • Hmm, not sure what is the protocol in this situation. Anyway, I will upvote jezrael's answer. thanks @user3483203 – harvpan Aug 01 '18 at 13:22
  • @harvpan if you can find a question that aggregates by first I'll close with both as duplicate. – user3483203 Aug 01 '18 at 13:23

2 Answers2

8

There are 2 possible solutions - aggregate by multiple columns and join:

dataframe.groupby(['ID','Category','Price'])['Speaker'].apply(','.join)

Or need aggregate only Price column, then is necessary aggregate all columns by first or last:

dataframe.groupby('Price').agg({'Speaker':','.join, 'ID':'first', 'Price':'first'})
harvpan
  • 8,571
  • 2
  • 18
  • 36
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

Try this

df.groupby(['ID','Category'],as_index=False).agg(lambda x : x if x.dtype=='int64' else ', '.join(x))
Pankaj Sharma
  • 388
  • 7
  • 18