
I have a pandas DataFrame df and I want to group by the text column with these aggregations:

  • Collect the english_word values into a list
  • Sum the count column

Right now I can do only one of them: either build the english_word list or sum the count column. When I try to do both, I get an error. How can I apply both aggregations at once?

In short, this is what I want:

text              english_word      count
Saya eat chicken  [eat, chicken]    2

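My attempt, which raises the error:

df.groupby('text', as_index=False).agg({'count' : lambda x: x.sum(), 'english_word' : lambda x: x.list()})
# fails: x.list() is not a working method on a plain pandas Series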

Here is an example df:

import pandas as pd

df = pd.DataFrame({'text': ['Saya eat chicken', 'Saya eat chicken'],
                   'english_word': ['eat', 'chicken'],
                   'count': [1, 1]})
Evan
  • Hello, welcome :)) [Please read this post on how to provide a great pandas example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and [refer to this one on how to provide a minimal, complete, and verifiable example](https://stackoverflow.com/help/minimal-reproducible-example), then revise your question accordingly so people in the community can easily help you. You don't need to post the actual dataframe, just a simple representation of what it contains. – Joe Jan 28 '20 at 13:31
  • Please provide a sample input table and a sample of the desired output. – Ukrainian-serge Feb 24 '20 at 10:31

2 Answers


You are almost there; you can do:

s = df.groupby('text').agg({'word': list, 'num': 'sum'}).reset_index()

  text       word  num
0  bla  [i, love]    3

Sample Data

import pandas as pd

df = pd.DataFrame({'text': ['bla', 'bla'],
                   'word': ['i', 'love'],
                   'num': [1, 2]})
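Applied to the question's own columns, the same dict-based agg call works once the failing lambda x: x.list() is replaced by the built-in list and the count column uses 'sum'. A minimal sketch, assuming the example df from the question:

```python
import pandas as pd

# The example DataFrame from the question
df = pd.DataFrame({'text': ['Saya eat chicken', 'Saya eat chicken'],
                   'english_word': ['eat', 'chicken'],
                   'count': [1, 1]})

# Dict spec: the built-in list collects each group's words in row order,
# and 'sum' adds up the count values of the group
result = df.groupby('text', as_index=False).agg({'english_word': list,
                                                 'count': 'sum'})
print(result)
```

With as_index=False the text key stays an ordinary column, so no reset_index() call is needed.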
YOLO

Something like this?

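import pandas as pd

def summarise(group):
    # One output row per group: the summed count plus the list of words
    return pd.Series({'Count': group['count'].sum(),
                      'Words': list(group['english_word'])})

# Apply summarise directly to each text group; select only the needed
# columns so the grouping column itself is not passed to apply
new_df = df.groupby('text')[['count', 'english_word']].apply(summarise)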
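If a custom helper is not needed, pandas named aggregation (available since pandas 0.25) gives the same result without apply and lets you choose the output column names. A minimal sketch, again assuming the example df from the question:

```python
import pandas as pd

# The example DataFrame from the question
df = pd.DataFrame({'text': ['Saya eat chicken', 'Saya eat chicken'],
                   'english_word': ['eat', 'chicken'],
                   'count': [1, 1]})

# Named aggregation: output_column=(input_column, aggregation)
result = df.groupby('text', as_index=False).agg(
    english_word=('english_word', list),  # collect words into a list
    count=('count', 'sum'),               # total of the count column
)
print(result)
```

Each keyword becomes an output column, so renaming (for example words=('english_word', list)) only changes the keyword.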
Mayowa Ayodele