126

I have a dataframe that looks like this:

              Company Name              Organisation Name  Amount
10118  Vifor Pharma UK Ltd  Welsh Assoc for Gastro & Endo 2700.00
10119  Vifor Pharma UK Ltd    Welsh IBD Specialist Group,  169.00
10120  Vifor Pharma UK Ltd             West Midlands AHSN 1200.00
10121  Vifor Pharma UK Ltd           Whittington Hospital   63.00
10122  Vifor Pharma UK Ltd                 Ysbyty Gwynedd   75.93

How do I sum the Amount and count the Organisation Name, to get a new dataframe that looks like this?

              Company Name             Organisation Count   Amount
10118  Vifor Pharma UK Ltd                              5 11000.00

I know how to sum or count:

df.groupby('Company Name').sum()
df.groupby('Company Name').count()

But not how to do both!

cs95
  • 379,657
  • 97
  • 704
  • 746
Richard
  • 62,943
  • 126
  • 334
  • 542

4 Answers4

217

try this:

In [110]: (df.groupby('Company Name')
   .....:    .agg({'Organisation Name':'count', 'Amount': 'sum'})
   .....:    .reset_index()
   .....:    .rename(columns={'Organisation Name':'Organisation Count'})
   .....: )
Out[110]:
          Company Name   Amount  Organisation Count
0  Vifor Pharma UK Ltd  4207.93                   5

or if you don't want to reset index:

df.groupby('Company Name')['Amount'].agg(['sum','count'])

or

df.groupby('Company Name').agg({'Amount': ['sum','count']})

Demo:

In [98]: df.groupby('Company Name')['Amount'].agg(['sum','count'])
Out[98]:
                         sum  count
Company Name
Vifor Pharma UK Ltd  4207.93      5

In [99]: df.groupby('Company Name').agg({'Amount': ['sum','count']})
Out[99]:
                      Amount
                         sum count
Company Name
Vifor Pharma UK Ltd  4207.93     5
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
  • 3
    @MaxU is there a way to apply sum and count to different but multiple coulmns. When I try to give the columns as list like this: agg({['hotel_name','hotel_country']:'count', ['cost','revenue','clicks']: 'sum'}) it gives "TypeError: unhashable type: 'list'" error – CanCeylan Feb 08 '17 at 09:28
  • @CanCeylan dont know if its possible to do it in a groupby clause but you can achieve it by adding a dummy count-column to the dataframe beforehand then do a groupby sum: `df['count'] = 1` – Karl Anka Mar 09 '17 at 13:23
  • 1
    Finally, 2 hours of searching for how to do this... only the 3rd option: df.groupby('Company Name').agg({'Amount': ['sum','count']}) worked for me. – charo Jun 15 '19 at 22:53
  • Hi thank you for that great solution. In my particular case I am using your solution on two different columns to get the sum and count the number of rows. Unfortunenatly I get the number of rows twice (ofc. because it counts for both columns). Is there a way to remove one of the .counts so my table looks clean? ```df.groupby(df['L2 Name'])[["Amount arrear","VSU"]].agg(['sum','count'])``` – MLAlex Sep 01 '20 at 08:08
  • Hello thank you for your great answer. Do you know how to interpret the new columns that are created and how to flatten it to a more traditional way? – Solal Dec 21 '20 at 18:06
  • Hi, could you take a look at this question https://stackoverflow.com/questions/70954791/identifying-statistical-outliers-with-pandas-groupby-and-reduce-rows-into-diffe – Aaditya Ura Feb 02 '22 at 11:33
52

Just in case you were wondering how to rename columns during aggregation, here's how for

pandas >= 0.25: Named Aggregation

df.groupby('Company Name')['Amount'].agg(MySum='sum', MyCount='count')

Or,

df.groupby('Company Name').agg(MySum=('Amount', 'sum'), MyCount=('Amount', 'count'))

                       MySum  MyCount
Company Name                       
Vifor Pharma UK Ltd  4207.93        5
cs95
  • 379,657
  • 97
  • 704
  • 746
  • 1
    This should be the excepted answer, is there a way to update old questions/answers with the new better way of doing things? The excepted answer isn't wrong, just no longer the best way. – JSharm Jun 17 '20 at 10:39
  • @JSharm obviously you cannot change the OP's mind but you can certainly upvote the posts you feel deserve to be at the top. If enough people think and act the same way you do, we will get there some day ;) PS not to throw shade at the accepted answer, I still think it's the best answer for this question as long as pandas continues to support the syntax, which I'm reasonably confident will be for a good while yet. – cs95 Jun 17 '20 at 11:04
5

If you have lots of columns and only one is different you could do:

In[1]: grouper = df.groupby('Company Name')
In[2]: res = grouper.count()
In[3]: res['Amount'] = grouper.Amount.sum()
In[4]: res
Out[4]:
                      Organisation Name   Amount
Company Name                                   
Vifor Pharma UK Ltd                  5  4207.93

Note you can then rename the Organisation Name column as you wish.

JSharm
  • 1,117
  • 12
  • 11
3
df.groupby('Company Name').agg({'Organisation name':'count','Amount':'sum'})\
    .apply(lambda x: x.sort_values(['count','sum'], ascending=False))
rje
  • 6,388
  • 1
  • 21
  • 40
cvsnow
  • 31
  • 1