0

So, I'm trying to do something similar to this:

select a, b, c, sum(d), sum(e), count(*)
from df 
group by 1,2,3

In other words, I have this:

a        b        c    d    e
Billy    Profesor 1    10   5
Billy    Profesor 1    17   3
Andrew   Student  8    2    7

And I want the output to be:

a        b        c    d    e    count
Billy    Profesor 1    27   8    2
Andrew   Student  8    2    7    1

I tried this, and it partially worked:

df.groupby(['a','b','c']).sum().reset_index()

I still couldn't make it work for the count. I also tried the answer in the post Group dataframe and get sum AND count?, but using the agg function make things very messy and it counts every column.

UPDATE: I changed column c because I have a numeric column to group, but not sum.

dekio
  • 810
  • 3
  • 16
  • 33

2 Answers2

1

You can do a join:

groups=df.groupby(['a','b','c'])
groups.sum().join(groups.size().to_frame('count')).reset_index()

Output:

        a         b   c   d  e  count
0  Andrew   Student  CA   2  7      1
1   Billy  Profesor  NY  27  8      2
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • 1
    don't know what happened, but for me, a few 'Andrew's disappeared on the output – dekio Apr 30 '20 at 16:55
  • 1
    yeah, I tried to fillna with some arbitrary string, but it didn't fixed. I'm losing a few Andrews and Billies by using this code. Not sure why. – dekio Apr 30 '20 at 17:27
0

Try NamedAgg

df_final = df.groupby(['a','b','c'], sort=False).agg(d=('d', 'sum'), 
                                                     e=('e', 'sum'), 
                                                     count=('e', 'count')).reset_index()

Out[12]:
        a         b   c   d  e  count
0   Billy  Profesor  NY  27  8      2
1  Andrew   Student  CA   2  7      1
Andy L.
  • 24,909
  • 4
  • 17
  • 29
  • is there any other way? The problem with this is that I actually don't have only d and e to sum, it's at least 100 columns to summarise – dekio Apr 30 '20 at 17:03
  • 1
    you may construct a dictionary for `NamedAgg` for columns you want to sum using dict comprehension – Andy L. Apr 30 '20 at 17:13