groupby, sum and count to one table

Question

I have a dataframe below

df=pd.DataFrame({"A":np.random.randint(1,10,9),"B":np.random.randint(1,10,9),"C":list('abbcacded')})

   A  B  C
0  9  6  a
1  2  2  b
2  1  9  b
3  8  2  c
4  7  6  a
5  3  5  c
6  1  3  d
7  9  9  e
8  3  4  d

I would like to get grouping result (with key="C" column) below,and the row c d and e is dropped intentionally.

   number   A_sum   B_sum
a   2        16       15
b   2        3        11

this is 2row*3column dataframe. the grouping key is column C. And The column "number"represents the count of each letter(a and b). A_sum and B_sum represents grouping sum of letters in column C.

I guess we should use method groupby but how can I get this data summary table ?

Where are the rows for `c, d, e`? are they omitted by you intentionally? — Psidom, Sep 02 '16 at 15:27
sorry. I add imformation. yes it was omitted intentionally! I will edit. — Heisenberg, Sep 02 '16 at 15:33

score 7 · Answer 1 · answered Sep 02 '16 at 15:49

7

You can do this using a single groupby with

res = df.groupby(df.C).agg({'A': 'sum', 'B': {'sum': 'sum', 'count': 'count'}})

res.columns = ['A_sum', 'B_sum', 'count']

answered Sep 02 '16 at 15:49

Ami Tavory

74,578
11
141
185

Thanks for sharing this. While trying to see if this can be done without listing the columns twice, I found out that this dict-of-dicts syntax was [deprecated](https://stackoverflow.com/q/44635626/1026) (otherwise you could name the resulting columns without a separate statement using `'B': {'A_sum': 'sum', ...}`) and came up with a helper function that makes this easy: https://stackoverflow.com/a/50558529/1026 – Nickolay May 28 '18 at 08:10

score 2 · Accepted Answer · answered Sep 02 '16 at 15:44

One option is to count the size and sum the columns for each group separately and then join them by index:

df.groupby("C")['A'].agg({"number": 'size'}).join(df.groupby('C').sum())

    number  A   B
# C         
# a     2   11  8
# b     2   14  12
# c     2   8   5
# d     2   11  12
# e     1   7   2

You can also do df.groupby('C').agg(["sum", "size"]) which gives an extra duplicated size column, but if you are fine with that, it should also work.

groupby, sum and count to one table

2 Answers2

Linked