python pandas simple pivot table sum count

Question

I'm trying to identify the best way to make a simple pivot on my data:

import pandas    
dfn = pandas.DataFrame({
    "A" : [ 'aaa', 'bbb', 'aaa', 'bbb' ],
    "B" : [     1,    10,     2,   30  ],
    "C" : [     2,     0,     3,   20  ] })

The output I would like to have is a dataframe, grouped by A, that sum and count values of B and C, and names have to be exactly (Sum_B, Sum_C, Count), as following:

A   Sum_B  Sum_C  Count
aaa    3      5       2
bbb   50     20       2

What is the fastest way to do this?

also I'm guessing `bbb` `Sum_B` should be 40 not 50 right? – evan54 Jun 22 '16 at 10:43 — evan54, Jun 22 '16 at 10:43

score 5 · Answer 1 · edited May 23 '17 at 12:14

5

you can use .agg() function:

In [227]: dfn.groupby('A').agg({'B':sum, 'C':sum, 'A':'count'}).rename(columns={'A':'count'})
Out[227]:
      B  count   C
A
aaa   3      2   5
bbb  40      2  20

or with reset_index():

In [239]: dfn.groupby('A').agg({'B':sum, 'C':sum, 'A':'count'}).rename(columns={'A':'count'}).reset_index()
Out[239]:
     A   B  count   C
0  aaa   3      2   5
1  bbb  40      2  20

PS Here is a link to examples provided by @evan54

edited May 23 '17 at 12:14

Community

1
1

answered Jun 22 '16 at 10:39

MaxU - stand with Ukraine

205,989
36
386
419

1

didn't know about `agg` (aggregate), this is a nice description http://www.shanelynn.ie/summarising-aggregation-and-grouping-data-in-python-pandas/ – evan54 Jun 22 '16 at 10:42
I add also .reset_index() in order to "move" 'A' to a column, since i have to write a csv – DPColombotto Jun 22 '16 at 10:48

score 3 · Answer 2 · edited May 23 '17 at 12:08

I think you can use aggregate with double sum and len, rename columns and last change order of columns by reindex_axis:

print (dfn.groupby('A')
          .agg({'B': sum, 'C':sum, 'A':len})
          .rename(columns={'A': 'Count', 'B': 'Sum_B', 'C': 'Sum_C'})
          .reindex_axis(['Sum_B','Sum_C','Count'], axis=1))  

     Sum_B  Sum_C  Count
A                       
aaa      3      5      2
bbb     40     20      2

If need reset_index:

print (dfn.groupby('A')
          .agg({'B': sum, 'C':sum, 'A':len})
          .rename(columns={'A': 'Count', 'B': 'Sum_B', 'C': 'Sum_C'})
          .reindex_axis(['Sum_B','Sum_C','Count'], axis=1)
          .reset_index())  

     A  Sum_B  Sum_C  Count
0  aaa      3      5      2
1  bbb     40     20      2

Instead len you can use size, what is same:

print (dfn.groupby('A')
          .agg({'B': sum, 'C':sum, 'A':'size'})
          .rename(columns={'A': 'Count', 'B': 'Sum_B', 'C': 'Sum_C'})
          .reindex_axis(['Sum_B','Sum_C','Count'], axis=1)
          .reset_index())  

     A  Sum_B  Sum_C  Count
0  aaa      3      5      2
1  bbb     40     20      2

Differences between size and count (see differences with numeric values):

score 1 · Answer 3 · answered Jun 22 '16 at 10:37

One way to do this is to group by the column you want and then sum as shown below.

r = dfn.groupby('A').sum()
r['counts'] = dfn.A.value_counts() # this could also be written as dfn['A'].value_counts()

The first line computes the sums and the second line gets the number of times each of the values appears in A.

python pandas simple pivot table sum count

3 Answers3