Pandas: Summarize table based on column value

Question

My Pandas Dataframe is in this format:

How can I summarize it like this:

A    16
B    9
C    8

Looks very close or duplicate of http://stackoverflow.com/questions/14941366/pandas-sort-by-group-aggregate-and-column. — alecxe, Jul 10 '16 at 00:56
Did you read the docs? http://pandas.pydata.org/pandas-docs/stable/groupby.html — Jul 10 '16 at 01:29

Joe T. Boka · Accepted Answer · 2016-07-10T01:46:21.047

score 6

You can use groupby:

  col1  col2
0    A     5
1    A     7
2    A     4
3    B     2
4    B     7
5    C     8

df.groupby('col1')['col2'].sum()
col1
A    16
B     9
C     8
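
A self-contained version of the groupby call above, with the sample frame rebuilt from the rows shown in this answer. The names `df` and `totals` are only for illustration:

```python
import pandas as pd

# Sample data matching the frame shown above
df = pd.DataFrame({
    "col1": ["A", "A", "A", "B", "B", "C"],
    "col2": [5, 7, 4, 2, 7, 8],
})

# Group rows by the label in col1 and add up col2 within each group.
# The result is a Series indexed by the unique col1 values.
totals = df.groupby("col1")["col2"].sum()
print(totals)
```

Because the result is a Series whose index holds the col1 labels, the output no longer shows col1 and col2 as two ordinary columns.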

If you want to keep the column names, as you mentioned in your comment, you can turn the summed result back into a DataFrame and move col1 out of the index with reset_index(), if that is what you meant. So you can do this instead:

new = pd.DataFrame({'col2' : df.groupby('col1')['col2'].sum()}).reset_index()
new
  col1  col2
0    A    16
1    B     9
2    C     8
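
As a sketch that is not part of the original answer, here are two shorter ways to get the same two-column frame: calling reset_index() directly on the summed Series (its name, col2, becomes the column header), or passing as_index=False to groupby so col1 never moves into the index:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["A", "A", "A", "B", "B", "C"],
    "col2": [5, 7, 4, 2, 7, 8],
})

# Option 1: the summed Series is named "col2", so reset_index()
# turns its index (col1) back into a column next to it.
via_reset = df.groupby("col1")["col2"].sum().reset_index()

# Option 2: tell groupby to keep col1 as a regular column.
via_as_index = df.groupby("col1", as_index=False)["col2"].sum()

print(via_reset)
print(via_as_index)
```

Both produce the same frame as the pd.DataFrame(...) wrapper above; which one to use is mostly a matter of taste.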

edited Jul 10 '16 at 01:46

answered Jul 10 '16 at 00:59

Thanks for the rapid answer. One small problem: the columns are no longer called "col1" and "col2". Is it possible to add another line of code so that the columns retain their names? – Ned Hulton Jul 10 '16 at 01:19
@NedHulton I added a new solution to my answer based on your comment. Is this what you meant? – Joe T. Boka Jul 10 '16 at 01:47

score 1 · Answer 2 · answered Jul 10 '16 at 11:00

I think you could use pivot_table for that, with sum as the aggregation function:

In [9]: df
Out[9]: 
   0  1
0  A  5
1  A  7
2  A  4
3  B  2
4  B  7
5  C  8

In [10]: df.pivot_table(index=0, aggfunc='sum').reset_index()
Out[10]: 
   0   1
0  A  16
1  B   9
2  C   8
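
A self-contained sketch of the same pivot_table call, assuming the frame has unnamed columns (so pandas labels them 0 and 1, as in the session above). It passes the string "sum" for aggfunc; newer pandas releases emit a FutureWarning when the builtin sum is passed there, so the string form is assumed to be the safer choice:

```python
import pandas as pd

# Unnamed columns: pandas labels them 0 and 1
df = pd.DataFrame([["A", 5], ["A", 7], ["A", 4], ["B", 2], ["B", 7], ["C", 8]])

# Rows are grouped by the values in column 0; every remaining column
# (here only column 1) is aggregated with "sum".
summary = df.pivot_table(index=0, aggfunc="sum").reset_index()
print(summary)
```

For a single grouping column this gives the same totals as the groupby answer; pivot_table becomes more useful when you also want a `columns=` dimension.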