
My pandas DataFrame is in this format:

A    5
A    7
A    4
B    2
B    7
C    8

How can I sum the values for each group to get this:

A    16
B    9
C    8   
Ned Hulton
  • Looks very close to, or a duplicate of, http://stackoverflow.com/questions/14941366/pandas-sort-by-group-aggregate-and-column. – alecxe Jul 10 '16 at 00:56
  • Did you read the docs? [link](http://pandas.pydata.org/pandas-docs/stable/groupby.html) – Jul 10 '16 at 01:29

2 Answers


You can use groupby. Given this DataFrame:

  col1  col2
0    A     5
1    A     7
2    A     4
3    B     2
4    B     7
5    C     8

group by col1 and sum col2:

df.groupby('col1')['col2'].sum()
col1
A    16
B     9
C     8
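
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The column names col1 and col2 come from the example above; the variable name totals is just an illustrative choice.

```python
import pandas as pd

# Rebuild the example frame from the question (column names as used in this answer).
df = pd.DataFrame({
    "col1": ["A", "A", "A", "B", "B", "C"],
    "col2": [5, 7, 4, 2, 7, 8],
})

# Group the rows by the label column and sum the values within each group.
# The result is a Series indexed by the group labels.
totals = df.groupby("col1")["col2"].sum()
print(totals)
```

Note that the result is a Series with col1 as its index, not a two-column DataFrame.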

If you want to keep the original column names, as you mentioned in your comment, you can convert the grouped result into a new DataFrame:

new = pd.DataFrame({'col2' : df.groupby('col1')['col2'].sum()}).reset_index()
new
  col1  col2
0    A    16
1    B     9
2    C     8
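
A possible shortcut, not part of the original answer: groupby accepts as_index=False, which should give the same two-column DataFrame without the extra pd.DataFrame(...) wrapper. A small sketch, assuming the same col1/col2 frame as above:

```python
import pandas as pd

# Same example data as above (col1/col2 are the names used in this answer).
df = pd.DataFrame({
    "col1": ["A", "A", "A", "B", "B", "C"],
    "col2": [5, 7, 4, 2, 7, 8],
})

# as_index=False keeps the grouping key as a regular column,
# so the result is a DataFrame with both original column names.
new = df.groupby("col1", as_index=False)["col2"].sum()
print(new)
```

Calling .reset_index() on the grouped Series, as in the line above, is an equivalent way to get there.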
Joe T. Boka
  • Thanks for the rapid answer. One small problem: the columns are no longer called "col1" and "col2". Is it possible to add another line of code so that the columns retain their names? – Ned Hulton Jul 10 '16 at 01:19
  • @NedHulton I added a new solution to my answer based on your comment. Is this what you meant? – Joe T. Boka Jul 10 '16 at 01:47

You could also use pivot_table, with sum as the aggregation function:

In [9]: df
Out[9]: 
   0  1
0  A  5
1  A  7
2  A  4
3  B  2
4  B  7
5  C  8

In [10]: df.pivot_table(index=0, aggfunc='sum').reset_index()
Out[10]: 
   0   1
0  A  16
1  B   9
2  C   8
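
Here is a runnable sketch of the same idea, assuming the frame has the default integer column labels 0 and 1 as in the session above. Passing values=1 explicitly is my addition, so that only the numeric column is aggregated; the name result is illustrative.

```python
import pandas as pd

# Build the frame without column names, so pandas labels the columns 0 and 1.
df = pd.DataFrame([
    ["A", 5], ["A", 7], ["A", 4],
    ["B", 2], ["B", 7],
    ["C", 8],
])

# Use column 0 as the pivot index and sum column 1 within each group,
# then turn the index back into an ordinary column.
result = df.pivot_table(index=0, values=1, aggfunc="sum").reset_index()
print(result)
```

For a single grouping key and a plain sum this does the same job as groupby; pivot_table is more useful once you also want to spread a second key across the columns.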
Anton Protopopov