Pandas Dataframe - find sums in column B across each label in column A

Question

Let's say we have the following data:

     col1    col2    col3
0    A       1       info
1    A       2       other
2    B       3       blabla

I want to use pandas to find rows that share the same value in col1 and add up their col2 values.

In plain Python I would do something like the following:

pairs = [('A', 1), ('A', 2), ('B', 3)]
totals = {}
for label, value in pairs:
    # Start each new label at 0, then accumulate its values
    totals[label] = totals.get(label, 0) + value
print(totals)

So the outcome would be:

{'A': 3, 'B': 3}

Is there an easy way to do the same thing using pandas?

Noah · Accepted Answer · 2014-04-29T15:54:33.327

Use DataFrame.groupby().sum():

In [1]: import pandas

In [2]: df = pandas.DataFrame({"col1":["A", "A", "B"], "col2":[1,2,3]})

In [3]: df.groupby("col1").sum()
Out[3]: 
      col2
col1      
A        3
B        3

In [4]: df.groupby("col1").sum().reset_index()
Out[4]: 
  col1  col2
0    A     3
1    B     3

[2 rows x 2 columns]
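
If the goal is the same dictionary the plain-Python loop in the question builds, one option is to select col2 before summing and convert the resulting Series with to_dict(). This is a minimal sketch, assuming the question's three-row frame including the text column col3; selecting col2 first keeps the text column out of the sum. The names totals and result are only illustrative.

```python
import pandas as pd

# The frame from the question, including the text column col3.
df = pd.DataFrame({
    "col1": ["A", "A", "B"],
    "col2": [1, 2, 3],
    "col3": ["info", "other", "blabla"],
})

# Select col2 first so only the numeric column is summed per label.
totals = df.groupby("col1")["col2"].sum()

# Series.to_dict() maps each group label to its sum.
result = totals.to_dict()
print(result)
```

The resulting dictionary should compare equal to the {'A': 3, 'B': 3} shown in the question.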

edited Apr 29 '14 at 15:54

answered Apr 28 '14 at 16:35

This works, but what if I want to keep column 1 as a regular column? In this case the values from column 1 are used as the index. – fgypas Apr 29 '14 at 08:43
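
Besides calling reset_index() on the result, another way to keep col1 as an ordinary column is to pass as_index=False to groupby. A minimal sketch, assuming the same two-column frame as in the answer; the name summed is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["A", "A", "B"], "col2": [1, 2, 3]})

# as_index=False keeps the group labels as a regular column
# instead of moving them into the index.
summed = df.groupby("col1", as_index=False).sum()
print(summed)
```

The result should have col1 and col2 as columns with a default integer index, which is the same shape as the reset_index() output.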
