Let's say we have the following data:

   col1  col2    col3
0     A     1    info
1     A     2   other
2     B     3  blabla

I want to use pandas to find rows that share the same value in col1 and sum their col2 values.

In plain Python I would do something like the following:

pairs = [('A', 1), ('A', 2), ('B', 3)]
totals = {}
for key, value in pairs:
    # Add each value to the running total for its key (starting from 0).
    totals[key] = totals.get(key, 0) + value
print(totals)

So the outcome would be:

{'A': 3, 'B': 3}

Is there an easy way to do the same thing using pandas?

Noah
fgypas

1 Answer

Use DataFrame.groupby().sum():

In [1]: import pandas

In [2]: df = pandas.DataFrame({"col1":["A", "A", "B"], "col2":[1,2,3]})

In [3]: df.groupby("col1").sum()
Out[3]: 
      col2
col1      
A        3
B        3

In [4]: df.groupby("col1").sum().reset_index()
Out[4]: 
  col1  col2
0    A     3
1    B     3

[2 rows x 2 columns]
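
To get the same dictionary as the plain-Python loop in the question, the grouped sums can probably be converted with Series.to_dict(). A minimal sketch, assuming the column names from the question; the variable name totals is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({"col1": ["A", "A", "B"], "col2": [1, 2, 3]})

# Select col2 explicitly so only that column is summed within each col1 group,
# then turn the resulting Series (indexed by col1) into a plain dict.
totals = df.groupby("col1")["col2"].sum().to_dict()
print(totals)
```

Selecting col2 before calling sum() also keeps non-numeric columns such as col3 out of the aggregation.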
Noah
  • This works, but what if I want to keep col1 as a regular column? Here the values from col1 end up as the index. – fgypas Apr 29 '14 at 08:43
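
One way to keep col1 as an ordinary column, besides reset_index(), is to pass as_index=False to groupby. A minimal sketch, assuming the three-column data from the question; the variable name result is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["A", "A", "B"],
    "col2": [1, 2, 3],
    "col3": ["info", "other", "blabla"],
})

# as_index=False keeps the grouping key as a regular column
# instead of moving it into the index of the result.
result = df.groupby("col1", as_index=False)["col2"].sum()
print(result)
```

Note that col3 is dropped here on purpose, since it is unclear how the text values should be combined for duplicate keys.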