51

I'm trying to merge two DataFrames summing columns value.

>>> print(df1)
   id name  weight
0   1    A       0
1   2    B      10
2   3    C      10

>>> print(df2)
   id name  weight
0   2    B      15
1   3    C      10

I need to sum weight values during merging for similar values in the common column.

merge = pd.merge(df1, df2, how='inner')

So the output will be something like following.

   id name  weight
1   2    B      25
2   3    C      20
Jaroslav Bezděk
  • 6,967
  • 6
  • 29
  • 46
Nilani Algiriyage
  • 32,876
  • 32
  • 87
  • 121

3 Answers3

50

This solution works also if you want to sum more than one column. Assume data frames

>>> df1
   id name  weight  height
0   1    A       0       5
1   2    B      10      10
2   3    C      10      15
>>> df2
   id name  weight  height
0   2    B      25      20
1   3    C      20      30

You can concatenate them and group by index columns.

>>> pd.concat([df1, df2]).groupby(['id', 'name']).sum().reset_index()
   id name  weight  height
0   1    A       0       5
1   2    B      35      30
2   3    C      30      45
Jan Kislinger
  • 1,441
  • 14
  • 26
26
In [41]: pd.merge(df1, df2, on=['id', 'name']).set_index(['id', 'name']).sum(axis=1)
Out[41]: 
id  name
2   B       25
3   C       20
dtype: int64
waitingkuo
  • 89,478
  • 28
  • 112
  • 118
25

If you set the common columns as the index, you can just sum the two dataframes, much simpler than merging:

In [30]: df1 = df1.set_index(['id', 'name'])

In [31]: df2 = df2.set_index(['id', 'name'])

In [32]: df1 + df2
Out[32]: 
         weight
id name        
1  A        NaN
2  B         25
3  C         20
joris
  • 133,120
  • 36
  • 247
  • 202
  • 13
    to avoid `NaN`, you could use `fill_value` for missing names in one of the dataframes: `df1.add(df2, fill_value=0)` – jfs May 14 '18 at 10:14
  • If you don't know the common columns, the approach really nice: `(df1 + df2).fillna(df1).fillna(df2)`. Can we do that efficiently ? – Alex Nea Kameni Mar 21 '23 at 14:49