Adding two pandas dataframes

Question

I have two dataframes, both indexed by timeseries. I need to add the elements together to form a new dataframe, but only if the index and column are the same. If the item does not exist in one of the dataframes then it should be treated as a zero.

I've tried using .add but this sums regardless of index and column. Also tried a simple combined_data = dataframe1 + dataframe2 but this give a NaN if both dataframes don't have the element.

Any suggestions?

Can you clarify what you want to happen if an item does not exist in one or both dataframes? You say if the item does not exist in *one* dataframe, it should be treated as zero --- do you mean the value in that dataframe should be treated as zero and added to the value from the other dataframe, or do you mean the value in the result dataframe should be zero? Also, you say `df1+df2` doesn't work because it gives NaN if both don't have the element. What do you want to happen in this case? You want a zero in the result? — BrenBarn, Jun 19 '12 at 18:44

score 119 · Accepted Answer · edited Mar 11 '14 at 08:55

119

How about x.add(y, fill_value=0)?

import pandas as pd

df1 = pd.DataFrame([(1,2),(3,4),(5,6)], columns=['a','b'])
Out: 
   a  b
0  1  2
1  3  4
2  5  6

df2 = pd.DataFrame([(100,200),(300,400),(500,600)], columns=['a','b'])
Out: 
     a    b
0  100  200
1  300  400
2  500  600

df_add = df1.add(df2, fill_value=0)
Out: 
     a    b
0  101  202
1  303  404
2  505  606

edited Mar 11 '14 at 08:55

Renaud

16,073
6
81
79

answered Jun 20 '12 at 03:28

Wes McKinney

101,437
32
142
108

Perfect, just what I was after. Thanks – cs0679 Jun 20 '12 at 10:53
3

I recently found that this method doesn't work if you first create two dataframes and then set their index columns as one of the existing dataframe columns using df.set_index('Column_A') for example – user8188120 Nov 30 '18 at 12:25
@user8188120 That is probably because the column number and / or the column names differ between the two. The column names must be the same in both dfs, else, the df columns get just concatenated to the existing df. – questionto42 Jul 09 '21 at 19:39

score 17 · Answer 2 · edited Feb 21 '19 at 21:29

If I understand you correctly, you want something like:

(x.reindex_like(y).fillna(0) + y.fillna(0).fillna(0))

This will give the sum of the two dataframes. If a value is in one dataframe and not the other, the result at that position will be that existing value (look at B0 in X and B0 in Y and look at final output). If a value is missing in both dataframes, the result at that position will be zero (look at B1 in X and B1 in Y and look at final output).

>>> x
   A   B   C
0  1   2 NaN
1  3 NaN   4
>>> y
    A   B   C
0   8 NaN  88
1   2 NaN   5
2  10  11  12
>>> (x.reindex_like(y).fillna(0) + y.fillna(0).fillna(0))
    A   B   C
0   9   2  88
1   5   0   9
2  10  11  12

Thanks, but I didn't explain my data very well as I have different columns in both DataFrames e.g. A, B, C in dataframe1 and A, B, D in dataframe 2. The output should be a dataframe with A, B, C, D — cs0679, Jun 20 '12 at 10:56

score 4 · Answer 3 · answered Feb 16 '17 at 12:19

4

Both the above answers - fillna(0) and a direct addition would give you Nan values if either of them have different structures.

Its Better to use fill_value

df.add(other_df, fill_value=0)

answered Feb 16 '17 at 12:19

Prafulla Pallal

57
1

Downvote. It is an old answer anyway, but this is exactly the accepted answer that had been given three years before this answer. – questionto42 Jul 09 '21 at 19:35

Xavi · Answer 4 · 2014-11-07T11:13:09.213

For making more general the answer... first I will take the common index for synchronizing both dataframes, then I will join each of them to my pattern (dates) and I will sum the columns of the same name and finally join both dataframes (deleting added columns in one of them),

you can see an example (with google's stock prices taken from google) here:

import numpy as np
import pandas as pd
import datetime as dt

prices = pd.DataFrame([[553.0, 555.5, 549.3, 554.11, 0],
                       [556.8, 556.8, 544.05, 545.92, 545.92],
                       [545.5, 546.89, 540.97, 542.04, 542.04]],
                       index=[dt.datetime(2014,11,04), dt.datetime(2014,11,05), dt.datetime(2014,11,06)],
                       columns=['Open', 'High', 'Low', 'Close', 'Adj Close'])

corrections = pd.DataFrame([[0, 555.22], [1238900, 0]],
                    index=[dt.datetime(2014,11,3), dt.datetime(2014,11,4)],
                    columns=['Volume', 'Adj Close'])

dates = pd.DataFrame(prices.index, columns = ['Dates']).append(pd.DataFrame(corrections.index, columns = ['Dates'])).drop_duplicates('Dates').set_index('Dates').sort(axis=0)
df_corrections = dates.join(corrections).fillna(0)
df_prices = dates.join(prices).fillna(0)

for col in prices.columns:
    if col in corrections.columns:
        df_prices[col]+=df_corrections[col]
        del df_corrections[col]

df_prices = df_prices.join(df_corrections)

Adding two pandas dataframes

4 Answers4

Linked

Related