0

I have many dataframes of equal lenght and equal Datetime indexes

    Date    OPP
0   2008-01-04  0.0
1   2008-02-04  0.0
2   2008-03-04  0.0
3   2008-04-04  0.0
4   2008-05-04  0.0
5   2008-06-04  0.0
6   2008-07-04  393.75
7   2008-08-04  -168.75
8   2008-09-04  -656.25
9   2008-10-04  -1631.25


    Date    OPP
0   2008-01-04  750.0
1   2008-02-04  0.0
2   2008-03-04  150.0
3   2008-04-04  600.0
4   2008-05-04  0.0
5   2008-06-04  0.0
6   2008-07-04  0.0
7   2008-08-04  -250.0
8   2008-09-04  1000.0
9   2008-10-04  0.0

I need to create a unique dataframe that sums all the OPP columns from many dataframes. This can easily be done like this:

df3 = df1["OPP"] + df2["OPP"]
df3["Date"] = df1["Date"]

This works as long as all the dataframes are same length and same Date index.

How can I make it work even if these conditions aren't met? What if I had another dataframe like this:

        Date      OPP
0 2008-07-04   393.75
1 2008-08-04  -168.75
2 2008-09-04  -656.25
3 2008-10-04 -1631.25
4 2008-11-04  -675.00
5 2008-12-04     0.00

I could do this manually: search for the df with the smallest starting date, the one with the biggest starting date and fill every df with all the dates and zeroes, so that I'd have df's of equal lenght... and then proceed with a simple sum.

But, is there a way to do this automatically in Pandas?

Saturnix
  • 10,130
  • 17
  • 64
  • 120
  • How many dataframes do you have? – Erfan Jul 18 '19 at 13:14
  • Should be scalable: can get up to 200, only 10 now in testing – Saturnix Jul 18 '19 at 13:15
  • They all have correct indexes now in testing, but in production they could get supplied with unalligned indexes and I want it to work anyway. I don't want to assume the provided data to have the same datetime indexes. – Saturnix Jul 18 '19 at 13:17

3 Answers3

2

Following this answers method, we can use functools.reduce for this.

Whats left is to only sum over axis=1:

from functools import reduce

dfs = [df1, df2, df3]

df_final = reduce(lambda left,right: pd.merge(left,right,on='Date', how='left'), dfs)

Which gives us:

         Date    OPP_x   OPP_y      OPP
0  2008-01-04     0.00   750.0      NaN
1  2008-02-04     0.00     0.0      NaN
2  2008-03-04     0.00   150.0      NaN
3  2008-04-04     0.00   600.0      NaN
4  2008-05-04     0.00     0.0      NaN
5  2008-06-04     0.00     0.0      NaN
6  2008-07-04   393.75     0.0   393.75
7  2008-08-04  -168.75  -250.0  -168.75
8  2008-09-04  -656.25  1000.0  -656.25
9  2008-10-04 -1631.25     0.0 -1631.25

Then we sum:

df_final.iloc[:, 1:].sum(axis=1)

0     750.0
1       0.0
2     150.0
3     600.0
4       0.0
5       0.0
6     787.5
7    -587.5
8    -312.5
9   -3262.5
dtype: float64

Or as new column:

df_final['sum'] = df_final.iloc[:, 1:].sum(axis=1)

         Date    OPP_x   OPP_y      OPP     sum
0  2008-01-04     0.00   750.0      NaN   750.0
1  2008-02-04     0.00     0.0      NaN     0.0
2  2008-03-04     0.00   150.0      NaN   150.0
3  2008-04-04     0.00   600.0      NaN   600.0
4  2008-05-04     0.00     0.0      NaN     0.0
5  2008-06-04     0.00     0.0      NaN     0.0
6  2008-07-04   393.75     0.0   393.75   787.5
7  2008-08-04  -168.75  -250.0  -168.75  -587.5
8  2008-09-04  -656.25  1000.0  -656.25  -312.5
9  2008-10-04 -1631.25     0.0 -1631.25 -3262.5
Erfan
  • 40,971
  • 8
  • 66
  • 78
1

Use list comprehension for create Series with DatetimeIndex, then join together by concat and sum:

dfs = [df1, df2]

compr = [x.set_index('Date')['OPP'] for x in dfs]
df1 = pd.concat(compr, axis=1).sum(axis=1).reset_index(name='OPP')
print (df1)
         Date      OPP
0  2008-01-04   750.00
1  2008-02-04     0.00
2  2008-03-04   150.00
3  2008-04-04   600.00
4  2008-05-04     0.00
5  2008-06-04     0.00
6  2008-07-04   393.75
7  2008-08-04  -418.75
8  2008-09-04   343.75
9  2008-10-04 -1631.25
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

You can simply concat them and sum on groupby date:

(pd.concat((df1,df2,df3))
   .groupby('Date', as_index=False)
   .sum()
)

Output for your three sample dataframes:

          Date     OPP
0   2008-01-04   750.0
1   2008-02-04     0.0
2   2008-03-04   150.0
3   2008-04-04   600.0
4   2008-05-04     0.0
5   2008-06-04     0.0
6   2008-07-04   787.5
7   2008-08-04  -587.5
8   2008-09-04  -312.5
9   2008-10-04 -3262.5
10  2008-11-04  -675.0
11  2008-12-04     0.0
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74