
How can I sum two Pandas DataFrames on a 'value' column? They have different sizes, but the 'time' column is unique in both.

First df (df1):

        name      as_of                time  day  hour  is_peak  month  quarter se_year season type      value  week  wk_year  year
0     LU_STC 2022-09-30 2022-12-31 23:00:00    1     0    False      1        1    None   None   OP  56.197097    52     2022  2023
1     LU_STC 2022-09-30 2023-01-01 00:00:00    1     1    False      1        1    None   None   OP  57.003007    52     2022  2023
2     LU_STC 2022-09-30 2023-01-01 01:00:00    1     2    False      1        1    None   None   OP  57.134680    52     2022  2023
3     LU_STC 2022-09-30 2023-01-01 02:00:00    1     3    False      1        1    None   None   OP  56.957425    52     2022  2023
4     LU_STC 2022-09-30 2023-01-01 03:00:00    1     4    False      1        1    None   None   OP  55.794171    52     2022  2023
...      ...        ...                 ...  ...   ...      ...    ...      ...     ...    ...  ...        ...   ...      ...   ...
8755  LU_STC 2022-09-30 2023-12-31 18:00:00   31    19    False     12        4    None   None   OP  32.253445    52     2023  2023
8756  LU_STC 2022-09-30 2023-12-31 19:00:00   31    20    False     12        4    None   None   OP  35.600396    52     2023  2023
8757  LU_STC 2022-09-30 2023-12-31 20:00:00   31    21    False     12        4    None   None   OP  38.478395    52     2023  2023
8758  LU_STC 2022-09-30 2023-12-31 21:00:00   31    22    False     12        4    None   None   OP  41.564846    52     2023  2023
8759  LU_STC 2022-09-30 2023-12-31 22:00:00   31    23    False     12        4    None   None   OP  43.382795    52     2023  2023

[8760 rows x 15 columns]
{'name': dtype('O'), 'as_of': dtype('<M8[ns]'), 'time': dtype('<M8[ns]'), 'day': dtype('int64'), 'hour': dtype('int64'), 'is_peak': dtype('bool'), 'month': dtype('int64'), 'quarter': dtype('int64'), 'se_year': dtype('O'), 'season': dtype('O'), 'type': dtype('O'), 'value': dtype('float64'), 'week': dtype('int64'), 'wk_year': dtype('int64'), 'year': dtype('int64')}

Second df (df2):

                    time  value
0    2023-06-30 22:00:00    5.0
1    2023-06-30 23:00:00    5.0
2    2023-07-01 00:00:00    5.0
3    2023-07-01 01:00:00    5.0
4    2023-07-01 02:00:00    5.0
...                  ...    ...
2203 2023-09-30 17:00:00    5.0
2204 2023-09-30 18:00:00    5.0
2205 2023-09-30 19:00:00    5.0
2206 2023-09-30 20:00:00    5.0
2207 2023-09-30 21:00:00    5.0

[2208 rows x 2 columns]
{'time': dtype('<M8[ns]'), 'value': dtype('float64')}

The result should contain the first df (so 8760 rows x 15 columns), with the 'value' column updated to the sum of the values of both dfs wherever their times match.

EDIT 1

So the result should contain the first df (8760 rows x 15 columns), but with modified values for the 2208 rows whose times also exist in the second df.

df1['2023-06-30 22:00:00'] should end up with final 'value' = df1['2023-06-30 22:00:00']['value'] + df2['2023-06-30 22:00:00']['value']
...
df1['2023-09-30 21:00:00'] should end up with final 'value' = df1['2023-09-30 21:00:00']['value'] + df2['2023-09-30 21:00:00']['value']

The rest of the values in df1 won't change because there is no corresponding value in df2.
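
One way to express this directly, as a minimal sketch (assuming df2's 'time' values are unique, as stated above; the variable names `lookup` and `result` are just illustrative):

    # Build a time -> value lookup from df2 and add it to df1's 'value'
    # only where the time matches; every other row keeps its original value.
    lookup = df2.set_index('time')['value']
    result = df1.copy()
    result['value'] = result['value'] + result['time'].map(lookup).fillna(0)
    # result keeps 8760 rows x 15 columns; only the 2208 matching rows change.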

EDIT 2

The simplified example (note that here df1 is the smaller frame):

df1
                  time  value
1  2023-01-01 00:00:00    5.0
2  2023-01-01 01:00:00    5.0

df2
                  time      value
0  2022-12-31 23:00:00  56.197097
1  2023-01-01 00:00:00  57.003007
2  2023-01-01 01:00:00  57.134680
3  2023-01-01 02:00:00  56.957425

df_sum_result
                  time      value
0  2022-12-31 23:00:00  56.197097
1  2023-01-01 00:00:00  62.003007
2  2023-01-01 01:00:00  62.134680
3  2023-01-01 02:00:00  56.957425

concat alone is not doing the job, and neither is join; it's probably a combination of merge and sum.
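
For what it's worth, a merge-based sketch that reproduces df_sum_result on the simplified frames above (an outer merge keeps the union of times, and the two value columns are summed with missing entries treated as 0; the suffixes and variable names are illustrative):

    merged = df1.merge(df2, on='time', how='outer', suffixes=('_1', '_2'))
    merged['value'] = merged['value_1'].fillna(0) + merged['value_2'].fillna(0)
    df_sum_result = merged[['time', 'value']].sort_values('time').reset_index(drop=True)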

EDIT 3

The solution is `df_op_3 = pd.concat([df1, df2]).groupby(['time']).sum().reset_index()`, as shown in [Pandas DataFrame merge summing column](https://stackoverflow.com/questions/23361218/pandas-dataframe-merge-summing-column).

The result is:

                  time      value
0  2022-12-31 23:00:00  56.197097
1  2023-01-01 00:00:00  62.003007
2  2023-01-01 01:00:00  62.134680
3  2023-01-01 02:00:00  56.957425
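
As an aside, that one-liner works as-is on the two-column frames shown above; applying the same idea to the original 15-column df1 may need a small adjustment, because the non-numeric columns (name, as_of, type, ...) don't carry through a plain `groupby().sum()` cleanly. A hedged sketch (variable names are illustrative): sum only 'time' and 'value', then merge the result back onto df1.

    import pandas as pd

    summed = (
        pd.concat([df1[['time', 'value']], df2])
          .groupby('time', as_index=False)['value']
          .sum()
    )
    result = df1.drop(columns='value').merge(summed, on='time', how='left')
    # result keeps all 8760 rows and 15 columns ('value' ends up as the last column).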

Still don't understand why I was pointed in the wrong direction, with a link to a solution that doesn't correspond to my case, instead of getting a nice and simple answer...

  • "changes from the sum of the values of both dfs". What exactly does that mean? Can you post a [minimal reproducible example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and your expected output? – not_speshal Aug 09 '23 at 15:36
  • You need to align your two series. Either set the time as index or `merge` – mozway Aug 09 '23 at 15:37
  • you should pare this down. Hardcode `df1` with ten rows and three columns. Then hardcode `df2` with 5 rows and two columns. Show us what you tried on those dataframes. Show us what happened. And then hardcode the results that you're trying to get from those simple, minimal dataframes. – Paul H Aug 09 '23 at 15:38
  • Why close my question with a redirect to another answer whose examples don't correspond to mine? If it's that simple, why not answer my question directly?! – R13mus Aug 09 '23 at 15:54
  • Found the solution in another post - `df_op_3 = pd.concat([df_op_1, df_op_2]).groupby(['time']).sum().reset_index()` - https://stackoverflow.com/questions/23361218/pandas-dataframe-merge-summing-column. Big thanks though to you @PaulH for helping me with my methodology ! – R13mus Aug 10 '23 at 14:56

0 Answers