How can I sum two Pandas DataFrames on a 'value' column? They have different sizes, but the 'time' column is unique in both.
First df (df1):
name as_of time day hour is_peak month quarter se_year season type value week wk_year year
0 LU_STC 2022-09-30 2022-12-31 23:00:00 1 0 False 1 1 None None OP 56.197097 52 2022 2023
1 LU_STC 2022-09-30 2023-01-01 00:00:00 1 1 False 1 1 None None OP 57.003007 52 2022 2023
2 LU_STC 2022-09-30 2023-01-01 01:00:00 1 2 False 1 1 None None OP 57.134680 52 2022 2023
3 LU_STC 2022-09-30 2023-01-01 02:00:00 1 3 False 1 1 None None OP 56.957425 52 2022 2023
4 LU_STC 2022-09-30 2023-01-01 03:00:00 1 4 False 1 1 None None OP 55.794171 52 2022 2023
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
8755 LU_STC 2022-09-30 2023-12-31 18:00:00 31 19 False 12 4 None None OP 32.253445 52 2023 2023
8756 LU_STC 2022-09-30 2023-12-31 19:00:00 31 20 False 12 4 None None OP 35.600396 52 2023 2023
8757 LU_STC 2022-09-30 2023-12-31 20:00:00 31 21 False 12 4 None None OP 38.478395 52 2023 2023
8758 LU_STC 2022-09-30 2023-12-31 21:00:00 31 22 False 12 4 None None OP 41.564846 52 2023 2023
8759 LU_STC 2022-09-30 2023-12-31 22:00:00 31 23 False 12 4 None None OP 43.382795 52 2023 2023
[8760 rows x 15 columns]
{'name': dtype('O'), 'as_of': dtype('<M8[ns]'), 'time': dtype('<M8[ns]'), 'day': dtype('int64'), 'hour': dtype('int64'), 'is_peak': dtype('bool'), 'month': dtype('int64'), 'quarter': dtype('int64'), 'se_year': dtype('O'), 'season': dtype('O'), 'type': dtype('O'), 'value': dtype('float64'), 'week': dtype('int64'), 'wk_year': dtype('int64'), 'year': dtype('int64')}
Second df (df2):
time value
0 2023-06-30 22:00:00 5.0
1 2023-06-30 23:00:00 5.0
2 2023-07-01 00:00:00 5.0
3 2023-07-01 01:00:00 5.0
4 2023-07-01 02:00:00 5.0
... ... ...
2203 2023-09-30 17:00:00 5.0
2204 2023-09-30 18:00:00 5.0
2205 2023-09-30 19:00:00 5.0
2206 2023-09-30 20:00:00 5.0
2207 2023-09-30 21:00:00 5.0
[2208 rows x 2 columns]
{'time': dtype('<M8[ns]'), 'value': dtype('float64')}
The result should be the first df (so 8760 rows x 15 columns), but with the 'value' column updated to the sum of the values of both dfs wherever the times match.
EDIT 1
So the result should be the first df (8760 rows x 15 columns), but with modified values for the 2208 rows whose times also exist in the second df.
For time '2023-06-30 22:00:00', the final 'value' in df1 should be df1's 'value' at that time + df2's 'value' at that time.
...
For time '2023-09-30 21:00:00', the final 'value' in df1 should be df1's 'value' at that time + df2's 'value' at that time.
The rest of the values in df1 won't change because there is no corresponding value in df2.
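In code terms, what I am after is roughly something like this (just a sketch, assuming 'time' is unique in both frames; the intermediate name added is mine):
# Align df2's values to df1's times, treat missing times as 0, then add,
# so rows of df1 without a match in df2 keep their original value.
added = df1['time'].map(df2.set_index('time')['value']).fillna(0)
df1['value'] = df1['value'] + added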
EDIT 2
The simplified example (with df1 the larger frame and df2 the smaller one, as above):
df1
time value
0 2022-12-31 23:00:00 56.197097
1 2023-01-01 00:00:00 57.003007
2 2023-01-01 01:00:00 57.134680
3 2023-01-01 02:00:00 56.957425
df2
time value
1 2023-01-01 00:00:00 5.0
2 2023-01-01 01:00:00 5.0
df_sum_result
time value
0 2022-12-31 23:00:00 56.197097
1 2023-01-01 00:00:00 62.003007
2 2023-01-01 01:00:00 62.134680
3 2023-01-01 02:00:00 56.957425
Concat alone is not doing the job, and join isn't either; it's probably a combination of merge and sum.
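Something along these lines is what I am imagining (a rough sketch on the simplified frames above; the merged name and the '_df2' suffix are placeholders I made up):
# Left-merge on 'time', then add the two value columns, treating a
# missing match in df2 as 0 so the other rows keep their original value.
merged = df1.merge(df2, on='time', how='left', suffixes=('', '_df2'))
merged['value'] = merged['value'] + merged['value_df2'].fillna(0)
df_sum_result = merged.drop(columns='value_df2')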
EDIT 3
The solution, shown in Pandas DataFrame merge summing column, is:
df_op_3 = pd.concat([df1, df2]).groupby(['time']).sum().reset_index()
The result is:
time value
0 2022-12-31 23:00:00 56.197097
1 2023-01-01 00:00:00 62.003007
2 2023-01-01 01:00:00 62.134680
3 2023-01-01 02:00:00 56.957425
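For the full 15-column df1 this probably needs a small adjustment, because groupby('time').sum() would mangle or drop the non-numeric columns (depending on the pandas version). One way, just a sketch with the names summed and result made up here, is to sum only the 'time'/'value' pair and merge it back:
import pandas as pd

# Sum 'value' per time over both frames, then attach the summed values
# back onto df1 so all 15 of its columns are kept.
summed = (pd.concat([df1[['time', 'value']], df2])
            .groupby('time', as_index=False)['value'].sum())
result = df1.drop(columns='value').merge(summed, on='time', how='left')
The 'value' column then ends up as the last column of result, but the values are the ones described in EDIT 1.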
I still don't understand why I was pointed in the wrong direction with a link to a wrong solution instead of being given a nice and simple answer ...