Assume df holds my forecast results and the actual values. The period from September 2021 to February 2022 is the backtest part, and the period from March 2022 onward is the forecast part:
date pred actual
0 2021-9-30 14.88 27.70
1 2021-10-31 6.59 26.62
2 2021-11-30 5.88 21.49
3 2021-12-31 7.29 20.58
4 2022-1-31 9.79 24.00
5 2022-2-28 14.74 6.10
6 2022-3-31 9.47 NaN
7 2022-4-30 7.85 NaN
8 2022-5-31 4.81 NaN
9 2022-6-30 3.49 NaN
Now I want to correct the pred column using the actual column of the backtest data, so that the predicted value in the last backtest period is as close as possible to the actual value. How can I solve this problem? Thank you very much.
For example, we could subtract 8.64 from the pred column (14.74 - 6.10 = 8.64), or fit a polynomial regression between the pred and actual columns.
One possible approach and the expected output:
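# Index label of the last row with a non-missing actual value
last_valid_id = df['actual'].last_valid_index()
# Gap between prediction and actual at that row (14.74 - 6.10 = 8.64)
gap = df.loc[last_valid_id, 'pred'] - df.loc[last_valid_id, 'actual']
# Shift every prediction by that gap
df['adj_pred'] = df['pred'] - gap
df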
Out:
date pred actual adj_pred
0 2021-9-30 14.88 27.70 6.242131
1 2021-10-31 6.59 26.62 -2.049860
2 2021-11-30 5.88 21.49 -2.756469
3 2021-12-31 7.29 20.58 -1.345215
4 2022-1-31 9.79 24.00 1.152847
5 2022-2-28 14.74 6.10 6.099557
6 2022-3-31 9.47 NaN 0.834391
7 2022-4-30 7.85 NaN -0.792580
8 2022-5-31 4.81 NaN -3.826918
9 2022-6-30 3.49 NaN -5.150675
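The regression alternative mentioned in the question could be sketched roughly as follows. This is only an illustration under assumptions: the column name reg_pred and the choice degree = 1 are hypothetical, and the fit uses only the six backtest rows, so it should not be read as a recommended model.

```python
import numpy as np
import pandas as pd

# Data copied from the question
df = pd.DataFrame({
    "date": ["2021-9-30", "2021-10-31", "2021-11-30", "2021-12-31", "2022-1-31",
             "2022-2-28", "2022-3-31", "2022-4-30", "2022-5-31", "2022-6-30"],
    "pred": [14.88, 6.59, 5.88, 7.29, 9.79, 14.74, 9.47, 7.85, 4.81, 3.49],
    "actual": [27.70, 26.62, 21.49, 20.58, 24.00, 6.10] + [np.nan] * 4,
})

# Backtest rows are the ones with an observed actual value
backtest = df.dropna(subset=["actual"])

# Fit actual as a polynomial in pred by least squares.
# degree is an assumption; 1 gives a plain linear correction.
degree = 1
coeffs = np.polyfit(backtest["pred"].to_numpy(),
                    backtest["actual"].to_numpy(),
                    deg=degree)

# Apply the fitted mapping to every prediction, including the forecast rows.
# reg_pred is a hypothetical column name.
df["reg_pred"] = np.polyval(coeffs, df["pred"].to_numpy())

print(df)
```

Unlike the constant-offset approach, a least-squares fit does not force the last backtest prediction to equal the last actual value; it minimizes the squared error over the whole backtest window. With only six points, a higher degree is likely to overfit.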