Suppose df holds my forecast results and the actual values. September 2021 through February 2022 is the backtest period, and March 2022 onward is the forecast period:

         date   pred  actual
0   2021-9-30  14.88   27.70
1  2021-10-31   6.59   26.62
2  2021-11-30   5.88   21.49
3  2021-12-31   7.29   20.58
4   2022-1-31   9.79   24.00
5   2022-2-28  14.74    6.10
6   2022-3-31   9.47     NaN
7   2022-4-30   7.85     NaN
8   2022-5-31   4.81     NaN
9   2022-6-30   3.49     NaN
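
For reproducibility, the frame above could be rebuilt roughly as follows. This is a minimal sketch: keeping date as plain strings is an assumption, since the question does not say whether the column is parsed into datetimes.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data shown above; dates are kept as plain strings
# (an assumption, the original dtype of the column is not stated).
df = pd.DataFrame({
    "date": ["2021-9-30", "2021-10-31", "2021-11-30", "2021-12-31", "2022-1-31",
             "2022-2-28", "2022-3-31", "2022-4-30", "2022-5-31", "2022-6-30"],
    "pred": [14.88, 6.59, 5.88, 7.29, 9.79, 14.74, 9.47, 7.85, 4.81, 3.49],
    # Actual values exist only for the six backtest months.
    "actual": [27.70, 26.62, 21.49, 20.58, 24.00, 6.10] + [np.nan] * 4,
})
```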

Now I want to correct the pred column based on the actual column over the backtest period, so that the predicted value for the last backtest month is as close as possible to the actual value. How can I do this? Thank you very much.

For example, we could subtract 8.64 from the pred column (14.74 - 6.10 = 8.64), or fit a polynomial regression of the actual column on the pred column.
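
The regression idea might look something like the sketch below. It fits actual as a polynomial in pred over the backtest rows with numpy.polyfit and applies the fit to every row. The degree (1) and the column name adj_pred_poly are assumptions made for illustration. Unlike the constant offset, a least-squares fit does not force the last backtest prediction to equal its actual value.

```python
import numpy as np
import pandas as pd

# Sample data from the question (dates kept as strings for simplicity).
df = pd.DataFrame({
    "date": ["2021-9-30", "2021-10-31", "2021-11-30", "2021-12-31", "2022-1-31",
             "2022-2-28", "2022-3-31", "2022-4-30", "2022-5-31", "2022-6-30"],
    "pred": [14.88, 6.59, 5.88, 7.29, 9.79, 14.74, 9.47, 7.85, 4.81, 3.49],
    "actual": [27.70, 26.62, 21.49, 20.58, 24.00, 6.10] + [np.nan] * 4,
})

# Backtest rows are the ones where an actual value is available.
backtest = df.dropna(subset=["actual"])

# Least-squares fit of actual ~ a * pred + b on the backtest rows.
# deg=1 is an assumption; with only six points a higher degree overfits easily.
coeffs = np.polyfit(backtest["pred"], backtest["actual"], deg=1)

# Apply the fitted mapping to all predictions, including the forecast rows.
df["adj_pred_poly"] = np.polyval(coeffs, df["pred"])
```

With so few backtest points, the fitted coefficients are very sensitive to individual months, so the result is best treated as a rough comparison against the simple offset.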

One possible approach and its expected output:

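# Index label of the last row with a non-NaN actual value (end of the backtest)
last_valid_id = df['actual'].notna()[::-1].idxmax()
# Gap between prediction and actual in that row (14.74 - 6.10 = 8.64)
gap = df.loc[last_valid_id, 'pred'] - df.loc[last_valid_id, 'actual']
# Shift every prediction by that constant gap
df['adj_pred'] = df['pred'] - gap
df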

Out:

         date   pred  actual  adj_pred
0   2021-9-30  14.88   27.70  6.242131
1  2021-10-31   6.59   26.62 -2.049860
2  2021-11-30   5.88   21.49 -2.756469
3  2021-12-31   7.29   20.58 -1.345215
4   2022-1-31   9.79   24.00  1.152847
5   2022-2-28  14.74    6.10  6.099557
6   2022-3-31   9.47     NaN  0.834391
7   2022-4-30   7.85     NaN -0.792580
8   2022-5-31   4.81     NaN -3.826918
9   2022-6-30   3.49     NaN -5.150675

Reference:

Locate first and last non NaN values in a Pandas DataFrame

ah bon