I'm learning about time series and am trying to predict the closing stock price for the next two weeks, given the data I already have (about a year's worth).
I've created 7 lag features using Pandas shift, so my whole DataFrame, df, has the features t-7, t-6, ..., t-1 along with t, the current day's closing stock price. I've also made a test_df, which is just the last two weeks of data. test_df has the true values for each of its rows' lagged features.
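A minimal sketch of that setup, for reference. It assumes daily trading-day data (so "two weeks" is about 10 rows) and uses synthetic prices in place of real stock data; the column names t, t-1, ..., t-7 follow the question, while train_df, n_lags and horizon are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ~one year of synthetic daily closing prices (not real stock data).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2023-01-02", periods=252)  # business days only
df = pd.DataFrame({"t": 100 + rng.normal(0, 1, len(dates)).cumsum()}, index=dates)

# Lag features: column "t-k" holds the closing price k trading days earlier.
n_lags = 7
for k in range(1, n_lags + 1):
    df[f"t-{k}"] = df["t"].shift(k)

# The first n_lags rows have incomplete lags, so drop them.
df = df.dropna()

# Assumption: two weeks of trading days is 10 rows.
horizon = 10
train_df = df.iloc[:-horizon]  # everything before the last two weeks
test_df = df.iloc[-horizon:]   # the last two weeks
```

Note that test_df built this way still contains the true lag values, which is exactly the leakage the question is trying to avoid when forecasting more than one step ahead.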
I want to mimic predicting future values by limiting myself to values from my training set (everything in df before the last two weeks) and my own predictions.
So I was going to do something like:
# for each row in test_df
# prediction = model.predict(row)
# row["t"] = prediction
I think this is close, but it doesn't update the other lagged features (t-1, t-2, ..., t-7). I need to do this:
row 1, t = prediction for row 1
row 2, t-1 = t for row 1
...
row 2, t-i = t-(i-1) for row 1
And I would repeat this for every row in my test_df.
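The row-to-row update described above is recursive (iterated) multi-step forecasting: each prediction is fed back in as the newest lag. A minimal NumPy sketch, assuming a hypothetical callable predict_one that maps the lag vector ordered [t-1, t-2, ..., t-n] to a single predicted close:

```python
import numpy as np

def recursive_forecast(predict_one, initial_lags, horizon):
    """Forecast `horizon` steps ahead, feeding each prediction back in as t-1.

    predict_one: hypothetical callable, 1-D lag array [t-1, ..., t-n] -> float.
    initial_lags: lags of the first test row, i.e. the last n known closing
        prices, most recent first.
    """
    lags = np.asarray(initial_lags, dtype=float).copy()
    predictions = []
    for _ in range(horizon):
        pred = float(predict_one(lags))
        predictions.append(pred)
        # Shift by one: the prediction becomes t-1, old t-1 becomes t-2, ...,
        # and the oldest lag (t-n) falls off the end.
        lags = np.concatenate(([pred], lags[:-1]))
    return predictions

# Usage with a stand-in "model" that just averages the lags.
print(recursive_forecast(lambda lags: lags.mean(), [7, 6, 5, 4, 3], horizon=2))
```

With a fitted scikit-learn regressor, predict_one could be something like lambda lags: model.predict(lags.reshape(1, -1))[0], assuming the model was trained with its lag columns in the same [t-1, ..., t-7] order. The initial lags would be the last seven training closes reversed, e.g. train_df["t"].iloc[-7:][::-1] if the training frame has a "t" column (an assumption about naming). Working on a plain array inside the loop avoids rebuilding a DataFrame at every step.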
I could do this by writing my own function, but I'm wondering if there's a way to take advantage of Pandas to do this more easily.
Edit for clarity:
Suppose I'm looking at my first test row. I don't have its closing_price, so I use my model to predict it from the lagged features. Before the prediction, my df looks like this:
   closing_price  t-1  t-2  t-3  t-4  t-5
0           None    7    6    5    4    3
Suppose my prediction for closing_price is 15. Then my updated DataFrame should look like this:
   closing_price   t-1  t-2  t-3  t-4  t-5
0           15.0   7.0  6.0  5.0  4.0  3.0
1            NaN  15.0  7.0  6.0  5.0  4.0
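For this single step, pandas can do the shift directly: calling Series.shift(1) on the last row moves every value one column to the right, so closing_price lands in t-1, t-1 in t-2, and so on, with t-5 dropped and closing_price left as NaN. A minimal sketch reproducing the example, assuming the columns are ordered closing_price, t-1, ..., t-5 exactly as shown (the hard-coded 15.0 stands in for the model's prediction):

```python
import numpy as np
import pandas as pd

# State before predicting: one row whose closing_price is unknown.
df = pd.DataFrame(
    {"closing_price": [np.nan], "t-1": [7.0], "t-2": [6.0],
     "t-3": [5.0], "t-4": [4.0], "t-5": [3.0]}
)

prediction = 15.0  # stand-in for the model's output on row 0
df.loc[0, "closing_price"] = prediction

# Shifting the row Series by one position moves each value one column right;
# the first position (closing_price) is filled with NaN.
next_row = df.iloc[-1].shift(1)
df.loc[len(df)] = next_row  # setting with enlargement appends it as row 1
print(df)
```

This relies on column order, so the lag columns must stay sorted from closing_price through the oldest lag. Repeating it (predict from the new row's lag columns, fill closing_price, shift again) gives the full two-week loop; growing a DataFrame one row at a time is slow for long horizons, but for around ten steps it is not a concern.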
Thanks!