Given the following dataset, I want to process it so that I can fit an RNN in Keras, which expects input of shape (batch_size, timesteps, features). This is a simplified example of the dataset:
import numpy as np
import pandas as pd

X = np.array([[1,2,3,4,5],[7,8,9,10,11],[12,13,14,15,16]]).T
data = pd.DataFrame(X,columns=['feature1','feature2','outcome'])
feature1  feature2  outcome
       1         7       12
       2         8       13
       3         9       14
       4        10       15
       5        11       16
I now want to create a NumPy array that reflects a lag of 2 for the outcome. My goal is to predict the outcome at each timestep from the feature values of the previous two timesteps.
That is, I want an array that looks like this:
batch_size = 3 # for this particular dataset
timesteps = 2
features = 2
out = np.empty(shape=(batch_size,timesteps,features))
out[0] = np.array([[1,7],[2,8]])
out[1] = np.array([[2,8],[3,9]])
out[2] = np.array([[3,9],[4,10]])
y = np.array([14,15,16])
print(out)
[[[ 1.  7.]
  [ 2.  8.]]

 [[ 2.  8.]
  [ 3.  9.]]

 [[ 3.  9.]
  [ 4. 10.]]]
With the outcome represented as:
print(y)
[14 15 16]
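As you can see, there are a total of 3 possible windows (shape[0]), where each window has 2 lagged timesteps (shape[1]) and 2 features (shape[2]). The first two rows of outcome have no complete window of two previous timesteps, so they are not used as targets.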
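For reference, the target layout described above could be produced with something like the sketch below. It is only one possible approach, not a definitive one: make_windows is a hypothetical helper name introduced here for illustration, and it assumes the rows are already sorted in time order with no gaps.

```python
import numpy as np
import pandas as pd


def make_windows(df, feature_cols, target_col, timesteps):
    """Hypothetical helper: build (samples, timesteps, features) windows.

    Assumes the rows of `df` are in chronological order with no gaps.
    """
    feats = df[feature_cols].to_numpy(dtype=float)  # shape (n_rows, n_features)
    target = df[target_col].to_numpy()              # shape (n_rows,)

    # Only rows preceded by a full window of `timesteps` rows can be targets.
    n_samples = len(df) - timesteps
    if n_samples <= 0:
        raise ValueError("not enough rows for the requested number of timesteps")

    # Sample i holds rows i .. i + timesteps - 1 of the feature columns.
    out = np.stack([feats[i:i + timesteps] for i in range(n_samples)])
    # The target for sample i is the outcome of the row right after its window.
    y = target[timesteps:]
    return out, y


X = np.array([[1, 2, 3, 4, 5], [7, 8, 9, 10, 11], [12, 13, 14, 15, 16]]).T
data = pd.DataFrame(X, columns=['feature1', 'feature2', 'outcome'])

out, y = make_windows(data, ['feature1', 'feature2'], 'outcome', timesteps=2)
print(out.shape)  # (3, 2, 2)
print(y)          # [14 15 16]
```

The list comprehension copies each window, which is fine for a small dataset like this one; for a long series a strided view or a generator would likely use less memory.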