I am working on a project to try to enhance my understanding of LSTM networks. I am following the steps outlined in this blog post here. My dataset looks like the following:
                  Open        High         Low       Close   Volume
Date
2014-04-21  197.080002  206.199997  194.000000  204.380005  5258200
2014-04-22  206.360001  219.330002  205.009995  218.639999  9804700
2014-04-23  216.330002  216.740005  207.000000  207.990005  7295600
2014-04-24  210.809998  212.800003  203.199997  207.860001  5495200
2014-04-25  202.000000  206.699997  197.649994  199.850006  6996700
As you can see, this is a small snapshot of TSLA stock movement.
I understand that with LSTM, this data needs to be reshaped into three dimensions:
Batch Size
Time Steps
Features
My initial idea was to use a medium batch size (to allow for the best generalization), to look back over 10 days of history as the time steps, and to use Open, High, Low, Close, and Volume as the features.
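To make the target shape concrete, here is a minimal sketch of one common way to build it: a sliding window over the rows. The helper name `make_windows`, the toy array, and the choice of column 3 (Close) as the prediction target are assumptions for illustration only, not something taken from the blog post.

```python
import numpy as np

def make_windows(values, time_steps, target_col):
    """Slice a 2D array (rows = days, cols = features) into overlapping windows.

    Returns X with shape (samples, time_steps, features) and y with shape
    (samples,), where each y is the target column on the day right after
    the corresponding window.
    """
    values = np.asarray(values, dtype=float)
    X, y = [], []
    for start in range(len(values) - time_steps):
        end = start + time_steps
        X.append(values[start:end])          # time_steps consecutive days
        y.append(values[end, target_col])    # the following day's target
    return np.array(X), np.array(y)

# Toy stand-in for the price data: 30 days, 5 features (hypothetical values)
toy = np.arange(30 * 5, dtype=float).reshape(30, 5)
X, y = make_windows(toy, time_steps=10, target_col=3)
print(X.shape, y.shape)
```

With 30 rows and a 10-day window this yields 20 samples, so the batch dimension is simply however many of those samples are fed to the network at once.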
Here is where I am a bit stuck. I have two questions specifically:
What is the approach for breaking the data into the new representation (transforming it)?
How do we take this and split it into the train, test, and validation sets? I am having trouble conceptualizing exactly what is being broken down. My initial thought was to use sklearn:
train_test_split()
But this does not seem like it will work in this case.
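The likely problem is that train_test_split() shuffles rows by default, which mixes future days into the training data. A minimal sketch of an order-preserving alternative, assuming the rows are already sorted by date; the helper name `chronological_split` and the split sizes are hypothetical:

```python
import numpy as np

def chronological_split(values, n_val, n_test):
    """Split rows in time order: oldest for training, newest for testing."""
    n_train = len(values) - n_val - n_test
    train = values[:n_train]
    val = values[n_train:n_train + n_val]
    test = values[n_train + n_val:]
    return train, val, test

# Toy stand-in: 100 consecutive trading days (hypothetical)
days = np.arange(100)
train, val, test = chronological_split(days, n_val=15, n_test=15)
print(len(train), len(val), len(test))
```

Under this approach the windows would be built separately inside each of the three pieces, so no window straddles a boundary between them.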
Obviously, once the data has been transformed and split, building the Keras model is easy. It is just a matter of calling fit(data).
Any suggestions or resources (pointing in the right direction) would be greatly appreciated.
My current code is:
from sklearn.model_selection import train_test_split
# Split the Data into Training and Testing Data
tsla_train, tsla_test = train_test_split(tsla)
tsla_train.shape
tsla_test.shape
from sklearn.preprocessing import MinMaxScaler
# Scale the Data
scaler = MinMaxScaler()
scaler.fit(tsla_train)
tsla_train_scaled = scaler.transform(tsla_train)
tsla_test_scaled = scaler.transform(tsla_test)
# Define the parameters of the model
batch_size = 20
# Set the model to look back on ten days of historical data and
# try to predict the eleventh
time_steps = 10
from keras.models import Sequential
from keras.layers import LSTM, Dense
lstm_model = Sequential()
There is some explanation found in this post here.