I want to create a simple LSTM that tries to predict stock prices based on 3 variables: open_price, close_price and volume.
- I want to predict the open_price and close_price only
- I want to predict the prices for the next 5 days
- I want to predict the prices based on 10 previous days
- I believe that tomorrow's prices will be affected by both today's prices and volume
I've been playing with this example, which uses just one variable (price). But I can't figure out how to make it work with multiple variables.
Please note that although I'm only predicting prices, I still want my network to use past volume data as an input, as it might carry some information about price changes.
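To make the requirements above concrete, here is a minimal sketch of the array shapes they imply: each input window holds 10 days of all 3 variables, and each target window holds the following 5 days of the two price columns only. The helper name `make_windows` and its parameters are hypothetical, introduced only for illustration, and the data is random.

```python
import numpy as np

def make_windows(data, n_in=10, n_out=5, target_cols=(0, 1)):
    """Slice a (rows, features) array into overlapping (input, target) windows.

    Hypothetical helper: inputs keep every feature (including volume),
    targets keep only the columns listed in target_cols (the two prices).
    """
    xs, ys = [], []
    last_start = len(data) - n_in - n_out
    for start in range(last_start + 1):
        # n_in consecutive days, all features
        xs.append(data[start:start + n_in])
        # the n_out days that follow, price columns only
        ys.append(data[start + n_in:start + n_in + n_out, list(target_cols)])
    return np.array(xs), np.array(ys)

rng = np.random.default_rng(0)
# index 0: open_price, index 1: close_price, index 2: volume
dataset = rng.random((100, 3))

x_windows, y_windows = make_windows(dataset)
print(x_windows.shape)  # (86, 10, 3)
print(y_windows.shape)  # (86, 5, 2)
```

With 100 rows this gives 86 samples, so volume appears in the inputs but never in the targets.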
Here is my code using some random values:
from keras.layers.recurrent import LSTM
from keras.models import Sequential
import numpy as np
my_input_dim = ???
my_output_dim = ???
model = Sequential()
model.add(LSTM(
    input_dim=my_input_dim,
    output_dim=my_output_dim,
    return_sequences=True))
model.compile(loss='mse', optimizer='adam')
rows = 100
# index 0: open_price, index 1: close_price, index 2: volume
dataset = np.random.rand(rows,3)
my_x_train = ???
my_y_train = ???
model.fit(
    my_x_train,
    my_y_train,
    batch_size=1,
    nb_epoch=10,
    validation_split=0.20)
data_from_5_previous_days = np.random.rand(5,3)
# should be a 2D array of length 5 (prices for the next 5 days)
prediction = model.predict(data_from_5_previous_days)
Could you please help me finish it?
EDIT
I think I made some progress here:
timesteps = 5
out_timesteps = 5
input_features = 4
output_features = 3
xs = np.arange(20 * timesteps * input_features).reshape((20,timesteps,input_features)) # (20,5,4)
ys = np.arange(20 * out_timesteps * output_features).reshape((20,out_timesteps,output_features)) # (20,5,3)
model = Sequential()
model.add(LSTM(13, input_shape=(timesteps, input_features), return_sequences=True))
model.add(LSTM(17, return_sequences=True))
model.add(LSTM(output_features, return_sequences=True)) # the output dim of the LAST layer must equal the number of features per timestep in ys
model.compile(loss='mse', optimizer='adam')
model.fit(xs,ys,batch_size=1,nb_epoch=1,validation_split=0.20)
prediction = model.predict(xs[0:2]) # (2,5,3)
So my single sample is
- input: a sequence (5 items long), and every item has 4 features
- output: a sequence (5 items long), and every item has 3 features
However, I'm unable to change the output sequence length (e.g. out_timesteps = 2). I get an error:
ValueError: Error when checking target: expected lstm_143 to have shape (None, 5, 3) but got array with shape (20, 2, 3)
The way I think about it is: based on the last 5 days, predict the next 2 days.
How can I fix this error?
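For context on the shape mismatch: when every LSTM layer uses return_sequences=True, the time axis passes through unchanged, so 5 input steps always give 5 output steps. One commonly used way to decouple the two lengths is an encoder-decoder layout, which in Keras is typically built from an LSTM without return_sequences, then RepeatVector(out_timesteps), then an LSTM with return_sequences=True, then TimeDistributed(Dense(output_features)). The block below is only a NumPy shape trace of that idea, not a trained model and not Keras code: the weights are random, the encoder is a crude stand-in, and the decoder LSTM is left out.

```python
import numpy as np

batch, timesteps, input_features = 20, 5, 4
out_timesteps, output_features = 2, 3
hidden_units = 13  # arbitrary size, chosen for illustration

rng = np.random.default_rng(0)
xs = rng.random((batch, timesteps, input_features))      # (20, 5, 4)

# Stand-in for an encoder LSTM with return_sequences=False:
# the time axis is collapsed into one vector per sample.
w_enc = rng.random((input_features, hidden_units))
encoded = np.tanh(xs.mean(axis=1) @ w_enc)               # (20, 13)

# Stand-in for RepeatVector(out_timesteps):
# copy the encoded vector once per output step.
repeated = np.repeat(encoded[:, np.newaxis, :], out_timesteps, axis=1)  # (20, 2, 13)

# Stand-in for TimeDistributed(Dense(output_features)):
# the same projection is applied at every output step.
w_out = rng.random((hidden_units, output_features))
decoded = repeated @ w_out                               # (20, 2, 3)

print(encoded.shape, repeated.shape, decoded.shape)
```

The final shape (20, 2, 3) matches a target array built with out_timesteps = 2, which is the shape the error message says the all-return_sequences model cannot produce.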