My data frame is indexed on an hourly basis (the index of my df), and I want to predict y.
> df.head()
Date y
2019-10-03 00:00:00 343
2019-10-03 01:00:00 101
2019-10-03 02:00:00 70
2019-10-03 03:00:00 67
2019-10-03 04:00:00 122
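For reference, the head shown above can be rebuilt with a small snippet (a hypothetical reconstruction; the real frame continues hourly beyond these five rows):

```python
import pandas as pd

# Rebuild the five rows shown above; the Date column is the index.
idx = pd.date_range("2019-10-03 00:00", periods=5, freq="h")
df = pd.DataFrame({"y": [343, 101, 70, 67, 122]}, index=idx)
df.index.name = "Date"
print(df)
```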
I will now import the libraries and train the model:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
min_max_scaler = MinMaxScaler()
prediction_hours = 24
df_train = df[:len(df)-prediction_hours]
df_test = df[len(df)-prediction_hours:]
print(df_train.head())
print('/////////////////////////////////////////')
print(df_test.head())
training_set = df_train.values
training_set = min_max_scaler.fit_transform(training_set)
x_train = training_set[0:len(training_set)-1]        # value at hour t
y_train = training_set[1:len(training_set)]          # value at hour t+1 (one-step-ahead target)
x_train = np.reshape(x_train, (len(x_train), 1, 1))  # (samples, timesteps, features)
num_units = 2
activation_function = 'sigmoid'
optimizer = 'adam'
loss_function = 'mean_squared_error'
batch_size = 10
num_epochs = 100
regressor = Sequential()
regressor.add(LSTM(units = num_units, activation = activation_function, input_shape=(None, 1)))
regressor.add(Dense(units = 1))
regressor.compile(optimizer = optimizer, loss = loss_function)
regressor.fit(x_train, y_train, batch_size = batch_size, epochs = num_epochs)
After training, I can use it on my test data:
test_set = df_test.values
inputs = np.reshape(test_set, (len(test_set), 1))
inputs = min_max_scaler.transform(inputs)
inputs = np.reshape(inputs, (len(inputs), 1, 1))
predicted_y = regressor.predict(inputs)
predicted_y = min_max_scaler.inverse_transform(predicted_y)
This is the prediction I got:
The forecast is actually pretty good: is it too good to be true? Am I doing something wrong? I followed a GitHub implementation step by step.
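One way to test whether a forecast is "too good to be true" is to compare it against a naive persistence baseline that simply repeats the previous hour's value; with a one-step-ahead framing like the one above, a model can score well just by copying its input. A self-contained sketch with hypothetical numbers in place of the real test set and predictions:

```python
import numpy as np

# Hypothetical stand-ins for the real test values and model output.
actual = np.array([120.0, 130.0, 125.0, 140.0, 138.0])
predicted = np.array([118.0, 128.0, 126.0, 137.0, 139.0])

# RMSE of the model vs. RMSE of "predict the previous value".
# If the two are close, the model may only have learned persistence.
rmse_model = np.sqrt(np.mean((predicted[1:] - actual[1:]) ** 2))
rmse_naive = np.sqrt(np.mean((actual[:-1] - actual[1:]) ** 2))
print(f"model RMSE: {rmse_model:.2f}, persistence RMSE: {rmse_naive:.2f}")
```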
I want to add some exogenous variables, namely v1, v2, v3. If my dataset now looks like this with new variables,
df.head()
Date y v1 v2 v3
2019-10-03 00:00:00 343 4 6 10
2019-10-03 01:00:00 101 3 2 24
2019-10-03 02:00:00 70 0 0 50
2019-10-03 03:00:00 67 0 4 54
2019-10-03 04:00:00 122 3 3 23
How can I include the variables v1, v2 and v3 in my LSTM model? The multivariate LSTM implementation is very confusing to me.
Edit, in response to Yoan's suggestion:
For a dataframe with the date as index and the columns y, v1, v2 and v3, I've done the following, as suggested:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
min_max_scaler = MinMaxScaler()
prediction_hours = 24
df_train = df[:len(df)-prediction_hours]
df_test = df[len(df)-prediction_hours:]
print(df_train.head())
print('/////////////////////////////////////////')
print(df_test.head())
training_set = df_train.values
training_set = min_max_scaler.fit_transform(training_set)
x_train = np.reshape(x_train, (len(x_train), 1, 4))
y_train = training_set[0:len(training_set), 1]  # I've tried with 0:len.. and with 1:len..
num_units = 2
activation_function = 'sigmoid'
optimizer = 'adam'
loss_function = 'mean_squared_error'
batch_size = 10
num_epochs = 100
regressor = Sequential()
regressor.add(LSTM(units = num_units, activation = activation_function, input_shape=(None, 1, 4)))
regressor.add(Dense(units = 1))
regressor.compile(optimizer = optimizer, loss = loss_function)
regressor.fit(x_train, y_train, batch_size = batch_size, epochs = num_epochs)
But I get the following error:
only integer scalar arrays can be converted to a scalar index
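For clarity, here is a sketch of the array shapes I understand a one-step-ahead multivariate framing to involve, using random data in place of the scaled training set (assumptions: y is column 0, and all four columns serve as input features):

```python
import numpy as np

# Random stand-in for min_max_scaler.fit_transform(df_train.values):
# 100 hours x 4 columns (y, v1, v2, v3).
training_set = np.random.rand(100, 4)

x_train = training_set[:-1]          # all 4 features at hour t, shape (99, 4)
y_train = training_set[1:, 0]        # y at hour t+1, shape (99,)
x_train = np.reshape(x_train, (len(x_train), 1, 4))  # (samples, timesteps, features)
```

The matching Keras layer would then be `LSTM(..., input_shape=(1, 4))`; `input_shape` excludes the batch dimension, so it has two entries, not three.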