
I am trying to fit an LSTM network to a sin function. Currently, as far as I understand Keras, my code only predicts the next value. According to this link (Many to one and many to many LSTM examples in Keras), it is a many-to-one model. However, my goal is to implement a many-to-many model. Basically, I want to be able to predict, say, the next 10 values from a given point in time. When I try to use return_sequences=True (see the model.add(..) line), which is supposed to be the solution, the following error occurs:

ValueError: Error when checking target: expected lstm_8 to have 3 dimensions, but got array with shape (689, 1)

Unfortunately, I have absolutely no clue why this happens. Is there a general rule for what the input shape needs to be when using return_sequences=True? Furthermore, what exactly would I need to change? Thanks for any help.

import pandas
import numpy as np
import matplotlib.pylab as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import sklearn

from keras.models import Sequential
from keras.layers import Activation, LSTM
from keras import optimizers
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot

#generate sin function with noise
x = np.arange(0, 100, 0.1)
noise = np.random.uniform(-0.1, 0.1, size=(1000,))
Y = np.sin(x) + noise

# Perform feature scaling
scaler = MinMaxScaler()
Y = scaler.fit_transform(Y.reshape(-1, 1))

# split in train and test
train_size = int(len(Y) * 0.7)
test_size = len(Y) - train_size
train, test = Y[0:train_size,:], Y[train_size:len(Y),:]

def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
         a = dataset[i:(i+look_back), 0]
         dataX.append(a)
         dataY.append(dataset[i + look_back, 0])
    return np.array(dataX), np.array(dataY)

# reshape into X=t and Y=t+1
look_back = 10
X_train, y_train = create_dataset(train, look_back)
X_test, y_test = create_dataset(test, look_back)

# LSTM network expects the input data in form of [samples, time steps, features]
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
np.set_printoptions(threshold=np.inf)

# compile model
model = Sequential()
model.add(LSTM(1, input_shape=(look_back, 1)))#, return_sequences=True))  <== uncomment this
model.compile(loss='mean_squared_error', optimizer='adam')
SVG(model_to_dot(model).create(prog='dot', format='svg'))

model.fit(X_train, y_train, validation_data=(X_test, y_test),
          batch_size=10, epochs=10, verbose=2)
prediction = model.predict(X_test, batch_size=1, verbose=0)
prediction = prediction.reshape(-1, 1)  # keep 2D shape for inverse_transform
#Transform back to original representation
Y = scaler.inverse_transform(Y)
prediction = scaler.inverse_transform(prediction)
plt.plot(np.arange(0,Y.shape[0]), Y)
plt.plot(np.arange(Y.shape[0] - X_test.shape[0] , Y.shape[0]), prediction, 'red')
plt.show()
# compare on the original scale: inverse-transform y_test as well
error = mean_squared_error(scaler.inverse_transform(y_test.reshape(-1, 1)), prediction)
print(error)
– Dennis

3 Answers


The problem is not the input but the output. The error says "Error when checking target"; the target here is y_train and y_test.

Because your LSTM returns a sequence (return_sequences=True), the output dimension will be (n_batch, look_back, 1).

You can verify this by calling model.summary():

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 10, 1)             12        
=================================================================
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________

You will need to change your create_dataset function so that each ground-truth target is shaped (look_back, 1); a sketch of the modified function follows the example below.

Something you might want to do: for each sequence x in the training set, its y will be the same sequence shifted one step ahead.
For example, suppose we want to learn something easier, a sequence where each number is the previous number plus 1: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. For look_back=4:

X_train[0] = 1,2,3,4   
y_train[0] will be: 2,3,4,5  
X_train[1] = 2,3,4,5  
y_train[1] will be: 3,4,5,6  
and so on...
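
A minimal sketch of that modified create_dataset (the name create_dataset_seq is hypothetical; it mirrors the question's function but returns the shifted window as the target):

def create_dataset_seq(dataset, look_back=1):
    # Each target is the input window shifted one step ahead,
    # so X[i] and y[i] both contain look_back values.
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        dataX.append(dataset[i:(i + look_back), 0])
        dataY.append(dataset[i + 1:(i + 1 + look_back), 0])
    return np.array(dataX), np.array(dataY)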
– Dvir Samuel
  • Thanks for your reply, but I really don't get it. Would changing the line to `dataY.append(dataset[i+1:(i+1+look_back), 0])` produce the output you mentioned? – Dennis Nov 24 '17 at 11:53
  • It depends on the desired output, but yes, dataY.append(dataset[i+1:(i+1+look_back), 0]) should make each label a (look_back, 1) vector. – Dvir Samuel Nov 24 '17 at 11:58
  • That's what I thought, but it seems to make no difference. At least I am getting a value error as well. `Error when checking target: expected lstm_1 to have 3 dimensions, but got array with shape (164, 10)` – Dennis Nov 25 '17 at 11:48
  • You did well. The last thing is that the target shape should be (164, 10, 1): the instruction dataY.append(dataset[i+1:(i+1+look_back), 0]) creates an array with shape (164, 10), but you want to tell the model that each timestep is a scalar. So afterwards just do dataY = dataY.reshape(-1, look_back, 1) or np.expand_dims(dataY, axis=-1), and make sure that dataY.shape is (164, 10, 1) (see the end-to-end sketch after these comments). – Dvir Samuel Nov 25 '17 at 15:31
  • I know this is an old question, but I have simulated the data as suggested by @DvirSamuel. I'll post the code as an answer, since I don't have the space here. Note that an FNN performs as well as the LSTM. – from keras import michael Aug 27 '18 at 02:57
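
Putting the comment thread together, here is a minimal end-to-end sketch for the original sine setup (variable names taken from the question; create_dataset_seq is the modified function sketched above):

X_train, y_train = create_dataset_seq(train, look_back)
X_test, y_test = create_dataset_seq(test, look_back)

# Inputs and targets both need the shape (samples, look_back, 1).
X_train = np.expand_dims(X_train, axis=-1)
X_test = np.expand_dims(X_test, axis=-1)
y_train = np.expand_dims(y_train, axis=-1)
y_test = np.expand_dims(y_test, axis=-1)

model = Sequential()
model.add(LSTM(1, input_shape=(look_back, 1), return_sequences=True))
model.compile(loss='mean_squared_error', optimizer='adam')
# The target shape (None, look_back, 1) now matches the LSTM's output shape.
model.fit(X_train, y_train, validation_data=(X_test, y_test),
          batch_size=10, epochs=10, verbose=2)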

I have simulated the data as @DvirSamuel suggested and provide code for an LSTM and an FNN. Note that for the LSTM, network_lstm.add(layers.Dense(1, activation = None)) is required if return_sequences = True is included in the previous layer.

## Simulate data.

import numpy as np
import matplotlib.pyplot as plt
from keras import models, layers

np.random.seed(20180826)

Z = np.random.randint(0, 10, size = (11000, 1))

# Each new column is the previous column plus 1.
for i in range(10):
    Z = np.concatenate((Z, (Z[:, -1].reshape(Z.shape[0], 1) + 1)), axis = 1)

X = Z[:, :-1]
Y = Z[:,  1:]

print(X.shape)
print(Y.shape)

## Training and validation data.

split = 10000

X_train = X[:split, :]
X_valid = X[split:, :]

Y_train = Y[:split, :]
Y_valid = Y[split:, :]

print(X_train.shape)
print(Y_train.shape)
print(X_valid.shape)
print(Y_valid.shape)

Code for an LSTM model:

## LSTM model.

X_lstm_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_lstm_valid = X_valid.reshape(X_valid.shape[0], X_valid.shape[1], 1)

Y_lstm_train = Y_train.reshape(Y_train.shape[0], Y_train.shape[1], 1)
Y_lstm_valid = Y_valid.reshape(Y_valid.shape[0], Y_valid.shape[1], 1)

# Define model.

network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(64, activation = 'relu', input_shape = (X_lstm_train.shape[1], 1),
    return_sequences = True))
network_lstm.add(layers.Dense(1, activation = None))

network_lstm.summary()

# Compile model.

network_lstm.compile(optimizer = 'rmsprop', loss = 'mean_squared_error')

# Fit model.

history_lstm = network_lstm.fit(X_lstm_train, Y_lstm_train, epochs = 5, batch_size = 32, verbose = True,
    validation_data = (X_lstm_valid, Y_lstm_valid))

## Extract loss over epochs and predict.

# Extract loss.

loss_lstm = history_lstm.history['loss']
val_loss_lstm = history_lstm.history['val_loss']
epochs_lstm = range(1, len(loss_lstm) + 1)

plt.plot(epochs_lstm, loss_lstm, 'black', label = 'Training Loss')
plt.plot(epochs_lstm, val_loss_lstm, 'red', label = 'Validation Loss')
plt.title('LSTM: Training and Validation Loss')
plt.legend()
plt.show()  # show the loss curves before starting the scatter plots

plt.title('First in Sequence')

plt.scatter(Y_train[:, 0], network_lstm.predict(X_lstm_train)[:, 0], alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

plt.scatter(Y_valid[:, 0], network_lstm.predict(X_lstm_valid)[:, 0], alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

plt.title('Last in Sequence')

plt.scatter(Y_train[:, -1], network_lstm.predict(X_lstm_train)[:, -1], alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

plt.scatter(Y_valid[:, -1], network_lstm.predict(X_lstm_valid)[:, -1], alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

Code for an FNN model:

## FNN model.

# Define model.

network_fnn = models.Sequential()
network_fnn.add(layers.Dense(64, activation = 'relu', input_shape = (X_train.shape[1],)))
network_fnn.add(layers.Dense(10, activation = None))

network_fnn.summary()

# Compile model.

network_fnn.compile(optimizer = 'rmsprop', loss = 'mean_squared_error')

# Fit model.

history_fnn = network_fnn.fit(X_train, Y_train, epochs = 5, batch_size = 32, verbose = True,
    validation_data = (X_valid, Y_valid))

## Extract loss over epochs.

# Extract loss.

loss_fnn = history_fnn.history['loss']
val_loss_fnn = history_fnn.history['val_loss']
epochs_fnn = range(1, len(loss_fnn) + 1)

plt.plot(epochs_fnn, loss_fnn, 'black', label = 'Training Loss')
plt.plot(epochs_fnn, val_loss_fnn, 'red', label = 'Validation Loss')
plt.title('FNN: Training and Validation Loss')
plt.legend()
plt.show()  # show the loss curves before starting the scatter plots

plt.title('First in Sequence')

plt.scatter(Y_train[:, 0], network_fnn.predict(X_train)[:, 0], alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

plt.scatter(Y_valid[:, 0], network_fnn.predict(X_valid)[:, 0], alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

plt.title('Last in Sequence')

plt.scatter(Y_train[:, -1], network_fnn.predict(X_train)[:, -1], alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

plt.scatter(Y_valid[:, -1], network_fnn.predict(X_valid)[:, -1], alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

Shouldn't this:

X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))

X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

be like this:

X_train = np.reshape((X_train.shape[0], X_train.shape[1], 1))

X_test = np.reshape((X_test.shape[0], X_test.shape[1], 1))

Could this be your problem? (1 year later xD)

– saga56