Input of layer sequential is incompatible with the layer: shapes error in LSTM

Question

I'm new to neural networks and I want to use them to compare with other machine learning methods. I have multivariate time series data covering approximately two years. I want to predict 'y' for the next few days based on the other variables using an LSTM. The final day of my data is 2020-07-31.

df.tail()

                y  holidays  day_of_month  day_of_week  month  quarter
Date
2020-07-27  32500         0            27            0      7        3
2020-07-28  33280         0            28            1      7        3
2020-07-29  31110         0            29            2      7        3
2020-07-30  37720         0            30            3      7        3
2020-07-31  32240         0            31            4      7        3

To train the LSTM model I also split the data into train and test data.

split_date = '2020-07-27'  # to predict the next 4 days
df_train = df.loc[df.index <= split_date].copy()
df_test = df.loc[df.index > split_date].copy()
X1 = df_train[['day_of_month', 'day_of_week', 'month', 'quarter', 'holidays']]
y1 = df_train['y']
X2 = df_test[['day_of_month', 'day_of_week', 'month', 'quarter', 'holidays']]
y2 = df_test['y']

X_train, y_train = X1, y1
X_test, y_test = X2, y2

Because I'm working with LSTM, some scaling is needed:

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

Now, onto the difficult part: the model.

# Imports assumed from the tf.keras error message below
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

num_units = 50
activation_function = 'sigmoid'
optimizer = 'adam'
loss_function = 'mean_squared_error'
batch_size = 10
num_epochs = 100

# Initialize the RNN
regressor = Sequential()

# Adding the input layer and the LSTM layer
regressor.add(LSTM(units=num_units, return_sequences=True,
                   activation=activation_function,
                   input_shape=(X_train.shape[1], 1)))

# Adding the output layer
regressor.add(Dense(units=1))

# Compiling the RNN
regressor.compile(optimizer=optimizer, loss=loss_function)

# Using the training set to train the model
regressor.fit(X_train_scaled, y_train, batch_size=batch_size, epochs=num_epochs)

However, I receive the following error:

ValueError: Input 0 of layer sequential_11 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 5]

I don't understand how we choose the parameters or the shape of the input. I've seen some videos and read some GitHub pages, and everyone seems to run LSTM in a different way, which makes it even more difficult to implement. The error above probably comes from the shape, but other than that, is everything else right? And how can I fix this so that it works? Thanks

EDIT: This similar question does not solve my problem. I've tried the solution from there:

x_train = X_train_scaled.reshape(-1, 1, 5)
x_test  = X_test_scaled.reshape(-1, 1, 5)

(My X_test and y_test only have one column). And the solution also doesn't seem to work. I get this error now:

ValueError: Input 0 is incompatible with layer sequential_22: expected shape=(None, None, 1), found shape=[None, 1, 5]

This error is happening because you define a model architecture, but then your input does not fit the architecture. As a rule of thumb as a programmer, you really have to pay attention to the error messages, and it says `Full shape received: [None, 5]`. This is happening because your input is of a shape `[None, 5]`, given that `X1=df_train[['day_of_month','day_of_week','month','quarter','holidays']]`. Try defining an input layer before defining a sequential layer as such `layers.Input(shape=(len(X1.columns),))`. — xicocaio, Dec 22 '20 at 12:28
Does this answer your question? [expected ndim=3, found ndim=2](https://stackoverflow.com/questions/54416322/expected-ndim-3-found-ndim-2) — Nicolas Gervais, Dec 22 '20 at 13:03
I do not know how much of this makes sense but instead of `x_train = X_train_scaled.reshape(-1, 1, 5) x_test = X_test_scaled.reshape(-1, 1, 5)`, you can do `x_train = X_train_scaled.reshape(-1, 5, 1) x_test = X_test_scaled.reshape(-1, 5, 1)` to make it work — learner, Dec 24 '20 at 13:29

score 2 · Answer 1 · answered Dec 22 '20 at 14:18

INPUT:

The problem is that your model expects a 3D input of shape (batch, sequence, features), but your X_train is actually a slice of a data frame, so a 2D array:

X1=df_train[['day_of_month','day_of_week','month','quarter','holidays']]
X_train, y_train =X1, y1

I assume your columns are supposed to be your features, so what you would usually do is "stack slices" of your df so that your X_train looks something like this:

Here is a dummy 2D data set of shape (15,5) :

import numpy as np
data = np.zeros((15,5))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

You can reshape it to add a sequence dimension between the batch and feature dimensions, for example (15,1,5):

data = data[:,np.newaxis,:] 

array([[[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]]])

Same data, but presented in a different way. In this example, batch = 15 and sequence = 1. I don't know what the sequence length is in your case, but it can be anything.
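
To make the axis order concrete, here is a small sketch comparing the two 3D layouts that come up in this thread. It uses made-up numbers instead of zeros and assumes 5 feature columns, as in the question. Both arrays have ndim=3, but an LSTM reads them very differently.

```python
import numpy as np

# Hypothetical stand-in for X_train_scaled: 15 days, 5 features per day.
data = np.arange(15 * 5, dtype=float).reshape(15, 5)

# (samples, timesteps=1, features=5): each sample is one day with all 5 features.
one_step = data[:, np.newaxis, :]

# (samples, timesteps=5, features=1): each sample is read as a 5-step
# sequence of a single feature, i.e. the columns are treated as time.
five_steps = data.reshape(-1, 5, 1)

print(one_step.shape)    # (15, 1, 5)
print(five_steps.shape)  # (15, 5, 1)
print(one_step[0])       # [[0. 1. 2. 3. 4.]]
```

The first layout pairs with input_shape=(1, 5). The second pairs with the question's input_shape=(X_train.shape[1], 1), i.e. (5, 1), which is why reshaping to (-1, 1, 5) alone still produced a shape error.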

MODEL :

Now in your model, Keras' input_shape expects (sequence, features); the batch dimension is left out and added automatically. When you pass this:

input_shape=(X_train.shape[1], 1)

This is what your model sees: (None, sequence = X_train.shape[1], num_features = 1), where None is the batch dimension. I don't think that's what you are trying to do, so once you've reshaped the array you should also correct input_shape to match it.
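
A minimal sketch of an array and a model whose shapes agree, assuming one time step per sample and the 5 features from the question. The random data, the layer size of 8 and the single epoch are placeholders for illustration, not tuned values.

```python
import numpy as np
import tensorflow as tf

timesteps, n_features = 1, 5

# Placeholder data standing in for the scaled features and the target.
X = np.random.rand(32, n_features).astype("float32")
y = np.random.rand(32).astype("float32")

# Add the sequence axis: (32, 5) -> (32, 1, 5).
X_3d = X.reshape(-1, timesteps, n_features)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(timesteps, n_features)),  # batch axis is implicit
    tf.keras.layers.LSTM(8),   # returns only the last step's output
    tf.keras.layers.Dense(1),  # one predicted value per sample
])
model.compile(optimizer="adam", loss="mean_squared_error")
model.fit(X_3d, y, batch_size=8, epochs=1, verbose=0)

pred = model.predict(X_3d, verbose=0)
print(pred.shape)  # (32, 1)
```

The key point is that the shape given to the model, (timesteps, n_features), is exactly the shape of one sample of X_3d.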

Thanks for the answer. But it's kinda confusing. The purpose of giving the code scripts when making a question is for the answers to be more simple and adaptive to the script. You just generated an array of zeros. Not very intuitive tbh. — Numbermind, Dec 22 '20 at 17:53
@AmateurMathematician, I don't have access to your data to demonstrate, so the best I can do is offer some suggestion to correct some of the misconception in your code. Let me know which part requires more clarification. — Yoan B. M.Sc, Dec 22 '20 at 20:40

score 1 · Answer 2 · answered Dec 24 '20 at 14:27

It is a multivariate regression problem that you are solving using an LSTM. Before jumping into the code, let's see what that actually means.

Problem statement:

You have 5 features (holidays, day_of_month, day_of_week, month, quarter) per day for k days.
For any day n, given the features of, say, the last 'm' days, you want to predict the y of the nth day.

Creating window dataset:

We first need to decide on the number of days we want to feed to our model. This is called the sequence length (let's fix it to 3 for this example).
We have to split the days into chunks of sequence length to create the train and test datasets. This is done with a sliding window whose size is the sequence length.
Note that no predictions are available for the last p records, where p is the sequence length.
We will create the window dataset using the timeseries_dataset_from_array method.
For more advanced usage, see the official TensorFlow docs.
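
The sliding window described above can be sketched in plain NumPy. The helper name make_windows is hypothetical, and pairing each window with the y of its last day is an assumption based on the problem statement, not the library's own implementation.

```python
import numpy as np

def make_windows(features, targets, sequence_length):
    """Stack overlapping windows of `sequence_length` consecutive rows.

    Returns X of shape (n_windows, sequence_length, n_features) and
    y of shape (n_windows,), where y[i] is the target of the last row
    in window i.
    """
    n_windows = len(features) - sequence_length + 1
    X = np.stack([features[i:i + sequence_length] for i in range(n_windows)])
    y = targets[sequence_length - 1:]
    return X, y

# Toy data: 6 days, 5 features per day, target y = day index.
features = np.arange(6 * 5, dtype=float).reshape(6, 5)
targets = np.arange(6, dtype=float)

X, y = make_windows(features, targets, sequence_length=3)
print(X.shape)  # (4, 3, 5)
print(y)        # [2. 3. 4. 5.]
```

With 6 days and a window of 3 there are only 4 complete windows, which is why the number of usable samples is smaller than the number of rows.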

LSTM Model

What we want to achieve is the following:

At each step of the unrolled LSTM we pass in the 5 features of one day, and we unroll for m steps, where m is the sequence length. We predict the y of the last day.

Code:

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers, models
from sklearn.model_selection import train_test_split

# Model: the LSTM returns only its last output, so the Dense layer
# predicts a single y per window (the y of the window's last day)
regressor = models.Sequential()
regressor.add(layers.LSTM(5, return_sequences=False))
regressor.add(layers.Dense(1))
regressor.compile(optimizer='sgd', loss='mse')

# Dummy data
n = 10000
df = pd.DataFrame(
    {
        'y': np.arange(n),
        'holidays': np.random.randn(n),
        'day_of_month': np.random.randn(n),
        'day_of_week': np.random.randn(n),
        'month': np.random.randn(n),
        'quarter': np.random.randn(n),
    }
)

# Train/test split. Note: train_test_split shuffles rows by default;
# for a real time series, pass shuffle=False to keep chronological order.
train_df, test_df = train_test_split(df)
print(train_df.shape, test_df.shape)

# Create the y to be predicted:
# given the last `sequence_length` days, predict the last day's y
sequence_length = 3
feature_cols = ['holidays', 'day_of_month', 'day_of_week', 'month', 'quarter']

# Train data: align each window start with the y of the window's last day
y_pred = train_df['y'][sequence_length - 1:].values
train_df = train_df[:-(sequence_length - 1)].copy()
train_df['y_pred'] = y_pred

# Validation data
y_pred = test_df['y'][sequence_length - 1:].values
test_df = test_df[:-(sequence_length - 1)].copy()
test_df['y_pred'] = y_pred

# Create window data generators

# Train data generator
train_X = train_df[feature_cols].values
train_y = train_df['y_pred'].values
train_dataset = tf.keras.preprocessing.timeseries_dataset_from_array(
    train_X, train_y, sequence_length=sequence_length, shuffle=True, batch_size=4)

# Validation data generator
test_X = test_df[feature_cols].values
test_y = test_df['y_pred'].values
test_dataset = tf.keras.preprocessing.timeseries_dataset_from_array(
    test_X, test_y, sequence_length=sequence_length, shuffle=True, batch_size=4)

# Finally fit the model
regressor.fit(train_dataset, validation_data=test_dataset, epochs=3)

Output:

(7500, 6) (2500, 6)
Epoch 1/3
1874/1874 [==============================] - 8s 3ms/step - loss: 9974697.3664 - val_loss: 8242597.5000
Epoch 2/3
1874/1874 [==============================] - 6s 3ms/step - loss: 8367530.7117 - val_loss: 8256667.0000
Epoch 3/3
1874/1874 [==============================] - 6s 3ms/step - loss: 8379048.3237 - val_loss: 8233981.5000
<tensorflow.python.keras.callbacks.History at 0x7f3e94bdd198>

Thanks, however, I don't understand why I can't use regressor.predict(test_y) in your example, only works to test_dataset. I only want to predict y (because I'll know beforehand the other variables for the future). Also, how can I plot the results (prediction and real) following your code? — Numbermind, Dec 27 '20 at 10:29
