
I have this data

X_regression = tf.range(0, 1000, 5)
y_regression = X_regression + 100

X_reg_train, X_reg_test = X_regression[:150], X_regression[150:]
y_reg_train, y_reg_test = y_regression[:150], y_regression[150:]

I inspect the input data

X_reg_train[0], X_reg_train[0].shape, X_reg_train[0].ndim

and it returns:

(<tf.Tensor: shape=(), dtype=int32, numpy=0>, TensorShape([]), 0)
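The inspection above shows that each sample is a scalar: shape `()`, zero dimensions. A minimal NumPy sketch (using `np.arange` as a stand-in for `tf.range`, so it runs without TensorFlow) of why 150 scalar samples line up with `input_shape=[1]` once Keras adds the batch axis:

```python
import numpy as np

# Stand-in for tf.range(0, 1000, 5): 200 evenly spaced scalars
X = np.arange(0, 1000, 5)
X_train = X[:150]

print(X_train[0].shape)  # () -- each individual sample is zero-dimensional
print(X_train[0].ndim)   # 0

# Keras supplies the batch axis itself, so 150 scalar samples with
# input_shape=[1] are fed to the model as a (batch_size, 1) tensor:
batched = X_train.reshape(-1, 1)
print(batched.shape)     # (150, 1)
```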

I build a model:

# Set the random seed
tf.random.set_seed(42)

# Create the model
model_reg = tf.keras.models.Sequential()

# Add Input layer
model_reg.add(tf.keras.layers.InputLayer(input_shape=[1]))

# Add Hidden layers
model_reg.add(tf.keras.layers.Dense(units=10, activation=tf.keras.activations.relu))

# Add last layer
model_reg.add(tf.keras.layers.Dense(units=1))

# Compile the model
model_reg.compile(optimizer=tf.keras.optimizers.Adam(),
                  loss=tf.keras.losses.mae,
                  metrics=[tf.keras.metrics.mae])

# Fit the model
model_reg.fit(X_reg_train, y_reg_train, epochs=10)

The model works.

However, I am confused about input_shape.

Why is it [1] in this situation? Why is it sometimes a tuple?

I would appreciate an explanation of the different formats of input_shape in different situations.

Amin Ba

2 Answers


In Keras, the input layer itself is not a layer, but a tensor. It's the starting tensor you send to the first hidden layer. This tensor must have the same shape as your training data.

Example: if you have 30 images of 50x50 pixels in RGB (3 channels), the shape of your input data is (30, 50, 50, 3). Then your input layer tensor must have this shape (see details in the "shapes in keras" section).

Each type of layer requires the input with a certain number of dimensions:

  • Dense layers require inputs as (batch_size, input_size), or (batch_size, optional, ..., optional, input_size), or in your case just (input_size)

  • 2D convolutional layers need inputs as:

    • if using channels_last: (batch_size, imageside1, imageside2, channels)
    • if using channels_first: (batch_size, channels, imageside1, imageside2)
  • 1D convolutions and recurrent layers use (batch_size, sequence_length, features)
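The shape conventions listed above can be sketched with plain NumPy arrays (these are stand-ins for the data you would feed each layer type, not actual Keras layers; the concrete sizes are illustrative):

```python
import numpy as np

# Dense: (batch_size, input_size) -- e.g. 32 samples with 10 features each
dense_in = np.zeros((32, 10))

# Conv2D, channels_last: (batch_size, height, width, channels)
# e.g. the 30 RGB images of 50x50 pixels from the example above
conv2d_last_in = np.zeros((30, 50, 50, 3))

# Conv2D, channels_first: (batch_size, channels, height, width)
conv2d_first_in = np.zeros((30, 3, 50, 50))

# Conv1D / recurrent: (batch_size, sequence_length, features)
seq_in = np.zeros((32, 100, 8))

# input_shape in Keras is the per-sample shape: everything after the batch axis
print(conv2d_last_in.shape[1:])  # (50, 50, 3)
```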

Here are some helpful links: Keras input explanation: input_shape, units, batch_size, dim, etc. and https://keras.io/api/layers/core_layers/input/

Atharva Gundawar
  • I understand your example and it is helpful. However, for the case of my question, the input data has no shape, but I see an error when I test `model_reg.add(tf.keras.layers.InputLayer(input_shape=()))` – Amin Ba Jun 29 '21 at 03:00
  • @AminBa it does have a shape which is `(150,)` or `[150]` try printing `X_reg_train.shape` or `X_reg_train`. – yudhiesh Jun 29 '21 at 03:03
  • @yudhiesh The input is `X_reg_train[0]` and not `X_reg_train`. I am sending the data one by one. – Amin Ba Jun 29 '21 at 03:04
  • I just checked. I see an error if I use `input_shape=(150,)`: `ValueError: Input 0 of layer sequential_60 is incompatible with the layer: expected axis -1 of input shape to have value 150 but received input with shape (None, 1)` – Amin Ba Jun 29 '21 at 03:06
  • @Amin Ba: I also got the same error with different data; I omitted the `input_shape` and just inserted `input_dim` and set it equal to the number of input variables or predictors (e.g., for 10 predictors you can set `input_dim=10` ). BTW, I know I'm late to answer, please let us know how you solved your case. Thanks – Maryam Nasseri Mar 31 '23 at 13:26

Using InputLayer is the same as specifying the input_shape parameter in the first Dense layer. Behind the scenes, Keras creates an InputLayer for you when you use method 2.

# Method 1
model_reg.add(tf.keras.layers.InputLayer(input_shape=(1,)))
model_reg.add(tf.keras.layers.Dense(units=10, activation=tf.keras.activations.relu))

# Method 2
model_reg.add(tf.keras.layers.Dense(units=10, input_shape=(1,), activation=tf.keras.activations.relu))

The input_shape parameter is supposed to be a tuple. Notice that I set the input_shape in your example to (1,), which is a tuple with a single element. As your data is 1D, you pass in a single element at a time, so the input shape is (1,).

If your input data was 2D, for example when trying to predict the price of a house based on multiple variables, you would have multiple rows and multiple columns of data. In this case, you pass in the last dimension of X_reg_train as the input shape, which is the number of input features. If X_reg_train was (1000, 10), then you would use an input_shape of (10,).

model_reg.add(tf.keras.layers.Dense(units=10, input_shape=(X_reg_train.shape[1],), activation=tf.keras.activations.relu))
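To make the shape arithmetic concrete, here is a small NumPy sketch with hypothetical house-price data of the shape mentioned above (the array contents are placeholders):

```python
import numpy as np

# Hypothetical house-price data: 1000 rows, 10 feature columns
X_houses = np.zeros((1000, 10))

# input_shape is the per-row shape: the last dimension only,
# since Keras supplies the batch (rows) axis itself
input_shape = (X_houses.shape[1],)
print(input_shape)  # (10,)
```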

Ignoring the batch_size for a moment, with this we are actually just sending a single row of the data to predict a single house price. The batch_size is just there to chunk multiple rows of data together so that we do not have to load the entire dataset into memory, which is computationally expensive; we send small chunks instead, with the default value being 32. When running the training you would have noticed that under each epoch it says 5/5, which is the 5 batches of data you have: since the training size is 150, 150 / 32 = 5 (rounded up).
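The batch count above is just a ceiling division, which you can verify directly:

```python
import math

train_size = 150
batch_size = 32  # the Keras default for model.fit

# Batches per epoch: the last, partial batch of 22 samples still counts
batches_per_epoch = math.ceil(train_size / batch_size)
print(batches_per_epoch)  # 5
```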

For 3D input, the Dense layer effectively flattens the input to 2D, multiplies by its kernel, and reshapes back: (batch_size, sequence_length, dim) -> (batch_size * sequence_length, dim) -> (batch_size, sequence_length, hidden_units). This is the same as using a Conv1D layer with a kernel size of 1, so I wouldn't even use the Dense layer in this case.
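The per-last-axis behaviour can be sketched without Keras, using a plain NumPy matmul as a stand-in for the Dense kernel (all sizes here are illustrative):

```python
import numpy as np

batch_size, sequence_length, dim, hidden_units = 4, 7, 3, 10

x = np.random.rand(batch_size, sequence_length, dim)
W = np.random.rand(dim, hidden_units)  # stand-in for the Dense kernel

# Dense on a 3D input: matmul over the last axis only
out = x @ W
print(out.shape)  # (4, 7, 10)

# Equivalent to flattening, multiplying, and reshaping back
flat_out = (x.reshape(-1, dim) @ W).reshape(batch_size, sequence_length, hidden_units)
assert np.allclose(out, flat_out)
```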

yudhiesh