Difference between the input shape for a 1D CNN, 2D CNN and 3D CNN

Question

I'm building a CNN model for image classification for the first time, and I'm a little confused about what the input shape should be for each type (1D CNN, 2D CNN, 3D CNN) and how to choose the number of filters in the convolution layer. My data is 100x100x30, where 30 is the number of features. Here is my attempt at a 1D CNN using the Keras Functional API:

def create_CNN1D_model(pool_type='max',conv_activation='relu'):
    input_layer = (30,1)
    conv_layer1 = Conv1D(filters=16, kernel_size=3, activation=conv_activation)(input_layer)
    max_pooling_layer1 = MaxPooling1D(pool_size=2)(conv_layer1)

    conv_layer2 = Conv1D(filters=32, kernel_size=3, activation=conv_activation)(max_pooling_layer1)
    max_pooling_layer2 = MaxPooling1D(pool_size=2)(conv_layer2)

    flatten_layer = Flatten()(max_pooling_layer2)
    dense_layer = Dense(units=64, activation='relu')(flatten_layer)

    output_layer = Dense(units=10, activation='softmax')(dense_layer)
    CNN_model = Model(inputs=input_layer, outputs=output_layer)
    return CNN_model
CNN1D = create_CNN1D_model()
CNN1D.compile(loss = 'categorical_crossentropy', optimizer = "adam",metrics = ['accuracy'])
Trace = CNN1D.fit(X, y, epochs=50, batch_size=100)

However, when I tried the 2D CNN model by just changing Conv1D and MaxPooling1D to Conv2D and MaxPooling2D, I got the following error:

ValueError: Input 0 of layer conv2d_1 is incompatible with the layer: : expected min_ndim=4, found ndim=3. Full shape received: (None, 30, 1)

Can anyone please tell me what the input shape should be for a 2D CNN and a 3D CNN? And what preprocessing should be done on the input data?

I think you should look at [this](https://stackoverflow.com/a/44628011/4687256) answer. And the selected answer seems at least incomplete (or at worst wrong) to me. — x.projekt, Feb 19 '22 at 12:11
The input dimension should be at least the dimensionality of the convolution that you want, i.e. for 1D input `(W,)` you can only perform 1D convolution; for 2D input `(H, W)` you can perform 1D or 2D convolution; for 3D input `(H, W, D)` you may perform 1D, 2D, or 3D convolution. Again, check [this](https://stackoverflow.com/a/44628011/4687256) answer for more details. — x.projekt, Feb 19 '22 at 12:25

Akshay Sehgal · Accepted Answer · 2021-02-16T16:28:42.207

TLDR; your X_train can be looked at as (batch, spatial dims..., channels). A kernel applies to the spatial dimensions for all channels in parallel. So a 2D CNN would require two spatial dimensions: (batch, dim 1, dim 2, channels).

So for (100,100,3)-shaped images, you will need a 2D CNN that convolves over the height of 100 and the width of 100, across all 3 channels.

Let's understand the above statement.

First, you need to understand what CNN (in general) is doing.

A kernel convolves through the spatial dimensions of a tensor, across its feature maps/channels, performing a simple matrix operation (like a dot product) on the corresponding values.

Kernel moves over the spatial dimensions

Now, let's say you have 100 images (together called a batch). Each image is 28 by 28 pixels and has 3 channels R, G, B (which are also called feature maps in the context of CNNs). If I were to store this data as a tensor, the shape would be (100,28,28,3).

However, I could just as well have an image that doesn't have any height (maybe like a signal), or I could have data that has an extra spatial dimension, such as a video (height, width, and time).

In general, the input for a CNN-based neural network has the shape (batch, spatial dims..., channels).

Same kernel, all channels

The second key point you need to know is that a 2D kernel convolves over 2 spatial dimensions, and each filter covers all the feature maps/channels at once. So, if I have a (3,3) kernel and an RGB image, the filter spans the R, G, B channels together (in Keras its weights have shape (3, 3, 3), one 3x3 slice per channel) while it moves over the height and width of the image.
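To check how a filter relates to the input channels, one can inspect the weights that Keras creates for a `Conv2D` layer. The following is a minimal sketch, assuming a random batch of four 28x28 RGB images (the batch size and variable names are illustrative, not from the question):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Illustrative batch: 4 RGB images of 28x28 pixels (random values)
images = tf.random.normal((4, 28, 28, 3))

# 10 filters with a (3, 3) spatial footprint, default padding='valid'
conv = layers.Conv2D(filters=10, kernel_size=(3, 3))
feature_maps = conv(images)  # calling the layer builds its weights

# Weight layout: (kernel_height, kernel_width, input_channels, filters)
kernel_shape = tuple(conv.kernel.shape)
output_shape = tuple(feature_maps.shape)

print("kernel:", kernel_shape)
print("output:", output_shape)
```

The kernel only has two spatial axes to slide along, but its weights also carry an input-channel axis, which is why the channel dimension never needs its own convolution dimension.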

Operation is a dot product

Finally, the operation (for a single feature map/channel and a single convolution window) is an element-wise multiplication of the kernel with the window, followed by a sum of the products.

Therefore, in short -

A kernel gets applied to the spatial dimensions of the data
A kernel has as many dimensions as the data has spatial dimensions
A kernel applies over all the feature maps/channels at once
The operation is a simple dot product between the kernel and window
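The last point in the summary above can be sketched with NumPy for one window and one channel. The numbers in `window` and `kernel` are made up for illustration:

```python
import numpy as np

# One 3x3 window taken from a single-channel input (made-up values)
window = np.array([[1.0, 2.0, 0.0],
                   [0.0, 1.0, 3.0],
                   [2.0, 1.0, 1.0]])

# A 3x3 kernel (made-up weights)
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])

# Element-wise multiply, then sum: one output value for this window
value = float(np.sum(window * kernel))

# The same number, written as a dot product of the flattened arrays
value_as_dot = float(np.dot(window.ravel(), kernel.ravel()))

print(value, value_as_dot)
```

Sliding the window one step at a time over the spatial dimensions and repeating this computation produces one output feature map.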

Let's take the example of tensors with single feature maps/channels (so, for an image, it would be greyscaled) -

So, with that intuition, we see that if you want to use a 1D CNN, your data must have 1 spatial dimension, which means each sample needs to be 2D (spatial dimension and channels), which means X_train must be a 3D tensor (batch, spatial dimension, channels).

Similarly, for a 2D CNN, you would have 2 spatial dimensions (H, W for example), each sample would be 3D (H, W, Channels), and X_train would be (Samples, H, W, Channels).
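Applied to data like the asker's, the two layouts can be sketched with NumPy. This assumes the samples are stacked as (samples, 100, 100, 30); the sample count `n_samples` is a hypothetical value chosen for the example:

```python
import numpy as np

n_samples = 8  # hypothetical number of samples
X = np.zeros((n_samples, 100, 100, 30), dtype=np.float32)

# 2D CNN: keep both spatial dimensions -> (samples, H, W, channels)
X_for_conv2d = X

# 1D CNN: flatten the 100x100 grid into one spatial axis of 10000 positions
# -> (samples, positions, channels)
X_for_conv1d = X.reshape(n_samples, 100 * 100, 30)

print(X_for_conv2d.shape)
print(X_for_conv1d.shape)
```

Flattening the grid discards the 2D neighbourhood structure of the pixels, so the 2D layout is usually the more natural choice for images.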

Let's try this with code -

import tensorflow as tf
from tensorflow.keras import layers

X_2D = tf.random.normal((100,7,3))   #Samples, width/time, channels (feature maps)
X_3D = tf.random.normal((100,5,7,3)) #Samples, height, width, channels (feature maps)
X_4D = tf.random.normal((100,6,6,2,3))   #Samples, height, width, time, channels (feature maps)

For applying 1D CNN -

#With padding='same', the edges are zero-padded so the output keeps the same spatial size as the input (for stride 1)

#Out featuremaps = 10, kernel (3,)
cnn1d = layers.Conv1D(10, 3, padding='same')(X_2D) 
print(X_2D.shape,'->',cnn1d.shape)

#(100, 7, 3) -> (100, 7, 10)
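The effect of the padding choice on the spatial size can be sketched with a small helper. `conv_output_length` is a hypothetical function written for this example (not a Keras API), and it assumes no dilation:

```python
import math

def conv_output_length(input_length, kernel_size, padding="valid", stride=1):
    """Spatial output length of a convolution along one axis (no dilation)."""
    if padding == "same":
        # Input is padded so that only the stride shrinks the output
        return math.ceil(input_length / stride)
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input
        return (input_length - kernel_size) // stride + 1
    raise ValueError(f"unknown padding: {padding!r}")

same_len = conv_output_length(7, 3, padding="same")
valid_len = conv_output_length(7, 3, padding="valid")
print(same_len, valid_len)
```

For the (100, 7, 3) tensor above, padding='same' keeps the 7 positions, while padding='valid' would leave 5.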

For applying 2D CNN -

#Out featuremaps = 10, kernel (3,3)
cnn2d = layers.Conv2D(10, (3,3), padding='same')(X_3D) 
print(X_3D.shape,'->',cnn2d.shape)

#(100, 5, 7, 3) -> (100, 5, 7, 10)

For 3D CNN -

#Out featuremaps = 10, kernel (3,3,2)
cnn3d = layers.Conv3D(10, (3,3,2), padding='same')(X_4D) 
print(X_4D.shape,'->',cnn3d.shape)

#(100, 6, 6, 2, 3) -> (100, 6, 6, 2, 10)
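Putting this together for the question's data, here is a minimal sketch of a 2D model in the Functional API. It assumes each sample has shape (100, 100, 30) and that there are 10 classes (taken from the question's final `Dense(units=10)`); the function name `create_cnn2d_model` and the layer sizes simply mirror the question's 1D attempt and are not a tuned architecture:

```python
from tensorflow.keras import Model, layers

def create_cnn2d_model(input_shape=(100, 100, 30), n_classes=10):
    # Input takes the per-sample shape; the batch dimension is implicit
    inputs = layers.Input(shape=input_shape)

    x = layers.Conv2D(filters=16, kernel_size=(3, 3), activation="relu")(inputs)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)

    x = layers.Conv2D(filters=32, kernel_size=(3, 3), activation="relu")(x)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)

    x = layers.Flatten()(x)
    x = layers.Dense(units=64, activation="relu")(x)
    outputs = layers.Dense(units=n_classes, activation="softmax")(x)

    return Model(inputs=inputs, outputs=outputs)

model = create_cnn2d_model()
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])
print(model.input_shape, "->", model.output_shape)
```

With this model, X passed to `fit` would need the shape (samples, 100, 100, 30) and y would need to be one-hot encoded with 10 columns.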

Truly great answer explaining the concept of convolution layers. To add on to your error: `...expected min_ndim=4, found ndim=3. Full shape received: (None, 30, 1)`, the `conv2D` layer expected data of shape `(batch size, 100, 100, 3)` but you fed it with some different dimension. I think you must revise the data shape, if you intend to use the Conv2D layer. — krenerd, Feb 16 '21 at 09:55
Thank you for your reply and your fairly thorough explanation, I really understood the example you gave me, except that, by applying this to my data, it seems a little different to me from what you said, because the image shape = 100x100x30 which means 10000x30 where 10000 covers the pixels. So, I really wonder how to get the format of (batch-size, 100,100,30) — Andrea, Feb 16 '21 at 10:02
Do you want to apply a 1D CNN or a 2D CNN? If you want to use a 1D CNN then reshape it to (10000,30), 1 spatial dimension.... if you want to use a 2D CNN, then reshape it as (100,100,30) — Akshay Sehgal, Feb 16 '21 at 10:04
Also, in your code, I am not sure why you have `input_layer = (30,1)` — Akshay Sehgal, Feb 16 '21 at 10:10
I followed an example somewhere and just applied it to my data. If there are any mismatches, please correct them for me — Andrea, Feb 16 '21 at 10:15
So, what is the 100,100 in your data? I understand you have 30 features, but what is the 100X100 — Akshay Sehgal, Feb 16 '21 at 10:18
And how many such images do you have? so how many, 100X100 images with 30 channels — Akshay Sehgal, Feb 16 '21 at 10:19
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/228788/discussion-between-andrea-and-akshay-sehgal). — Andrea, Feb 16 '21 at 10:22

score -1 · Answer 2 · answered Feb 16 '21 at 09:44


By a 100x100x30 input shape, are you saying the batch size is 100? Or does each sample have a shape of 100x100x30? In the second case, you must use a Conv2D layer instead. The per-sample input shapes (excluding the batch dimension) for each layer are supposed to be:

Conv1D: (size1, channel_number), Conv2D: (size1, size2, channel_number) , Conv3D: (size1, size2, size3, channel_number)

The 1D, 2D, and 3D in the names denote the number of spatial dimensions that the kernel of the convolution layer moves over; the channel dimension is always additional.
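This rule also explains the error message in the question. The sketch below uses a hypothetical helper, `expected_input_ndim`, to count the dimensions each layer needs once the batch dimension is included, and compares them with the shape reported in the error:

```python
def expected_input_ndim(n_spatial_dims):
    """Minimum tensor rank for a conv layer: batch + spatial dims + channels."""
    return 1 + n_spatial_dims + 1

# Shape reported in the question's error message (None is the batch size)
received_shape = (None, 30, 1)

report = {}
for name, n_spatial in (("Conv1D", 1), ("Conv2D", 2), ("Conv3D", 3)):
    needed = expected_input_ndim(n_spatial)
    report[name] = len(received_shape) >= needed
    print(f"{name}: needs ndim >= {needed}, got {len(received_shape)}")
```

A tensor of shape (None, 30, 1) has 3 dimensions, which is enough for Conv1D but one short of what Conv2D expects.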

krenerd


Thank you Krenerd for your interest. Yes, as you said, each data is in a shape of 100x100x30 – Andrea Feb 16 '21 at 09:49
