
I am trying to build a deep learning model on top of VGG16. I have implemented it in Keras using the following code:

from keras.applications.vgg16 import VGG16
from keras.layers import Input, Conv1D, Dense, Flatten
from keras.models import Model

image_input = Input(shape=(224, 224, 3))

model = VGG16(input_tensor=image_input, include_top=True, weights='imagenet')
model.summary()
fc7 = model.get_layer('fc2').output
conv1d = Conv1D(1, 5, activation='relu', name="conv1d", input_shape=(1, 4096))(fc7)  # error appears here
# flat = Flatten()(conv1d)
fc8 = Dense(512, activation='relu', name="fc8")(conv1d)
# x = Flatten(name='flatten')(last_layer)
out = Dense(num_classes, activation='softmax', name='output')(fc8)
custom_vgg_model = Model(image_input, out)
custom_vgg_model.summary()

I am getting the following error:

ValueError: Input 0 is incompatible with layer conv1d: expected ndim=3, found ndim=2

Why can't we do a 1D convolution over consecutive feature vectors, like in the image below?

Rehan

1 Answer


The output of a fully connected (Dense) layer in VGG is 2D, (batch, features), while a 1D convolutional layer expects 3D data, (batch, steps, features).

Before VGG adds its Dense layers, it destroys the image format (4D) with a Flatten or a global pooling layer, turning the data into plain 2D. There is no dimension left to run convolutions over.
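
A minimal sketch of the mismatch (the shapes here are purely illustrative): a Dense/fc2 output is a 2D tensor, while Conv1D only accepts a 3D tensor with a steps axis to slide over.

from keras.layers import Input, Conv1D

x_2d = Input(shape=(4096,))       # like the fc2 output: (batch, 4096), ndim=2
x_3d = Input(shape=(None, 4096))  # (batch, steps, features), ndim=3

# Conv1D(1, 5)(x_2d)              # would raise: expected ndim=3, found ndim=2
y = Conv1D(1, 5, activation='relu')(x_3d)  # works: convolves along the steps axis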

If you explain why you want a Conv1D and what you expect from it, we can think of an alternative.


Example model:

movie_data = any_data_with_shape((number_of_videos, frames, 224, 224, 3))
movie_input = Input((None,224,224,3)) #None means any number of frames

vgg = VGG16(include_top=True,weights='imagenet')

This part is only necessary if you're taking an intermediate output from vgg:

vgg_in = vgg.input
vgg_out = vgg.get_layer('fc2').output #make sure this layer exists
vgg = Model(vgg_in, vgg_out)

Continue:

vgg_outs = TimeDistributed(vgg)(movie_input) #out shape (None, frames, fc2_units)

outs = Conv1D(.....)(vgg_outs)
outs = GlobalAveragePooling1D()(outs)
outs = Dense(....)(outs)
.....

your_model = Model(movie_input, outs)
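
For reference, a filled-in version of the sketch above, using the numbers discussed in the comments (clips of 5 frames, kernel size 5). The Conv1D filter count, the Dense width and num_classes are assumptions you should adapt:

from keras.applications.vgg16 import VGG16
from keras.layers import Input, Conv1D, Dense, GlobalAveragePooling1D, TimeDistributed
from keras.models import Model

num_classes = 10  # assumption: replace with your own number of classes

# VGG up to fc2, so every frame becomes a 4096-dim feature vector
vgg = VGG16(include_top=True, weights='imagenet')
vgg = Model(vgg.input, vgg.get_layer('fc2').output)

movie_input = Input(shape=(None, 224, 224, 3))   # any number of frames per clip
features = TimeDistributed(vgg)(movie_input)     # (batch, frames, 4096)

x = Conv1D(64, 5, activation='relu')(features)   # kernel slides over 5 consecutive frame vectors
x = GlobalAveragePooling1D()(x)                  # collapse the time axis
x = Dense(512, activation='relu')(x)
out = Dense(num_classes, activation='softmax')(x)

your_model = Model(movie_input, out)
your_model.summary()

Training data then has shape (videos, frames, 224, 224, 3), for example (num_videos, 5, 224, 224, 3) when grouping 5 frames per sample.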
Daniel Möller
  • Thank you for answering. I am using video frames as training data. I want the 4096-dim vectors from each frame and to perform a time-series convolution by concatenating 5 frames, using a kernel size of 5. – Rehan Mar 18 '20 at 18:33
  • You need a model with input shape `(5,224,224,3)`, that uses a `TimeDistributed(model)(inputs)`; your layers will come after this. – Daniel Möller Mar 18 '20 at 18:35
  • Can you please elaborate in your answer how to do it exactly? Sorry, I am new to Keras. – Rehan Mar 18 '20 at 18:44
  • Actually, how can I change the input shape? This will affect the VGG architecture. – Rehan Mar 18 '20 at 18:46
  • Ok, but please explain what you want to do with that convolution, you didn't connect it anywhere. – Daniel Möller Mar 18 '20 at 20:18
  • I did what I suppose you wanted. – Daniel Möller Mar 18 '20 at 20:23
  • How do I train the network now? Do I have to concatenate the frames and reshape them every time to feed them to the network? – Rehan Mar 19 '20 at 06:23
  • You pass inputs with shape `(videos, 5, 224, 224, 3)`, wasn't that what you wanted? – Daniel Möller Mar 19 '20 at 12:27
  • Can I do something like passing the 224,224,3 frames one by one and performing the 1D convolution on the fc2 outputs of the last 5 frames with a kernel size of 5? – Rehan Mar 19 '20 at 12:35
  • You must pass all frames together. If your idea is to convolve the whole movie with a kernel 5, your input shape should be `(videos, None, 224,224,3)`, and your input data should be `(videos, frames, 224,224,3)` – Daniel Möller Mar 19 '20 at 12:36
  • What if I extract the 4096 vectors from the frames with VGG and then pass these to my own model separately? It works without passing 5 vectors all together. – Rehan Mar 19 '20 at 12:43
  • By my model I mean the layers I have appended to the current VGG as above. – Rehan Mar 19 '20 at 12:43
  • Also, can you please clarify what you mean by videos in the above shape? I need to concatenate 5 frames and pass them to the network. – Rehan Mar 19 '20 at 16:32
  • The number of videos you will pass to the network for training. You won't be able to train with a single video, not enough data. – Daniel Möller Mar 19 '20 at 16:43
  • Videos need to be converted to frames, right, and then I concatenate 5 frames and pass them to the model? Or is there any other way? – Rehan Mar 19 '20 at 16:45
  • So "videos" means the total frames overall? – Rehan Mar 19 '20 at 16:49
  • No, videos are really videos. Frames is the second dimension. One video is a group of frames. – Daniel Möller Mar 19 '20 at 16:50
  • How can I pass video to my model? – Rehan Mar 19 '20 at 16:52
  • One video is an array with shape `(frames, sizeX, sizeY, channels)`. It's just like any other data. You already have the data, right? – Daniel Möller Mar 19 '20 at 16:53
  • I have the data as videos; I have converted them to frames to use them. Any resource on how to preprocess videos like that? – Rehan Mar 19 '20 at 16:55
  • You can concatenate the frames, or you can try some library that reads video files. I never worked with videos, but you can google some such as `cv2`: https://stackoverflow.com/questions/41441150/how-to-read-video-files-using-python-opencv , or other libraries: https://scikit-image.org/docs/dev/user_guide/video.html – Daniel Möller Mar 19 '20 at 17:01
  • That is what I am doing. They read the frames as np arrays, and then we concatenate them to pass to the model. But concatenating the frames is consuming a lot of memory. – Rehan Mar 19 '20 at 17:06
  • You need to read about "generators" for keras. You cannot load the entire dataset at once in this case; you need to load things batch by batch with a generator (see the sketch after this thread). – Daniel Möller Mar 19 '20 at 17:09
  • Can you please edit your answer and also add that input preprocessing with a generator? – Rehan Mar 19 '20 at 17:35
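
A minimal sketch of such a generator, assuming the frames of each clip are already extracted to image files on disk. The clip_paths/labels layout, batch_size and frames_per_clip below are illustrative assumptions, not part of the answer above:

import numpy as np
from keras.utils import Sequence
from keras.applications.vgg16 import preprocess_input
from keras.preprocessing.image import load_img, img_to_array

class ClipSequence(Sequence):
    """Loads clips of a few consecutive frames batch by batch instead of all at once."""

    def __init__(self, clip_paths, labels, batch_size=8, frames_per_clip=5):
        self.clip_paths = clip_paths          # assumed layout: one list of frame file paths per clip
        self.labels = labels                  # one label (e.g. one-hot vector) per clip
        self.batch_size = batch_size
        self.frames_per_clip = frames_per_clip

    def __len__(self):
        return int(np.ceil(len(self.clip_paths) / self.batch_size))

    def __getitem__(self, idx):
        clips = self.clip_paths[idx * self.batch_size:(idx + 1) * self.batch_size]
        labels = self.labels[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch = []
        for frame_paths in clips:
            frames = np.stack([img_to_array(load_img(p, target_size=(224, 224)))
                               for p in frame_paths[:self.frames_per_clip]])
            batch.append(preprocess_input(frames))        # VGG16 preprocessing, per clip (4D array)
        return np.stack(batch), np.array(labels)          # x: (batch, frames, 224, 224, 3)

# train_seq = ClipSequence(train_clip_paths, train_labels)
# your_model.fit_generator(train_seq, epochs=10)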