I'm trying to develop an image captioning model, using this GitHub repository as a reference. I have three methods that do the following:

  1. Generates the image model
  2. Generates the caption model
  3. Concatenates the image and caption model together

Since the code is long, I've created a Gist to show the methods.

Here is a summary of my image model and caption model.

But when I run the code, I get this error:

Traceback (most recent call last):
  File "trainer.py", line 99, in <module>
    model.fit([images, encoded_captions], one_hot_captions, batch_size = 1, epochs = 5)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/keras/engine/training.py", line 950, in fit
    batch_size=batch_size)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/keras/engine/training.py", line 671, in _standardize_user_data
    self._set_inputs(x)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/keras/engine/training.py", line 575, in _set_inputs
    assert len(inputs) == 1
AssertionError

Since the error comes from inside the Keras library, I have no idea how to debug it, but something seems to go wrong when I try to concatenate the two models.

I would like to know if I'm missing something here.

today
Yedhu Krishnan

1 Answer

You need to get the outputs of the two models through their `output` attribute, and then use the Keras functional API to concatenate them (with either the `Concatenate` layer or its equivalent functional interface, `concatenate`) and create the final model:

from keras.layers import LSTM, Dense, Activation, concatenate
from keras.models import Model

image_model = get_image_model()
language_model = get_language_model(vocab_size)

merged = concatenate([image_model.output, language_model.output])
x = LSTM(256, return_sequences = False)(merged)
x = Dense(vocab_size)(x)
out = Activation('softmax')(x)

model = Model([image_model.input, language_model.input], out)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.fit([images, encoded_captions], one_hot_captions, ...)

As in your current code, you can also wrap the model-creation logic in a function:

def get_concatenated_model(image_model, language_model, vocab_size):
    merged = concatenate([image_model.output, language_model.output])
    x = LSTM(256, return_sequences = False)(merged)
    x = Dense(vocab_size)(x)
    out = Activation('softmax')(x)

    model = Model([image_model.input, language_model.input], out)
    return model
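
To try the function above in isolation, here is a minimal sketch with stand-in sub-models. The bodies of `get_image_model` and `get_language_model` are hypothetical (the real ones are in the Gist and are not reproduced here), and `feature_dim` and `embed_dim` are assumed sizes. The only requirement the sketch relies on is that both sub-models output 3D tensors with the same number of time steps, so that they can be concatenated on the last axis and fed to an LSTM.

```python
from keras.layers import (Input, Dense, Embedding, LSTM, RepeatVector,
                          Activation, concatenate)
from keras.models import Model

# Illustrative sizes; feature_dim and embed_dim are assumptions.
max_len, vocab_size = 41, 2327
feature_dim, embed_dim = 512, 64


def get_image_model():
    # Hypothetical stand-in: project the image features, then repeat them
    # once per time step so they line up with the caption sequence.
    inp = Input(shape=(feature_dim,))
    x = Dense(embed_dim, activation='relu')(inp)
    out = RepeatVector(max_len)(x)                    # (None, max_len, embed_dim)
    return Model(inp, out)


def get_language_model(vocab_size):
    # Hypothetical stand-in: embed word indices and run them through an LSTM.
    inp = Input(shape=(max_len,))
    x = Embedding(vocab_size, embed_dim)(inp)
    out = LSTM(embed_dim, return_sequences=True)(x)   # (None, max_len, embed_dim)
    return Model(inp, out)


def get_concatenated_model(image_model, language_model, vocab_size):
    # Same logic as in the answer: merge the sub-model outputs, then classify.
    merged = concatenate([image_model.output, language_model.output])
    x = LSTM(256, return_sequences=False)(merged)
    x = Dense(vocab_size)(x)
    out = Activation('softmax')(x)
    return Model([image_model.input, language_model.input], out)


model = get_concatenated_model(get_image_model(),
                               get_language_model(vocab_size), vocab_size)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
print(model.output_shape)
```

The resulting model takes a list of two inputs, in the same order as the list passed to `Model`, which is why `model.fit([images, encoded_captions], ...)` works with it.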
today
  • That helped to fix the issue. Thank you! I have another question. My output has to be time-distributed, because the caption has more than one word. Here, the output shape is `(None, 2327)`, but I need `(None, 41, 2327)`, where 41 is the maximum caption length. I tried to add a `TimeDistributed` layer, but that didn't work as expected. Any idea how to solve that? – Yedhu Krishnan Sep 29 '18 at 11:16
  • @YedhuKrishnan You are almost there: no need for a `TimeDistributed` layer, since [Dense layer is applied on the last axis](https://stackoverflow.com/a/52092176/2099607). Just set `return_sequences=True` in the LSTM layer. – today Sep 29 '18 at 13:47
  • @today, could you please write a similar function using the Sequential Keras API? – Minions Nov 22 '18 at 12:51
  • @Ghanem You can't do that, since the Sequential API can only be used for sequential models, i.e. models in which each layer is directly connected to the previous one. Once there is a concatenation layer, the model is no longer sequential. – today Nov 22 '18 at 12:57
  • @today, thanks, but I found many examples that use the Sequential API with a merge layer, e.g.: https://statcompute.wordpress.com/2017/01/08/an-example-of-merge-layer-in-keras/ – Minions Nov 22 '18 at 13:03
  • @Ghanem That was in older versions of Keras (I think < 2.0), when there was a single `Merge` layer for all kinds of merge operations. In current versions there is no such layer, and each merge operation [has its own separate layer](https://keras.io/layers/merge/). And you can't use them in a Sequential model. – today Nov 22 '18 at 13:14
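
The `return_sequences=True` suggestion from the comments can be checked with a small sketch. The sizes 41 and 2327 are the ones quoted in the comments; `embed_dim` is an assumed value, and the two `Input` tensors are only stand-ins for the outputs of the image and language models.

```python
from keras.layers import Input, Dense, LSTM, Activation, concatenate
from keras.models import Model

max_len, vocab_size = 41, 2327   # values quoted in the comments
embed_dim = 64                   # assumed size, for illustration only

# Stand-ins for the 3D outputs of the image and language models.
image_seq = Input(shape=(max_len, embed_dim))
caption_seq = Input(shape=(max_len, embed_dim))

merged = concatenate([image_seq, caption_seq])
x = LSTM(256, return_sequences=True)(merged)   # keep the output of every time step
x = Dense(vocab_size)(x)                       # Dense acts on the last axis only
out = Activation('softmax')(x)                 # softmax over the vocabulary axis

model = Model([image_seq, caption_seq], out)
print(model.output_shape)
```

With this change the targets passed to `fit` need the matching shape `(num_samples, 41, 2327)`, i.e. one one-hot vector per time step.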