Similar Posts: these two posts look similar to mine, if not the same. I tried to implement their answers, in vain; I am probably missing something because of my inexperience with Keras: similar 1, similar 2
The problem: I have a data generator that feeds data into various models, both to evaluate model performance and to learn Keras.
model.fit_generator(generator=img_gen.next_train(), ....
One of the inputs produced by this generator is a tensor "labels" of shape=[batch_size, num_letters]. This tensor is the first input to
K.ctc_batch_cost(labels, ....
similar to image_ocr.py line 369.
The above "keras example" creates an RNN/GRU where each output step of the RNN is one input to the ctc, where there are num_letters steps in the RNN and the labels is of shape (?, num_letters). This worked fine for the first 6 models I have tested so far.
I am testing a new model where each step of the RNN/GRU output supplies "1 to n" inputs to the CTC, leaving it to training to optimize the output of each step. So the CTC needs as its first input a tensor of shape=(?, num_letters*n), but the data generator produces shape=(?, num_letters).
Side note: the RNN in my model actually produces an output of shape=(?, n, num_letters) as a whole. I know how to convert this to (?, n*num_letters).
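For reference, the conversion can be done with a Reshape, e.g. (the Input here is just a stand-in for my RNN output):
import keras

num_letters, n = 3, 5
rnn_out = keras.layers.Input(shape=[n, num_letters])  # stand-in for the (?, n, num_letters) RNN output
flat = keras.layers.Reshape((n * num_letters,))(rnn_out)  # shape (?, n*num_letters)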
Solution 1 is a hack: change the generator so that it is unique to this one model, producing tensors of shape=(?, num_letters*n). I don't like this because the generator itself is part of what is being evaluated, and I would like it to stay constant across the models.
Solution 2 is a hack: create a generator that wraps the original generator and augments the produced output.
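A sketch of what that wrapper might look like, assuming the generator yields (inputs, outputs) tuples with the labels under a dict key (the key name 'the_labels' is hypothetical):
import numpy

def padded_next_train(base_gen, n, num_letters, pad_value=-1.0):
    # wrap the shared generator and widen labels from
    # (batch, num_letters) to (batch, num_letters*n) using pad_value
    for inputs, outputs in base_gen:
        labels = inputs['the_labels']  # hypothetical key; depends on the real generator
        pad = numpy.full((labels.shape[0], num_letters * (n - 1)), pad_value, dtype=labels.dtype)
        inputs['the_labels'] = numpy.concatenate([labels, pad], axis=1)
        yield inputs, outputs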
Solution 3: have the model itself take an input of shape=(?, num_letters) and concatenate the necessary padding, so that the tensor has the shape=(?, num_letters*n) the CTC wants.
Here is what I tried:
import numpy
import keras

num_letters = 3
n = 5
keras_tensor_input = keras.layers.Input( shape=[num_letters], dtype='float32' )
print("keras_tensor_input = ", keras_tensor_input)
# keras_tensor_input = Tensor("input_1:0", shape=(?, 3), dtype=float32)
# note ? is batch size
keras_tensor_neg_1 = keras.backend.constant( -1, dtype='float32', shape=[num_letters] )
print("keras_tensor_neg_1 = ", keras_tensor_neg_1)
# keras_tensor_neg_1 = Tensor("Const_1:0", shape=(3,), dtype=float32)
# note no batch size
keras_tensor_neg_1_tiled = keras.backend.tile(keras_tensor_neg_1, (n-1))
print("keras_tensor_neg_1_tiled = ", keras_tensor_neg_1_tiled)
# keras_tensor_neg_1_tiled = Tensor("Tile_2:0", shape=(12,), dtype=float32)
# note no batch size, but now the correct fill length
# FAILED attempt to put in a batch size
# (the lambda ignores its input x and returns the batch-less constant,
# so the output never picks up a ? batch dimension)
layer_const_neg_1 = keras.layers.Lambda(lambda x: keras_tensor_neg_1_tiled, output_shape=[(n-1)*num_letters])
keras_tensor_neg_1_prime = layer_const_neg_1(keras_tensor_neg_1)
print("keras_tensor_neg_1_prime = ", keras_tensor_neg_1_prime)
# keras_tensor_neg_1_prime = Tensor("Tile_2:0", shape=(12,), dtype=float32)
# CRASHES AT THE NEXT STEP BECAUSE THERE IS NO ? IN keras_tensor_neg_1_prime
# concatenate the input from the generator and the padding
keras_tensor_concat = keras.layers.Concatenate()( [keras_tensor_input, keras_tensor_neg_1_prime] )
print("keras_tensor_concat = ", keras_tensor_concat)
my_model = keras.models.Model( inputs=[keras_tensor_input], outputs=keras_tensor_concat )
# dummy optimizer, loss, and metric just to allow compile to pass
my_model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
# a batch of size 1, for a tensor of shape=[3]
d1 = numpy.array([[1, 2, 3]])
out = my_model.predict( [ d1 ] )
print(out)
Notes:
- I could have made the constant shape=[num_letters*(n-1)] and dropped the tile, but the same problem of the missing batch size remains.
- if I give the constant a batch size of 1 as its first dimension, it still fails, complaining that (?, 3) cannot be concatenated with (1, 12); see the sketch below.
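For completeness, my reading of the similar posts is that the padding should be derived from the input tensor itself (e.g. via K.ones_like), so that it automatically inherits the ? batch dimension, roughly like this (a sketch; I have only reasoned about the shapes):
import keras
from keras import backend as K

num_letters = 3
n = 5

def pad_with_neg_ones(x):
    # -1 * ones_like(x) has shape (?, num_letters) and inherits x's batch size;
    # tiling it n-1 times along axis 1 gives (?, num_letters*(n-1))
    pad = K.tile(-1.0 * K.ones_like(x), (1, n - 1))
    return K.concatenate([x, pad], axis=1)

keras_tensor_input = keras.layers.Input( shape=[num_letters], dtype='float32' )
keras_tensor_concat = keras.layers.Lambda(pad_with_neg_ones, output_shape=[num_letters*n])(keras_tensor_input)
If that is indeed the right approach, I would still like to understand why my constant-based version above cannot pick up the batch dimension.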
Thank you in advance.