This thread covers some of the nuances of CTC loss and its unique way of capturing repeated characters and blanks in a sequence: "CTC: What is the difference between space and blank?", but its practical implementation is still unclear to me.
Let's say that I am trying to predict these two sequences, which correspond to two pictures:
seq_list = ['pizza', 'a pizza']
and I map their characters to integers for the model with something like:
mapping = {'p': 0,
           'i': 1,
           'z': 2,
           'a': 3,
           'blank': 4}
What do the individual labels look like? Is the blank inserted between the repeated characters, like this?
pizza_label = [0, 1, 2, 4, 3] # pizza
a_pizza_label = [3, 0, 1, 2, 4, 3] # a pizza
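For context, here is a minimal sketch of the standard CTC collapse rule (merge consecutive repeats, then drop blanks), which is why the blank matters for the double 'z'. As far as I understand it, this rule applies to the model's per-timestep output path, and whether the blank should also appear in the label itself is exactly what I am unsure about. The function name `ctc_collapse` and the example paths are hypothetical, and the integer ids reuse the mapping above.

```python
from itertools import groupby

BLANK = 4  # id of 'blank' in the mapping above


def ctc_collapse(path, blank=BLANK):
    """Collapse a per-timestep path: merge consecutive repeats, then drop blanks."""
    merged = [k for k, _ in groupby(path)]   # "p p i" -> "p i"
    return [k for k in merged if k != blank]  # remove the blank symbol


# Path "p p i z _ z a a": the blank separates the two z's, so both survive.
with_blank = ctc_collapse([0, 0, 1, 2, 4, 2, 3, 3])

# Path "p p i z z a a": without a blank the two z's merge into one.
without_blank = ctc_collapse([0, 0, 1, 2, 2, 3, 3])

print(with_blank)     # [0, 1, 2, 2, 3]  -> "pizza"
print(without_blank)  # [0, 1, 2, 3]     -> "piza"
```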
Then, what about combining them so that the labels have the same shape for the model? Do we use the blank for padding?
pizza_label = [0, 1, 2, 4, 3, 4] # pizza
a_pizza_label = [3, 0, 1, 2, 4, 3] # a pizza
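One alternative to padding with the blank, sketched below under some assumptions: many CTC implementations (for example PyTorch's `torch.nn.CTCLoss` with its `target_lengths` argument) take the true label lengths alongside the padded labels, so the padding value is ignored rather than treated as a blank. In this sketch the labels contain no blank at all, the id `5` for the space character and the padding value `-1` are hypothetical choices that are not part of the mapping above, and the valid padding value depends on the framework.

```python
mapping = {'p': 0, 'i': 1, 'z': 2, 'a': 3, 'blank': 4}
SPACE = 5   # assumption: a real space gets its own id, distinct from blank
PAD = -1    # assumption: placeholder padding value, masked out via lengths


def encode(text):
    """Map each character to its id; no blanks are inserted into the label."""
    return [SPACE if ch == ' ' else mapping[ch] for ch in text]


def pad_batch(seqs, pad=PAD):
    """Right-pad labels to a common length and keep the true lengths."""
    lengths = [len(s) for s in seqs]
    max_len = max(lengths)
    padded = [s + [pad] * (max_len - len(s)) for s in seqs]
    return padded, lengths


labels = [encode(s) for s in ['pizza', 'a pizza']]
padded, lengths = pad_batch(labels)

print(padded)   # [[0, 1, 2, 2, 3, -1, -1], [3, 5, 0, 1, 2, 2, 3]]
print(lengths)  # [5, 7]
```

The `lengths` list is what would be passed as the label-length argument, so the loss only looks at the first 5 entries of the 'pizza' row.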