
This thread covers some of the nuances of CTC loss and its unique way of handling repeated characters and blanks in a sequence: "CTC: What is the difference between space and blank?" However, the practical implementation is still unclear to me.

Let's say that I am trying to predict these two sequences, which correspond to two pictures.

seq_list = ['pizza', 'a pizza']

and I map their characters to integers for the model with something like:

mapping = {'p': 0,
           'i': 1,
           'z': 2,
           'a': 3,
           'blank': 4}

What do the individual labels look like?

pizza_label = [0, 1, 2, 4, 3] # pizza
a_pizza_label = [3, 0, 1, 2, 4, 3] # a pizza

Then, what about combining them so the shapes of the labels are the same for the model? Do we use blank for padding?

pizza_label = [0, 1, 2, 4, 3, 4] # pizza
a_pizza_label = [3, 0, 1, 2, 4, 3] # a pizza
John Stud

1 Answer


Padding: you have to pad the smaller image so it has the same width as the larger image. Only this way can you put them into one batch. Simply use a black background.
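A minimal sketch of that padding step, assuming grayscale images stored as NumPy arrays (the shapes here are made-up examples):

```python
import numpy as np

def pad_to_width(img, target_w):
    """Pad a grayscale image (H, W) on the right with black pixels."""
    h, w = img.shape
    padded = np.zeros((h, target_w), dtype=img.dtype)  # black background
    padded[:, :w] = img
    return padded

# two images of different widths, padded to the width of the wider one
imgs = [np.ones((32, 100), dtype=np.uint8), np.ones((32, 140), dtype=np.uint8)]
max_w = max(im.shape[1] for im in imgs)
batch = np.stack([pad_to_width(im, max_w) for im in imgs])
print(batch.shape)  # (2, 32, 140)
```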

Labels: you do not have to take care of the CTC blank yourself. It is enough to translate the character sequence to a sequence of labels (integers), e.g. [mapping[c] for c in 'pizza'] in Python. The CTC loss function takes care of handling the CTC blank.

Harry
  • Thanks! Could you expand upon the label generation? I am of course responsible for creating the labels before the loss function can do its work. Thus, how do I tell the machine what a *correct* label looks like when given `pizza` and `a pizza`? – John Stud Jan 16 '23 at 14:15
    you just map each character to an integer label (this integer label corresponds to the position in the RNN output where the character will be predicted). Depending on the deep learning framework you have to provide this label in certain ways, e.g. for TF as sparse tensor: https://github.com/githubharald/SimpleHTR/blob/master/src/model.py#L173 – Harry Jan 18 '23 at 08:06
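To illustrate the framework-specific part in a framework-agnostic way: several CTC APIs (for example torch.nn.CTCLoss) accept the targets as one concatenated 1-D sequence plus a per-sample length, so no blank-padding of labels is needed. A sketch, again assuming a mapping where the space character has its own index:

```python
mapping = {'p': 0, 'i': 1, 'z': 2, 'a': 3, ' ': 4}
seq_list = ['pizza', 'a pizza']

# concatenate all label sequences and record each sample's length
targets = [mapping[c] for s in seq_list for c in s]
target_lengths = [len(s) for s in seq_list]
print(targets)         # [0, 1, 2, 2, 3, 3, 4, 0, 1, 2, 2, 3]
print(target_lengths)  # [5, 7]
```

The loss function uses `target_lengths` to split the concatenated sequence back into the individual labels, so sequences of different lengths pose no problem.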