
I am trying to use the CTC loss function in my network, but I don't quite understand when to feed the 'blank' label as a label.

I use it for gesture recognition as described by Molchanov, but what confuses me is that there is a 'no gesture' class as well.

The TensorFlow docs state:

The inputs Tensor's innermost dimension size, num_classes, represents num_labels + 1 classes, where num_labels is the number of true labels, and the largest value (num_classes - 1) is reserved for the blank label.
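To make the index layout in that passage concrete, here is a minimal sketch in plain Python. The gesture names are made up for illustration; only the arithmetic follows the quoted docs.

```python
# Hypothetical gesture vocabulary; the names are illustrative only.
true_labels = ["swipe_left", "swipe_right", "tap"]

num_labels = len(true_labels)   # number of true labels
num_classes = num_labels + 1    # size of the network's innermost output dimension
blank_index = num_classes - 1   # largest index, reserved for the CTC blank

# True labels occupy indices 0 .. num_labels - 1; the blank gets no name here.
label_to_index = {name: i for i, name in enumerate(true_labels)}

print(label_to_index)
print(num_classes, blank_index)
```

With three true labels the output layer needs four units, and index 3 is the one the loss treats as blank.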

If I use the 'blank' label to indicate that there is no gesture, my training is limited by this error:

Saw a non-null label (index >= num_classes - 1) following a null label

I am assuming that null label is the same as the blank label.

The problem is that when I feed data that starts with no gesture (mapped to the null label) and is then followed by a gesture, I get exactly this error. I can avoid it by adding two more labels next to my existing ones: one for 'no gesture' and one for the 'blank/null' label. I then only ever feed the 'no gesture' label and never the 'blank' label, but this doesn't seem quite right.

So my question is, what should I use the 'blank/null' label for?

I can imagine that in language processing you would usually use the sentence-ending period as the 'null' label. But there is no ending gesture here, as the input is one continuous stream.

Thank you

Anaphory
Kilsen

1 Answer


EDIT: I highly recommend reading this Distill article. "The ϵ (blank) token doesn’t correspond to anything and is simply removed from the output." It is used to 'interrupt' the merging of repeated tokens.
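As a rough illustration of that 'interrupting' role, here is a small sketch of the collapse step used when decoding a CTC output path: merge consecutive repeats first, then drop blanks. The function name and the `"ϵ"` placeholder are chosen for this example and do not come from any library.

```python
from itertools import groupby

BLANK = "ϵ"  # placeholder symbol for the blank token in this sketch


def ctc_collapse(path, blank=BLANK):
    """Merge consecutive repeated symbols, then remove every blank."""
    merged = [symbol for symbol, _ in groupby(path)]
    return [symbol for symbol in merged if symbol != blank]


# Without a blank in between, repeated frames collapse to a single token.
without_blank = ctc_collapse(["a", "a", "a", "a"])
# A blank between the runs keeps the two tokens separate.
with_blank = ctc_collapse(["a", "a", BLANK, "a", "a"])

print(without_blank)  # ['a']
print(with_blank)     # ['a', 'a']
```

The blank never shows up in the decoded result; it only keeps two identical neighbours from being merged.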

The blank label serves as a transitioning state between two classes.

Read more

To answer my own question: you don't assign the blank label to anything, but you still keep it as an existing class. In my case, I added two more labels, one for the 'no gesture' class and one for the blank.
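One possible way to set this up is sketched below in plain Python. The gesture names, the helper `frames_to_target`, and the idea of building targets by merging runs of identical per-frame annotations are all assumptions made for illustration, not something described above; note that merging runs would also fuse two identical gestures performed back to back with no gap.

```python
from itertools import groupby

# Hypothetical vocabulary: 'no_gesture' is an ordinary class that appears in targets.
GESTURES = ["no_gesture", "swipe_left", "swipe_right"]
LABEL_TO_INDEX = {name: i for i, name in enumerate(GESTURES)}

NUM_CLASSES = len(GESTURES) + 1  # one extra output unit for the blank
BLANK_INDEX = NUM_CLASSES - 1    # exists as a class, never written into a target


def frames_to_target(frame_labels):
    """Turn per-frame annotations into a target: one index per run of equal labels."""
    target = [LABEL_TO_INDEX[name] for name, _ in groupby(frame_labels)]
    # The blank index must never be fed as a label.
    assert BLANK_INDEX not in target
    return target


frames = ["no_gesture"] * 3 + ["swipe_left"] * 4 + ["no_gesture"] * 2
target = frames_to_target(frames)
print(target)  # [0, 1, 0]
```

A sequence that starts with no gesture is then labelled with index 0 rather than the blank index, so it never places a real label after a blank.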

(That's at least how I did it and got some decent results)

Kilsen
  • The blank label is NOT a transition between two classes. It serves as an indicator of the absence of labels, i.e. it symbolises "no label". – spurra Mar 21 '17 at 16:34
  • @Kilsen I'm also faced with the exact same dilemma as yours (reading the same paper by Molchanov). So are you saying that what you did was correct? Meaning, one has to have `no gesture` label as well as the `blank` label? If so, those portions of the gesture sequence where there is no gesture are marked with `no gesture`, and `blank` is never used to label anything? But how does the network output class `blank` during training? This is a bit confusing to me... – Maghoumi Jan 28 '19 at 00:04
  • @M2X Yes, it is correct to have both a blank and a no gesture label. Yes, only use no gesture when labelling your data. What is the problem with having the output class blank during training? Can you elaborate? – Kilsen Jan 29 '19 at 18:46
  • @Kilsen Thanks a lot for your response. `What is the problem of having the output class blank during training?` The problem is that the training data does not have any `blank` labels, so one would expect that the network learns not to output `blank` either. I was wondering if the fact that the network tries not to output `blank` would be an issue. – Maghoumi Jan 29 '19 at 19:08
  • @M2X As far as I understand it, your network will still output the blank label. The decoding method will then ignore it. `To get around these problems, CTC introduces a new token to the set of allowed outputs. This new token is sometimes called the blank token. We’ll refer to it here as ϵ. The ϵ token doesn’t correspond to anything and is simply removed from the output.` It is used to separate two successive identical gestures/letters, which would normally be collapsed by the decoding method. – Kilsen Jan 31 '19 at 09:49
  • Thanks for the explanation, that is in line with what I thought was the case after reading your previous comments. Cheers – Maghoumi Jan 31 '19 at 18:38
  • @Kilsen, I am facing the same error while training an OCR model. I added blank as a character in the character list; should I remove blank and explicitly add one more class at the dense layer? In fact, I have spaces between the words; how will the model deal with them if I remove blank from the character list? – maryam mehboob Jul 11 '21 at 15:17