In Connectionist Temporal Classification (CTC), a space is just an
ordinary whitespace character, while the blank is a separate
pseudo-character, written '-', used to resolve repeated characters.
For example, "pizza" is encoded as "piz-za".
TL;DR:
ref: https://towardsdatascience.com/beam-search-decoding-in-ctc-trained-neural-networks-5a889a3d85a7
In CTC there is an issue of how to encode duplicate characters. It is solved by introducing a pseudo-character (called blank, but don’t confuse it with a “real” blank, i.e. a white-space character). This special character is denoted as “-” in the text. We use a clever coding scheme to solve the duplicate-character problem: when encoding a text, we can insert arbitrarily many blanks at any position, which are removed when decoding. However, we must insert a blank between duplicate characters, as in “hello”. Further, we can repeat each character as often as we like.
Let’s look at some examples:
“to” → “---ttttttooo”, or “-t-o-”, or “to”
“too” → “---ttttto-o”, or “-t-o-o-”, or “to-o”, but not “too”
As you see, this schema also allows us to easily create different alignments of the same text, e.g. “t-o” and “too” and “-to” all represent the same text (“to”), but with different alignments to the image. The NN is trained to output an encoded text (encoded in the NN output matrix).
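The decoding rule implied above is: collapse runs of repeated symbols, then remove the blanks. A minimal sketch in Python (the '-' blank follows the notation in the text):

```python
from itertools import groupby

BLANK = "-"  # the CTC pseudo-character, not a real whitespace

def ctc_decode(path: str) -> str:
    """Collapse consecutive repeats, then drop blanks."""
    return "".join(s for s, _ in groupby(path) if s != BLANK)

# Different alignments of the same text all decode to "to":
for path in ("t-o", "too", "-to"):
    print(path, "->", ctc_decode(path))  # each prints "to"

# A path of "too" collapses to "to", which is why the text "too"
# needs a blank between the duplicate o's:
print(ctc_decode("to-o"))  # -> "too"
```

This illustrates why “too” as a raw path is not a valid encoding of the word “too”: without the blank, the duplicate o's collapse into one.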