In Connectionist Temporal Classification (CTC), a space is just an
ordinary whitespace character, while the blank is a separate
pseudo-character, written '-', used to resolve repeated characters.
For example, "pizza" is encoded as "piz-za".
TL;DR:
ref: https://towardsdatascience.com/beam-search-decoding-in-ctc-trained-neural-networks-5a889a3d85a7
In CTC there is an issue of how to encode duplicate characters. It is solved by introducing a pseudo-character (called blank, but don’t confuse it with a “real” blank, i.e. a white-space character). This special character is denoted as “-” in the text. We use a clever coding scheme to solve the duplicate-character problem: when encoding a text, we can insert arbitrarily many blanks at any position, which are removed when decoding. However, we must insert a blank between duplicate characters, as in “hello”. Further, we can repeat each character as often as we like.
Let’s look at some examples:
“to” → “---ttttttooo”, or “-t-o-”, or “to”
“too” → “---ttttto-o”, or “-t-o-o-”, or “to-o”, but not “too”
As you see, this schema also allows us to easily create different alignments of the same text, e.g. “t-o” and “too” and “-to” all represent the same text (“to”), but with different alignments to the image. The NN is trained to output an encoded text (encoded in the NN output matrix).
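The decoding rule implied above is: collapse runs of repeated symbols, then remove the blanks. A minimal sketch in Python (the '-' blank follows the notation in the text):

```python
from itertools import groupby

BLANK = "-"  # the CTC pseudo-character, not a real whitespace

def ctc_decode(path: str) -> str:
    """Collapse consecutive repeats, then drop blanks."""
    return "".join(s for s, _ in groupby(path) if s != BLANK)

# Different alignments of the same text all decode to "to":
for path in ("t-o", "too", "-to"):
    print(path, "->", ctc_decode(path))  # each prints "to"

# A path of "too" collapses to "to", which is why the text "too"
# needs a blank between the duplicate o's:
print(ctc_decode("to-o"))  # -> "too"
```

This illustrates why “too” as a raw path is not a valid encoding of the word “too”: without the blank, the duplicate o's collapse into one.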