
Is there a comprehensive CTC loss example with Tensorflow out there? The docs for tensorflow.contrib.ctc don't contain enough information for me. I know that there is one Stackoverflow post, but I can't get that to work.

Maybe someone has a complete (bidirectional) LSTM example with sample data that he/she could share. Thanks.

Tom

3 Answers


See here for an example with bidirectional LSTM and CTC implementations, training a phoneme recognition model on the TIMIT corpus. If you don't have access to TIMIT or another phoneme-transcribed data set, you probably won't get any decent performance with a single-layer model like this, but the basic structure should hold.

Update: If you don't have access to TIMIT, or you just want to see the thing run without formatting your inputs to make the code work, I've added an 8-sample toy data set that you can overfit to see the training in action.
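To make clearer what the ctc_loss op is actually computing, here is a minimal NumPy sketch of the CTC forward (alpha) recursion over the blank-extended label sequence. The function name and structure are illustrative only, not TensorFlow's implementation; note that TensorFlow's op reserves the last class index for the blank, which the `blank` argument mirrors here.

```python
import numpy as np

def ctc_forward_loss(probs, target, blank):
    """Negative log-likelihood of `target` under the CTC alignment model.

    probs:  (T, K) per-frame class probabilities (already softmaxed)
    target: label sequence without blanks, e.g. [0, 1]
    blank:  index of the blank class (TF uses the last class, K - 1)
    """
    T = probs.shape[0]
    # Interleave blanks: l' = [blank, l1, blank, l2, ..., blank]
    ext = [blank]
    for c in target:
        ext += [c, blank]
    S = len(ext)

    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]
            if s > 0:
                a += alpha[t - 1, s - 1]
            # Skipping a blank is allowed only between distinct non-blank labels
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * probs[t, ext[s]]
    # Valid paths end on either the final label or the final blank
    p = alpha[T - 1, S - 1] + (alpha[T - 1, S - 2] if S > 1 else 0.0)
    return -np.log(p)
```

This sums the probability of every frame-level path that collapses (repeats merged, blanks removed) to the target sequence, which is exactly the quantity whose negative log the op returns.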

Jon Rein
  • Thanks for the example. I really appreciate it. Would you mind pushing some demo data to your repo as well, so that one can actually run the code and inspect it? Thanks. – Tom Jul 14 '16 at 14:30
  • @JonRein Thank you for the nice example. Could you please upload a file that maps the target classes to phonemes or characters? I would like to see how you handle the blanks between phonemes or sequences of chars. What does class '0' correspond to in your sample_data/char_y/*.npy? – VM_AI Jul 29 '16 at 21:10
  • @VM_AI The class/character mapping for the toy data set was randomized, as the source data is not publicly available. The blank does not appear in the target data files. The ctc_loss op handles the blank insertion for you. – Jon Rein Aug 02 '16 at 13:45
  • @JonRein When we say blank, we mean spaces between the words, right? Because when we create the sparse tensor for targets, the left-out spaces will be filled by zeros, and what do you think they should map to? – VM_AI Aug 02 '16 at 13:47
  • @VM_AI No, in CTC terms, the blank is a special class, which is inserted between every character in the target sequence (by the ctc op, not by you). For our purposes, the space between words is just a character, same as any other character, and you should definitely not remove it. Apologies for not being clearer about that. You can map it to an integer value of 0, 5, 23, whatever. I believe it's true that the dense tensor version of the targets sparse tensor will be zero-padded, but that's why the sparse tensor constructor takes the valid indexes as an input. – Jon Rein Aug 03 '16 at 16:29
  • What script are you using to format the TIMIT data? – TFUser Aug 21 '16 at 15:20
  • Now tensorflow released 1.0 version and many ops are removed or put into another place. Could you please update the repo to make it run with tf 1.0? Many thanks! – soloice Feb 19 '17 at 13:41
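As the comments above discuss, ctc_loss takes its targets as a SparseTensor of blank-free label sequences, where the valid indexes (not the zero padding) determine which entries are real labels. A small plain-Python sketch (helper name hypothetical) of packing a batch of variable-length targets into the (indices, values, dense_shape) triple that tf.SparseTensor expects:

```python
def labels_to_sparse(batch_labels):
    """Pack variable-length, blank-free label sequences into the
    (indices, values, dense_shape) triple expected by tf.SparseTensor.

    batch_labels: list of lists of int class ids, e.g. [[1, 4, 2], [3, 3]]
    """
    indices, values = [], []
    for b, seq in enumerate(batch_labels):
        for t, label in enumerate(seq):
            indices.append([b, t])  # (batch, time) coordinate of this label
            values.append(label)
    dense_shape = [len(batch_labels), max(len(s) for s in batch_labels)]
    return indices, values, dense_shape

# Example: two utterances with 3 and 2 labels respectively
idx, vals, shape = labels_to_sparse([[1, 4, 2], [3, 3]])
# idx   -> [[0, 0], [0, 1], [0, 2], [1, 0], [1, 1]]
# vals  -> [1, 4, 2, 3, 3]
# shape -> [2, 3]
```

Because the indices list only covers positions that hold a real label, a class id of 0 in `values` is an ordinary character and is never confused with padding.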

Have you seen the unit tests for CTC? See the ctc_loss test and the ctc_decoder tests.

These contain examples of usage that may get you further along in understanding how to use the ops.
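The ctc_decoder tests exercise best-path (greedy) decoding: take the argmax class per frame, collapse repeated classes, then drop blanks. A NumPy sketch of that procedure (illustrative only, not TensorFlow's implementation):

```python
import numpy as np

def greedy_ctc_decode(logits, blank):
    """Best-path CTC decoding: frame-wise argmax, collapse repeats, drop blanks.

    logits: (T, K) per-frame scores; blank: index of the blank class.
    """
    best = np.argmax(logits, axis=1)
    decoded, prev = [], None
    for c in best:
        if c != prev and c != blank:
            decoded.append(int(c))
        prev = c
    return decoded

# Frame-wise argmax gives [1, 1, blank, 1, 2]; the blank separates the
# repeated 1s, so the decode is [1, 1, 2].
logits = np.array([[0.1, 0.8, 0.0, 0.1],
                   [0.2, 0.7, 0.0, 0.1],
                   [0.1, 0.1, 0.0, 0.8],
                   [0.0, 0.9, 0.0, 0.1],
                   [0.1, 0.1, 0.7, 0.1]])
print(greedy_ctc_decode(logits, blank=3))  # [1, 1, 2]
```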

Eugene Brevdo

Chris Dinant has provided a great example of CTC and RNNs used for speech recognition. His model recognizes speech using phonemes. The CTC loss used is tf.keras.backend.ctc_batch_cost.

The code is at https://github.com/chrisdinant/speech, and a great explanation of what was done can be found at https://towardsdatascience.com/kaggle-tensorflow-speech-recognition-challenge-b46a3bca2501
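Unlike ctc_loss, ctc_batch_cost takes dense, padded targets rather than a SparseTensor: y_true of shape (batch, max_label_len), y_pred of shape (batch, time, classes) softmax output, plus per-sample input_length and label_length column vectors. A small NumPy sketch (helper name hypothetical) of preparing the label side:

```python
import numpy as np

def pad_for_ctc_batch_cost(batch_labels, max_len):
    """Build the dense y_true / label_length pair that
    tf.keras.backend.ctc_batch_cost expects (padded labels, no blanks).

    batch_labels: list of lists of int class ids
    max_len: width to pad every label row to
    """
    y_true = np.zeros((len(batch_labels), max_len), dtype=np.int32)
    label_length = np.zeros((len(batch_labels), 1), dtype=np.int32)
    for i, seq in enumerate(batch_labels):
        y_true[i, :len(seq)] = seq       # real labels, then zero padding
        label_length[i, 0] = len(seq)    # tells the op where padding starts
    return y_true, label_length

y_true, label_length = pad_for_ctc_batch_cost([[1, 4, 2], [3, 3]], max_len=4)
# y_true       -> [[1, 4, 2, 0], [3, 3, 0, 0]]
# label_length -> [[3], [2]]
```

The explicit label_length is what lets the op ignore the zero padding, so class 0 can still be used as a real label.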

Rasula