
I need to train a bidirectional LSTM model to recognize discrete speech (individual digits from 0 to 9). I have recorded speech from 100 speakers. What should I do next? (Assume I am splitting the recordings into individual .wav files, each containing one digit.) I will be using MFCCs as features for the network.

Further, I would like to know how the dataset would differ if I use a library that supports CTC (Connectionist Temporal Classification).

Infinite Loops
udani

1 Answer


You can use the answer/guidance provided here

Depending on which library you are using to create your LSTM (PyBrain, Theano, Keras), you can look through its documentation.

I would recommend using Theano (Binary LSTM link) or Keras (Tutorial) for this because they are fairly simple to understand and are well documented.
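
Whichever library you pick, the dataset usually has to be shaped the same way first. Here is a minimal sketch using only NumPy, assuming each .wav file has already been converted to an MFCC matrix of shape (frames, n_mfcc). The helper names `pad_mfcc_batch` and `one_hot`, the 13 coefficients, and the random stand-in data are my own assumptions for illustration, not part of any library mentioned above.

```python
import numpy as np


def pad_mfcc_batch(sequences, pad_value=0.0):
    """Stack variable-length (frames, n_mfcc) arrays into one padded batch.

    Hypothetical helper: returns the padded array and the true frame counts.
    """
    lengths = np.array([seq.shape[0] for seq in sequences], dtype=np.int64)
    n_mfcc = sequences[0].shape[1]
    batch = np.full((len(sequences), lengths.max(), n_mfcc), pad_value,
                    dtype=np.float32)
    for i, seq in enumerate(sequences):
        batch[i, : seq.shape[0]] = seq  # frames beyond the end stay padded
    return batch, lengths


def one_hot(labels, num_classes=10):
    """Turn integer digit labels into one-hot rows (hypothetical helper)."""
    out = np.zeros((len(labels), num_classes), dtype=np.float32)
    out[np.arange(len(labels)), labels] = 1.0
    return out


# Random stand-ins for real MFCC matrices: 3 utterances, 13 coefficients each.
rng = np.random.default_rng(0)
utterances = [rng.normal(size=(n, 13)).astype(np.float32) for n in (42, 57, 35)]
digits = [7, 0, 3]

# Input is the same in both setups: (batch, max_frames, n_mfcc) plus lengths.
x, input_lengths = pad_mfcc_batch(utterances)

# Plain classification: one target vector per file.
y_classify = one_hot(digits)

# CTC-style targets: a label sequence per utterance plus its length.
# With one digit per file, each sequence has length 1.
y_ctc = np.array([[d] for d in digits], dtype=np.int64)
label_lengths = np.ones(len(digits), dtype=np.int64)
```

With one digit per file, the classification targets are enough and CTC is not strictly needed. CTC becomes useful if you keep several digits in one recording: the target is then the digit sequence with no frame-level alignment, and the library generally also wants the input and label lengths. Which index is reserved for the CTC blank symbol varies by library, so check its documentation.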

Hope this helps.

Community
Nirbhay Tandon