I need to train a bidirectional LSTM model to recognize discrete speech (individual digits from 0 to 9), and I have recorded speech from 100 speakers. What should I do next? (Assume I am splitting the recordings into individual .wav files, each containing one digit.) I will be using MFCCs as the input features for the network.
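To make my setup concrete: this is the feature extraction I have in mind for each .wav file. In practice I would probably use a library such as librosa or python_speech_features, but here is a minimal NumPy sketch of the MFCC pipeline as I understand it (frame, window, power spectrum, mel filterbank, log, DCT); all parameter values are my own assumptions, not requirements:

```python
import numpy as np

def mfcc(signal, sr, n_fft=512, hop=160, n_mels=26, n_mfcc=13):
    """Minimal MFCC sketch: frame -> Hann window -> power spectrum
    -> mel filterbank -> log -> DCT-II, keeping the first n_mfcc coefficients."""
    # Slice the signal into overlapping frames and apply a Hann window
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank
    def hz_to_mel(f):
        return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m):
        return 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II to decorrelate the log filterbank energies
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * n + 1) / (2 * n_mels)))
    return log_energy @ dct.T  # shape: (n_frames, n_mfcc)

# Stand-in for one recorded digit: a 1-second synthetic tone at 16 kHz
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
tone = np.sin(2 * np.pi * 440 * t)
feats = mfcc(tone, sr)
print(feats.shape)  # (97, 13): one 13-dim feature vector per frame
```

So each utterance becomes a variable-length sequence of fixed-size feature vectors, which is what the BLSTM would consume.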
Further, I would like to know how the dataset (or its labeling) would need to differ if I use a library that supports CTC (Connectionist Temporal Classification).
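To make the second question concrete, here is my current understanding of how one training example would be structured in each case (the field names are hypothetical, just for illustration); I would like this confirmed or corrected:

```python
# Without CTC: each .wav file is one example with a single class label (0-9),
# typically trained with a cross-entropy loss over 10 classes.
example_plain = {
    "mfcc_shape": (97, 13),  # variable n_frames, fixed feature dim
    "label": 7,
}

# With CTC: the target is a label *sequence*, and the frame count and label
# length are passed to the loss so it can learn the alignment automatically.
# For isolated digits the sequence has length 1, but no per-frame alignment
# is needed, and the same setup would extend to connected digit strings.
example_ctc = {
    "mfcc_shape": (97, 13),
    "labels": [7],        # e.g. [7, 3, 1] for connected digits
    "input_length": 97,   # number of MFCC frames
    "label_length": 1,    # number of symbols in the target sequence
}
print(example_ctc["labels"])
```

An extra "blank" class is also reserved by the CTC loss itself (so the network would output 11 classes for 10 digits), but that affects the model's output layer rather than the stored dataset.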