How to train an LSTM for speech recognition

Question

I'm trying to train an LSTM model for speech recognition, but I don't know what training data and target data to use. I'm using the LibriSpeech dataset, which contains both audio files and their transcripts. At this point, I know the target data will be the vectorized transcript text. For the training data, I was thinking of using the frequencies and time from each audio file (or MFCC features). If that is the correct approach, the training data for each audio file will be multiple arrays. How would I input those arrays into my LSTM model? Will I have to vectorize them?

Thanks!

score 14 · Accepted Answer · edited May 23 '17 at 12:09

To prepare the speech dataset for the LSTM model, see the post Building Speech Dataset for LSTM binary classification, and also the Data Preparation section.

For a good example, see this post: http://danielhnyk.cz/predicting-sequences-vectors-keras-using-rnn-lstm/. It explains how to predict a sequence of vectors in Keras using an RNN (LSTM).

I believe you will find this post (https://stats.stackexchange.com/questions/192014/how-to-implement-a-lstm-based-classifier-to-classify-speech-files-using-keras) very helpful too.
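To make the input format concrete, here is a minimal sketch of the shaping step, assuming you have already extracted MFCC frames for each audio file (the extraction itself is not shown). An LSTM layer expects a batch shaped (batch, time steps, features), so utterances of different lengths are usually padded to a common length, and the original lengths are kept so the padding can be masked later. The names `pad_batch`, `encode_transcript` and `ALPHABET` are hypothetical helpers written for this illustration, not part of Keras or LibriSpeech, and the character-level target encoding is just one possible choice.

```python
from typing import List, Sequence, Tuple

# Assumed character set for the targets: letters, space and apostrophe.
ALPHABET = "abcdefghijklmnopqrstuvwxyz '"


def pad_batch(
    utterances: List[List[Sequence[float]]], pad_value: float = 0.0
) -> Tuple[List[List[List[float]]], List[int]]:
    """Pad variable-length utterances to (batch, max_time, n_features).

    Each utterance is a list of frames; each frame is one feature
    vector (for example the MFCC coefficients of a short window).
    Returns the padded batch and the original number of frames per
    utterance.
    """
    n_features = len(utterances[0][0])
    lengths = [len(u) for u in utterances]
    max_time = max(lengths)
    batch = []
    for u in utterances:
        frames = [list(f) for f in u]
        # Append all-padding frames until every utterance has max_time frames.
        frames += [[pad_value] * n_features for _ in range(max_time - len(u))]
        batch.append(frames)
    return batch, lengths


def encode_transcript(text: str, alphabet: str = ALPHABET) -> List[int]:
    """Map each character to an integer index; 0 is left free for padding.

    Characters that are not in the alphabet are skipped.
    """
    index = {ch: i + 1 for i, ch in enumerate(alphabet)}
    return [index[ch] for ch in text.lower() if ch in index]


# Toy data: two "utterances" with 3 and 2 frames of 2 features each.
toy = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[7.0, 8.0], [9.0, 10.0]],
]
batch, lengths = pad_batch(toy)
targets = [encode_transcript(t) for t in ["HI A", "OK"]]
```

The nested lists can then be converted to an array and passed to the model. Keras also provides `pad_sequences` and a `Masking` layer that cover the same padding and masking steps.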

answered Nov 26 '16 at 00:18

Wasi Ahmad

@user562 Can you please share the approach or source code for your ASR model? I have been working on this for my college project and haven't found much information on it. – James Dec 10 '19 at 04:01
