How to train an HMM with an audio sentence dataset for speech recognition?

Question

I have read some journal articles and papers on HMMs and MFCCs, but I am still confused about how they work, step by step, with my dataset (audio recordings of sentences).

Examples from my dataset (in audio form):

hello good morning
good luck for your exam
etc., about 343 sentences and 20 speakers (about 6800 audio files in total)

What I know so far:

My sentence dataset is used to estimate the transition probabilities
The HMM states are the phonemes
39 MFCC features are used to train the HMM models

My questions:

Do I need to cut my sentences into words, or can I just use whole sentences to train the HMM models?
Do I need a phoneme dataset for training? If yes, do I need to train on it with an HMM too? If not, how does my program recognize the phonemes when the HMM predicts on new input?
Which steps must I do first?

Note: I'm working in Python and I use hmmlearn and python_speech_features as my libraries.

Do you have the time segmentation of the phonemes? – Gabriel M Jul 04 '18 at 07:15

score 1 · Accepted Answer · answered Jul 04 '18 at 07:15

Do I need to cut my sentences into words, or can I just use whole sentences to train the HMM models?

Theoretically, you just need sentences and phonemes. However, having isolated words may be useful for your model (it increases the size of your training data).

Do I need a phoneme dataset for training? If yes, do I need to train on it with an HMM too? If not, how does my program recognize the phonemes when the HMM predicts on new input?

You need phonemes; otherwise it will be too hard for your model to find the right phoneme segmentation, because it has no examples of isolated phonemes to learn from. You should first train your HMM states on the isolated phonemes and then add the rest of the data. If you have enough data, your model may be able to learn without the isolated phoneme examples, but I wouldn't bet on it.

Which steps must I do first?

Build your phoneme examples and use them to train a simple HMM in which you don't model the transitions between phonemes. Once your hidden states have some information about phonemes, you may continue the training on isolated words and sentences.
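One common way to continue on whole sentences is to align each sentence's frames to its known phoneme sequence with Viterbi (forced alignment) and then re-estimate each phoneme model from the frames assigned to it. The NumPy sketch below only shows the alignment step, under strong simplifying assumptions that are mine and not part of the steps above: one state per phoneme, unit-variance Gaussian emissions, uniform transition scores, and made-up phoneme means.

```python
import numpy as np


def log_gauss(frames, mean):
    """Log-density of each frame under a unit-variance Gaussian (up to a constant)."""
    return -0.5 * np.sum((frames - mean) ** 2, axis=1)


def force_align(frames, phoneme_sequence, phoneme_means):
    """Viterbi alignment of frames to a left-to-right chain of phoneme states.

    Each phoneme is a single state here (a simplification). The only allowed
    moves are "stay in the same phoneme" or "advance to the next one".
    Returns, for every frame, the index of the phoneme it is assigned to.
    """
    n_frames, n_states = len(frames), len(phoneme_sequence)
    emit = np.stack([log_gauss(frames, phoneme_means[p])
                     for p in phoneme_sequence], axis=1)

    best = np.full((n_frames, n_states), -np.inf)
    came_from = np.zeros((n_frames, n_states), dtype=int)
    best[0, 0] = emit[0, 0]  # the sentence must start in its first phoneme
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = best[t - 1, s]
            advance = best[t - 1, s - 1] if s > 0 else -np.inf
            if advance > stay:
                best[t, s], came_from[t, s] = advance + emit[t, s], s - 1
            else:
                best[t, s], came_from[t, s] = stay + emit[t, s], s

    # Backtrack from the last phoneme, so the whole sequence is consumed.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(came_from[t, path[-1]]))
    return path[::-1]


# Hypothetical phoneme means and a synthetic "hello" built from them.
rng = np.random.default_rng(0)
phoneme_means = {"HH": 0.0, "AH": 3.0, "L": 6.0, "OW": 9.0}
sequence = ["HH", "AH", "L", "OW"]
true_durations = [3, 4, 2, 5]
frames = np.concatenate([
    phoneme_means[p] + 0.1 * rng.normal(size=(d, 2))
    for p, d in zip(sequence, true_durations)
])

alignment = force_align(frames, sequence, phoneme_means)
print(alignment)
```

With models that already know roughly what each phoneme sounds like, the alignment recovers where each phoneme starts and ends inside the sentence; those segments can then be fed back as extra training examples for the matching phoneme model.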

If I continue the training with my sentences, can my program predict words too, or only sentences? — MarcellSinaga, Jul 05 '18 at 08:25
The algorithm does not really know the difference between a word and a sentence, unless you introduce a hidden state for word transitions. An HMM models sequences, so if it works for sentences it should also work for words. — Gabriel M, Jul 05 '18 at 08:35
