I have read several journal articles and papers on HMM and MFCC, but I am still confused about how they work, step by step, with my dataset (audio recordings of sentences).
Examples from my dataset (audio form):
- hello good morning
- good luck for your exam
- etc. About 343 audio recordings and 20 speakers (roughly 6800 audio recordings in total)
What I know so far:
- My sentence dataset is used to get the transition probabilities
- The HMM states are the phonemes
- 39 MFCC features are used to train the HMM models
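To make the third point concrete, here is a minimal pure-Python sketch of how I understand the 39 features per frame. It assumes the usual layout of 13 static MFCCs plus 13 deltas plus 13 delta-deltas, which is my assumption and not something I have confirmed for my setup. `simple_delta` and `stack_39` are hypothetical helpers that use a plain difference between neighbouring frames, not the regression formula a real toolkit would apply, and `toy_mfcc` is made-up data standing in for real MFCC output.

```python
# Assumption: 39 = 13 static MFCCs + 13 deltas + 13 delta-deltas per frame.
# simple_delta and stack_39 are hypothetical helpers for illustration only.

def simple_delta(frames):
    """Half the difference between the next and previous frame (edges clamped)."""
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]       # reuse first frame at the left edge
        nxt = frames[min(t + 1, n - 1)]    # reuse last frame at the right edge
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out

def stack_39(static_frames):
    """Concatenate static, delta and delta-delta coefficients frame by frame."""
    d1 = simple_delta(static_frames)
    d2 = simple_delta(d1)
    return [s + a + b for s, a, b in zip(static_frames, d1, d2)]

# Toy stand-in for real MFCC output: 5 frames x 13 coefficients.
toy_mfcc = [[float(t * 13 + k) for k in range(13)] for t in range(5)]
features = stack_39(toy_mfcc)
print(len(features), len(features[0]))  # frames, features per frame
```

With real audio, the 13 static coefficients per frame would come from an MFCC extractor instead of `toy_mfcc`; the stacking step would stay the same.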
My questions:
- Do I need to cut my sentences into words, or can I just use whole sentences to train the HMM models?
- Do I need a phoneme dataset for training? If yes, do I need to train on it with an HMM too? If not, how does my program recognize the phonemes in the input when the HMM predicts?
- Which steps must I do first?
Note: I'm working in Python and using hmmlearn and python_speech_features as my libraries.