I have recently been working with CMU's Sphinx4 for transcription and, eventually, forced alignment, i.e. aligning audio with its transcript.
I found a project called AutoCap that does essentially what I want to build, so I installed it, but it did not work. I tried tweaking it, but all I got was incorrect timestamps.
So I decided to try Sphinx4 directly. I successfully transcribed a WAV file using Sphinx's Transcriber.jar, but I could not get it to work for audio containing non-digits data. The readme page states 'people who want to transcribe non-digits data should modify the config.xml file to use the correct grammar, language model, and linguist to do so'.
Can anyone help me with any of the following:
- AutoCap
- Using Sphinx4 to transcribe non-digits data
- Forced Alignment
Thanks.