I am working on an Android project that needs to record conversations or other audio for an extended period (over an hour) and transcribe the speech to text.
Following the post "Continuously recognize everything being said on Android?", I downloaded the PocketSphinx demo and created a language model search using the generic en-US language model. It works, but it is inaccurate (I can work on that later). More importantly, it stops after a very short time, usually 2-3 words, decodes and displays them, and misses the rest of the sentence in the process. How can I make it decode continuously? I have the following code, where LANGUAGE_SEARCH is the search name:
public void onEndOfSpeech() {
    switchSearch(LANGUAGE_SEARCH);
}
Is keyword spotting mode better for continuous recognition? The problem there is that it only recognizes a specific keyword or phrase. I can add multiple keywords, but I am not sure that is the proper way to transcribe a conversation. If it is, is there a keyword file large enough to cover most words?
My third question is about passing a file to PocketSphinx for decoding. Can PocketSphinx record audio for an extended period and then convert it to text using a language model search? Or do I need to record to a file, say with MediaRecorder, save it, and then have PocketSphinx decode it separately? (I haven't worked with this yet, so I am not really sure.)
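To make the second option concrete, this is roughly what I picture: read the saved recording in fixed-size chunks and hand each chunk to the decoder. It is only a sketch under assumptions. It assumes the file holds raw 16-bit little-endian mono PCM (which, as far as I know, MediaRecorder does not write directly, since it saves to compressed container formats), and `PcmChunkReader` and `ChunkSink` are hypothetical names of my own standing in for whatever call PocketSphinx actually expects.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class PcmChunkReader {

    /** Hypothetical stand-in for the consumer of audio, e.g. a wrapper around the decoder. */
    interface ChunkSink {
        void accept(short[] samples, int count);
    }

    /**
     * Reads 16-bit little-endian PCM from the stream in chunks of at most
     * chunkSamples samples and passes each chunk to the sink.
     * Returns the total number of samples delivered.
     */
    static long feed(InputStream in, int chunkSamples, ChunkSink sink) throws IOException {
        byte[] buf = new byte[chunkSamples * 2];
        short[] samples = new short[chunkSamples];
        long total = 0;
        while (true) {
            // read() may return fewer bytes than requested, so loop until
            // the buffer is full or the stream ends.
            int filled = 0;
            while (filled < buf.length) {
                int n = in.read(buf, filled, buf.length - filled);
                if (n < 0) {
                    break;
                }
                filled += n;
            }
            int count = filled / 2; // a trailing odd byte is dropped
            if (count == 0) {
                break;
            }
            // Reinterpret the bytes as little-endian 16-bit samples.
            ByteBuffer.wrap(buf, 0, count * 2)
                    .order(ByteOrder.LITTLE_ENDIAN)
                    .asShortBuffer()
                    .get(samples, 0, count);
            sink.accept(samples, count);
            total += count;
            if (filled < buf.length) {
                break; // end of stream reached
            }
        }
        return total;
    }
}
```

The sink is where the decoding call would go. The same `short[]` array is reused between chunks, so a sink that needs to keep the samples would have to copy them.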
Any help is appreciated! Also, is there a better way to transcribe audio if real-time decoding is not a priority? Thanks!