
I am working on an Android project that needs to record conversations or other audio for an extended period (over 1 hour) and transcribe the speech to text.

Following the post "Continuously recognize everything being said on Android?", I downloaded the demo and created a language model search using the generic en-US language model. It works, but there are two problems. First, it is inaccurate (I can work on that later). Second, and more importantly, it stops after a very short time, usually 2-3 words, then decodes and displays them, and in the process misses the rest of the sentence. How can I make it decode continuously? I have the following code, where LANGUAGE_SEARCH is the search name:

@Override
public void onEndOfSpeech() {
    // On end of speech, switch back to the language model search.
    switchSearch(LANGUAGE_SEARCH);
}

Is keyword spotting mode better for continuous recognition? The problem there is that it only recognizes specific keywords or phrases. I can add multiple keywords, but I am not sure that is the proper way to transcribe a conversation. If it is, is there a keyword file large enough to cover most words?

My third question is about passing a file to PocketSphinx for decoding. Can PocketSphinx record audio for an extended period and then convert it to text using the language model search? Or do I need to record a file myself, say with MediaRecorder, save it, and then have PocketSphinx decode it separately? (I haven't worked with this yet, so I am not really sure.)

Any help is appreciated! Is there a better way to transcribe audio if real-time decoding is not a priority? Thanks!

skbrhmn
    It is better to ask more focused questions here. Keyword spotting mode works only for a few keywords; it is not appropriate for continuous speech recognition. The large-vocabulary decoder is too slow to decode in real time, so you have to record first, then process separately. – Nikolay Shmyrev May 30 '16 at 11:54
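
For the "record first, then process separately" route suggested in the comment, here is a minimal sketch of the processing half: reading a saved recording in chunks and converting it to 16-bit samples. It assumes the recording is raw 16-bit little-endian mono PCM with no header (the kind of data AudioRecord can produce; MediaRecorder usually writes compressed formats that would need converting first). `PcmChunkReader` and `ChunkSink` are hypothetical names made up for this example and are not part of PocketSphinx. In a real app, the sink is where each chunk would be handed to the decoder (in pocketsphinx-android this is likely a call such as `Decoder.processRaw` between `startUtt()` and `endUtt()`, but check that against the version you are using).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PcmChunkReader {

    /** Hypothetical stand-in for the decoder; not a PocketSphinx type. */
    interface ChunkSink {
        void accept(short[] samples, int count);
    }

    /**
     * Reads raw 16-bit little-endian PCM from the stream and passes it to
     * the sink chunk by chunk. Returns the total number of samples delivered.
     * A trailing odd byte at the end of the stream is dropped.
     */
    static long feed(InputStream in, int chunkBytes, ChunkSink sink) throws IOException {
        if (chunkBytes < 2 || chunkBytes % 2 != 0) {
            throw new IllegalArgumentException("chunkBytes must be a positive even number");
        }
        byte[] buf = new byte[chunkBytes];
        long total = 0;
        int carry = 0; // 0 or 1 byte left over from the previous read
        while (true) {
            int n = in.read(buf, carry, buf.length - carry);
            if (n < 0) {
                break; // end of stream
            }
            int avail = carry + n;
            int count = avail / 2; // whole samples available in the buffer
            if (count > 0) {
                short[] samples = new short[count];
                ByteBuffer.wrap(buf, 0, count * 2)
                        .order(ByteOrder.LITTLE_ENDIAN)
                        .asShortBuffer()
                        .get(samples);
                sink.accept(samples, count);
                total += count;
            }
            // Keep a dangling half-sample for the next read.
            carry = avail - count * 2;
            if (carry == 1) {
                buf[0] = buf[avail - 1];
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Three samples encoded as little-endian byte pairs (illustrative data only).
        byte[] pcm = {0x01, 0x00, (byte) 0xFF, (byte) 0xFF, 0x00, (byte) 0x80};
        long total = feed(new ByteArrayInputStream(pcm), 4, (samples, count) -> {
            // Assumption: a real app would pass the chunk to the decoder here.
            for (int i = 0; i < count; i++) {
                System.out.println(samples[i]);
            }
        });
        System.out.println("samples read: " + total);
    }
}
```

Reading in fixed-size chunks keeps memory use flat even for a recording longer than an hour, since the whole file is never loaded at once. With a real file, the `ByteArrayInputStream` would be replaced by a `FileInputStream` on the saved recording.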

0 Answers