Startup delay of Android Speech Recognition Activity

Question

I'm trying to implement an Android application that holds a conversation with the user via text-to-speech and Android's speech recognition activity.

The following code starts the activity, as documented in the tutorial:

Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
        RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Speech recognition demo");
startActivityForResult(intent, VOICE_RECOGNITION_REQUEST_CODE);

The problem is that the activity takes somewhere between 0.5 and 1 second to start recording the user's voice. This doesn't seem like a lot, but it often means the user has already started talking before the speech recognition activity has begun recording, so the application misses part of what the user says.

Is there a good way to get around this delay so that I can start speech recognition as soon as text to speech is done speaking?

Possibilities I've considered:

1. Preload the activity in Android and pause it on start. I don't think there's any way to do this unless I have the ability to change the code within the activity, which I don't as it's not part of the Android source.
2. Time the call to start the activity before the text to speech is done. This isn't ideal because it relies on undefined behavior: how long the speech recognition activity takes to load, which can vary from system to system. Additionally it requires knowledge of how long text to speech will take to say a phrase, which is not part of the text-to-speech API.
3. Start the speech recognition activity and then pause the thread that it's running on. Definitely Not Recommended.
4. Call methods that aren't exposed in the API from the speech recognition activity from my activity. I don't know how to do this and am not sure if it's even possible.
5. Implement my own version of the speech recognition activity. This is what I'm doing now, but it's not trivial by any means and I'd rather not have to write my own FLAC encoder in Java and use Google's servers to do speech recognition without permission.

If you have any other ideas for how this could be done properly, or a way to get around any of the above problems, that would be awesome.

score 1 · Answer 1 · answered Aug 17 '11 at 16:59

One thing you can do is to encourage your users to speak longer commands. That way, if they start speaking too soon, the system can recognize the later part of a command.

For example, instead of having the system recognize "Open email", you could encourage the users to say "System open email". That way, if the system only hears the "open email" part, it can still recognize the command.

It might add unnecessary words to commands, but I believe it is less awkward than making the user pause. As you describe, that delay is problematic.
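
To illustrate the idea, here is a minimal sketch in plain Java of matching a transcript whose beginning may have been cut off. The class name `CommandMatcher`, the `match` method, and the command list are all hypothetical and only serve the example; the approach (accept any transcript that is the tail of exactly one known command) is one possible way to do it, not something the recognizer provides.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CommandMatcher {
    // Hypothetical commands, each padded with a leading "system" keyword
    // that can be lost without changing the meaning.
    private static final List<String> COMMANDS =
            Arrays.asList("system open email", "system open calendar");

    /**
     * Returns the full command that the transcript is a tail of,
     * or null if nothing matches or the match is ambiguous.
     */
    public static String match(String heard) {
        String text = heard.trim().toLowerCase(Locale.ROOT);
        if (text.isEmpty()) {
            return null;
        }
        String found = null;
        for (String command : COMMANDS) {
            // Accept the whole command, or a tail that starts on a word boundary.
            if (command.equals(text) || command.endsWith(" " + text)) {
                if (found != null) {
                    return null; // more than one command fits: treat as unrecognized
                }
                found = command;
            }
        }
        return found;
    }
}
```

With this sketch, a transcript of "open email" still resolves to the full "system open email" command, while a fragment such as "open" on its own is rejected because it is not the tail of any command.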

Yeah this is definitely a possible solution, but it isn't great for short form answers like "yes" "no" or "cancel" where the user cares most about speed. — sskates, Aug 19 '11 at 22:58

score 1 · Answer 2 · edited May 23 '17 at 12:33

It looks like there is a lower-level way to control speech recognition.

Create a SpeechRecognizer with SpeechRecognizer.createSpeechRecognizer(), then call setRecognitionListener() on it and pass it a RecognitionListener. Then pass a RecognizerIntent.ACTION_RECOGNIZE_SPEECH Intent to startListening(), which starts listening and performs speech recognition without waiting for the popup dialog.

From: How can I use speech recognition without the annoying dialog in android phones
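
The steps above could be sketched roughly as follows. This is an untested outline, not a definitive implementation: the class name `ConversationActivity` and the `listen()` helper are hypothetical, it needs the Android SDK to compile, and it assumes the app has been granted the RECORD_AUDIO permission.

```java
import android.app.Activity;
import android.content.Intent;
import android.os.Bundle;
import android.speech.RecognitionListener;
import android.speech.RecognizerIntent;
import android.speech.SpeechRecognizer;

import java.util.ArrayList;

// Hypothetical activity that owns a SpeechRecognizer instead of launching the dialog.
public class ConversationActivity extends Activity implements RecognitionListener {
    private SpeechRecognizer recognizer;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        // Create the recognizer up front so it is ready before it is needed.
        recognizer = SpeechRecognizer.createSpeechRecognizer(this);
        recognizer.setRecognitionListener(this);
    }

    // Hypothetical helper: call this when the app wants the user to speak.
    private void listen() {
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        recognizer.startListening(intent);
    }

    @Override
    public void onResults(Bundle results) {
        // Candidate transcriptions; may be null if nothing was recognized.
        ArrayList<String> matches =
                results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
        // Handle the transcriptions here.
    }

    @Override
    public void onError(int error) {
        // error is one of the SpeechRecognizer.ERROR_* constants.
    }

    // Remaining callbacks are left empty in this sketch.
    @Override public void onReadyForSpeech(Bundle params) {}
    @Override public void onBeginningOfSpeech() {}
    @Override public void onRmsChanged(float rmsdB) {}
    @Override public void onBufferReceived(byte[] buffer) {}
    @Override public void onEndOfSpeech() {}
    @Override public void onPartialResults(Bundle partialResults) {}
    @Override public void onEvent(int eventType, Bundle params) {}

    @Override
    protected void onDestroy() {
        recognizer.destroy(); // release the recognition service connection
        super.onDestroy();
    }
}
```

SpeechRecognizer methods have to be called from the main application thread. The onReadyForSpeech() callback indicates when the recognizer is ready for the user to start speaking, which may help in judging how much startup delay remains with this approach.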
