
I am trying to build speech recognition into my app, and I am seeing that onBeginningOfSpeech and onEndOfSpeech fire within the space of 1 second.

Also, when I finish speaking and leave a gap, the speech recognition ends immediately. Normally, the ASR would wait about 3-5 seconds before stopping.

This code is actually breaking speech recognition even in other apps on my phone.

Has anyone seen a case like this?

This is what my code looks like.

Here is the onCreate method of my service; I am running the speech recognition from a service.

@Override
public void onCreate() {
    super.onCreate();
    createSR();
    mSpeechRecognizerIntent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    mSpeechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
            RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
    mSpeechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true);
    mSpeechRecognizerIntent.putExtra("android.speech.extra.DICTATION_MODE", true);
    mSpeechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE,
            this.getPackageName());
}
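
For what it's worth, I am not setting any of the silence-length hints anywhere in my code. The RecognizerIntent API does expose them; this is only a sketch with illustrative values, and the documentation describes them as hints the recognition service may ignore:

// Advisory extras from RecognizerIntent (values here are illustrative).
// Per the docs, recognition services are not guaranteed to honor them.
mSpeechRecognizerIntent.putExtra(
        RecognizerIntent.EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS, 3000);
mSpeechRecognizerIntent.putExtra(
        RecognizerIntent.EXTRA_SPEECH_INPUT_POSSIBLY_COMPLETE_SILENCE_LENGTH_MILLIS, 3000);
mSpeechRecognizerIntent.putExtra(
        RecognizerIntent.EXTRA_SPEECH_INPUT_MINIMUM_LENGTH_MILLIS, 5000);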

Here is the recognition listener code.

protected class SpeechRecognitionListener implements RecognitionListener {

    private static final String TAG = "SRecognitionListener";
    private boolean isUserSpeaking;
    private long userSpokeAt=-1;
    private long userStoppedSpeakingAt = -1;
    private String completeSegment;
    private String recognizingSegment;
    private ArrayList<String> recognizedSegments;

    @Override
    public void onBeginningOfSpeech() {
        Log.d(TAG, "onBeginingOfSpeech"); //$NON-NLS-1$
    }

    @Override
    public void onBufferReceived(byte[] buffer) {

    }

    @Override
    public void onEndOfSpeech() {
        Log.d(TAG, "onEndOfSpeech"); //$NON-NLS-1$
    }

    @Override
    public void onError(int error) {
        Log.d(TAG, "onError: " + error);
        if (error == SpeechRecognizer.ERROR_NO_MATCH) {
            return;
        }

        mIsListening = false;
        Message message = Message.obtain(null, MSG_RECOGNIZER_START_LISTENING);
        try {
            mServerMessenger.send(message);
        } catch (RemoteException e) {
            // The service messenger is gone; nothing we can do here.
        }
        Log.d(TAG, "error = " + error); //$NON-NLS-1$
    }
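
    // Not part of the original listener: a helper sketch that maps the raw error
    // code to a readable name, so the onError log above is easier to act on.
    private static String errorName(int error) {
        switch (error) {
            case SpeechRecognizer.ERROR_AUDIO: return "ERROR_AUDIO";
            case SpeechRecognizer.ERROR_CLIENT: return "ERROR_CLIENT";
            case SpeechRecognizer.ERROR_INSUFFICIENT_PERMISSIONS: return "ERROR_INSUFFICIENT_PERMISSIONS";
            case SpeechRecognizer.ERROR_NETWORK: return "ERROR_NETWORK";
            case SpeechRecognizer.ERROR_NETWORK_TIMEOUT: return "ERROR_NETWORK_TIMEOUT";
            case SpeechRecognizer.ERROR_NO_MATCH: return "ERROR_NO_MATCH";
            case SpeechRecognizer.ERROR_RECOGNIZER_BUSY: return "ERROR_RECOGNIZER_BUSY";
            case SpeechRecognizer.ERROR_SERVER: return "ERROR_SERVER";
            case SpeechRecognizer.ERROR_SPEECH_TIMEOUT: return "ERROR_SPEECH_TIMEOUT";
            default: return "UNKNOWN(" + error + ")";
        }
    }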

    @Override
    public void onEvent(int eventType, Bundle params) {

    }

    /* TODO:
     * Add a boolean flag to guarantee that the message built from partialResults
     * is fresh, i.e. reset the Bundle data for each new utterance.
     * Reset recognizingSegment to an empty string before doing anything else.
     */
    @Override
    public void onPartialResults(Bundle partialResults) {
        ArrayList<String> matches = partialResults.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
        if (matches != null && matches.size() != 0) {
            Log.d(TAG, "onPartialResults: " + matches.get(0));
            partialBundleData = partialResults;
            String nextPartResult=matches.get(0);

            if (!recognizingSegment.equals("")){
                String[] nextPartWords = nextPartResult.split(" ");
                String[] recWords;
                String previousSegments="";
                recWords = recognizingSegment.split(" "); //The last recognized segment
                if (recognizedSegments.size()>0){
                    previousSegments=mergeSegments(recognizedSegments);
                }
                if (nextPartWords.length+2>=recWords.length){    //Similar length, so most likely the same segment
                    Log.d(TAG, "onPartialResults: matching "+recognizingSegment+" with "+nextPartResult);
                    if (doWordsMatch(recWords,nextPartWords)) { //Since the words match this is probably the same segment
                        recognizingSegment = nextPartResult;
                        partialResult = previousSegments + " " + recognizingSegment;
                        Log.d(TAG, "onPartialResults: Same segment - " + partialResult);
                        partialResults.putString("PartialSentence", partialResult);
                    }else{  //Since the words don't match this is probably a new segment
                        recognizedSegments.add(recognizingSegment);
                        partialResult=previousSegments+" "+recognizingSegment+" "+nextPartResult;
                        Log.d(TAG, "onPartialResults: New segment - " + partialResult);
                        partialResults.putString("PartialSentence",partialResult);
                        recognizingSegment=nextPartResult;
                    }
                }else{  //The new result is much shorter, so it is probably a new segment
                    Log.d(TAG, "onPartialResults: matching "+recognizingSegment+" with "+nextPartResult);
                    if (!doWordsMatch(recWords, nextPartWords)) {   //Since the words don't match this is probably a new segment
                        recognizedSegments.add(recognizingSegment);
                        partialResult = previousSegments + " " + recognizingSegment + " " + nextPartResult;
                        Log.d(TAG, "onPartialResults: New segment - " + partialResult);
                        partialResults.putString("PartialSentence", partialResult);
                        recognizingSegment = nextPartResult;
                    }else{  //Since the words match this is probably the same segment
                        recognizingSegment = nextPartResult;
                        partialResult = previousSegments + " " + recognizingSegment;
                        Log.d(TAG, "onPartialResults: Same segment - " + partialResult);
                        partialResults.putString("PartialSentence", partialResult);
                    }
                }
            }else{
                partialResult=nextPartResult;
                Log.d(TAG, "onPartialResults: First segment - " + partialResult);
                recognizingSegment=nextPartResult;
                partialResults.putString("PartialSentence",nextPartResult);
            }
            Message message = new Message();
            message.what = ASRService.MSG_RECOGNIZER_PART_RESULT;

            message.setData(partialResults);
            sendMessageToClients(message);
        } else {
            Log.d(TAG, "onPartialResults: No Results");
        }
    }

    private boolean doWordsMatch(String[] phraseA, String[] phraseB){
        int noOfWordsToMatch=3;
        if (phraseA.length<noOfWordsToMatch){
            noOfWordsToMatch=phraseA.length;
        }
        if (phraseB.length<noOfWordsToMatch){
            noOfWordsToMatch=phraseB.length;
        }
        boolean wordsMatch=false;
        int noOfMatchingWords=0;
        for (int i=0; i<noOfWordsToMatch; i++){
            if (phraseA[i].equals(phraseB[i])){
                noOfMatchingWords++;
            }
        }
        Log.d(TAG, "onPartialResults: noOfMatchingWords - "+noOfMatchingWords);
        if (noOfMatchingWords>=2 || noOfMatchingWords>=noOfWordsToMatch){
            wordsMatch=true;
        }
        return wordsMatch;
    }
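
    // Illustration of the heuristic above (phrases are made up):
    //   doWordsMatch({"turn","on","the","lights"}, {"turn","on","the"}) -> true  (first 3 words match)
    //   doWordsMatch({"turn","on","the"}, {"open","the","door"})        -> false (0 of first 3 match)
    // It returns true when at least 2 of the compared leading words match, or when
    // all of them match (which only matters if fewer than 2 words are compared).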

    private String mergeSegments(ArrayList<String> segments){
        StringBuilder mergedSegments=new StringBuilder();
        for (String segment: segments){
            mergedSegments.append(segment).append(' ');
        }
        return mergedSegments.toString().trim();
    }


    @Override
    public void onReadyForSpeech(Bundle params) {
        Log.d(TAG, "onReadyForSpeech"); //$NON-NLS-1$
        Message message = new Message();
        message.what = ASRService.MSG_RECOGNIZER_STARTED_LISTENING;
        sendMessageToClients(message);
        userSpokeAt=-1;
        completeSegment ="";
        recognizingSegment="";
        recognizedSegments=new ArrayList<>();
    }

    @Override
    public void onResults(Bundle results) {
        ArrayList<String> matches = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
        if (matches != null && matches.size() != 0) {
            Log.d(TAG, "onResults: " + matches.get(0));
            Message message = new Message();
            message.what = ASRService.MSG_RECOGNIZER_RESULT;
            message.setData(results);
            sendMessageToClients(message);
        } else {
            Log.d(TAG, "onResults: No Results");
        }
        cancelSR();
    }

    @Override
    public void onRmsChanged(float rmsdB) {

        if (rmsdB > 20) {
            if (userSpokeAt==-1) {   //The user spoke the first time
                partialResultsTimer = new AsyncTask<Void, Void, Void>() {
                    @Override
                    protected Void doInBackground(Void... params) {
                        try {
                            Thread.sleep(70000); // Wait at most this long, then cancel; the service times out on its own anyway.
                            partialResultsTimer=null;
                            cancelSR();
                        } catch (InterruptedException e) {
                            // The watchdog was interrupted; nothing to clean up.
                        }
                        return null;
                    }
                }.execute();
            }

            userSpokeAt = System.currentTimeMillis();
            if (!isUserSpeaking) {
                Log.d(TAG, "User started speaking");
                isUserSpeaking = true;
                if (userStoppedSpeakingAt != -1) {
                    long gap = userSpokeAt - userStoppedSpeakingAt;
                    Log.d(TAG, "User spoke after " + gap + " millis");
                }
                userStoppedSpeakingAt = -1;
                if (timeoutTaskRunner != null) {
                    Log.d(TAG, "Speech Recognition timer canceling");
                    timeoutTaskRunner.cancel();
                    timerRunning = false;
                }
                Message message = new Message();
                message.what = ASRService.MSG_RECOGNIZER_USER_SPEAKING_STATE_CHANGED;
                message.arg1 = 1; //1 means true
                sendMessageToClients(message);
            }


        } else if (isUserSpeaking) {
            long currentTimeMillis = System.currentTimeMillis();
            if (currentTimeMillis - userSpokeAt > 1700) {
                isUserSpeaking = false;
                Log.d(TAG, "User isn't speaking after: " + (currentTimeMillis - userSpokeAt));
                userStoppedSpeakingAt = currentTimeMillis;
                startTimer();
                Message message = new Message();
                message.what = ASRService.MSG_RECOGNIZER_USER_SPEAKING_STATE_CHANGED;
                message.arg1 = 0; //0 means false
                sendMessageToClients(message);
            }
        }

    }


}
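
// Aside, not part of the original service: the 70-second watchdog started in
// onRmsChanged could be scheduled with a Handler instead of sleeping inside an
// AsyncTask. Sketch only; mHandler and mWatchdog are names introduced here.
private final Handler mHandler = new Handler(Looper.getMainLooper());
private final Runnable mWatchdog = new Runnable() {
    @Override
    public void run() {
        cancelSR(); // the same cancellation used in onResults()
    }
};
// Where the AsyncTask was started:  mHandler.postDelayed(mWatchdog, 70000);
// When speech activity resumes:     mHandler.removeCallbacks(mWatchdog);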

@Override
public IBinder onBind(Intent arg0) {
    Log.d("ASRService", "onBind");  //$NON-NLS-1$
    return mServerMessenger.getBinder();
}
}
  • Assuming your recognition provider is Google, are your issues reflected in this post http://stackoverflow.com/q/38150312/1256219 ? Otherwise, do they occur if you install the latest Google Now beta? – brandall Sep 19 '16 at 22:34
  • What does installing Google Now have to do with replicating the same issue? If I do install the Google Now beta, how would that alter my recognition issue? I am a newbie to this speech recognition stuff, so please don't mind if I'm asking simple questions. – Katakam Nikhil Sep 22 '16 at 17:19
  • To answer your question, the issue is not reflected in that post. My problem is that after speaking, the speech recognition stops immediately. It does not give me any time to pause and speak again. I am seeing this specifically on 2 different Samsung devices and am not able to replicate it on a Moto G4. – Katakam Nikhil Sep 22 '16 at 17:26
  • The Android SpeechRecognizer API requires a recognition service to function. When you install Google Now, it offers this functionality. You can set it as the device default in the Android Voice Recognition Settings. I assumed you were using Google as your default? Samsung devices offer Samsung/Vlingo as a recognition service, but annoyingly it does not allow external applications to use it, only S-Voice. Make sure that the Samsung devices do not have Samsung/Vlingo set as the device default. – brandall Sep 22 '16 at 17:51
  • If I were to release the same app on the Play Store, for example, I can't really control what users do, right? So would there be a workaround for this? Also, where do I find the device default on a Samsung Galaxy? – Katakam Nikhil Sep 22 '16 at 18:01
  • There is a constructor that can take the package name of the recognition service you want to use, which overrides the device default. Prior to doing this, you'd use a standard method to check that Google Now is installed, via PackageManager and its package name; if it's not installed, raise a notification to the user. Alternatively, 'embed' another provider in your app, such as Nuance, iSpeech or Microsoft. There are Android SDKs for them. However, these are paid services. – brandall Sep 22 '16 at 18:05
  • Check this post, which covers your concerns http://stackoverflow.com/q/37856993/1256219 – brandall Sep 22 '16 at 18:07
  • The problem I'm facing with the Samsung devices is that my code is actually reducing the wait time after the user stops speaking. Is there something I am doing wrong in my code that is causing this? From what you're saying, if my code is messing up any values for the speech recognizer, those values are not influencing Google Now, but they are influencing Samsung. – Katakam Nikhil Sep 22 '16 at 18:13
  • Check which versions of Google Now are installed - the 'wait time' was broken and changes between versions, as detailed in the link I originally posted. You need to make sure all of the devices are running the same version before you can draw comparisons - this includes 'arm' or other architecture. Then you'll know whether you have a bug specific to Samsung devices or not. I can't replicate it on my test Samsung devices running the latest Google Now beta. – brandall Sep 22 '16 at 18:16
  • The difference between your bug and mine is that in yours the recognition continues indefinitely if you don't stop. Mine sets the wait time to practically zero after you are done speaking. This happens only after I speak, so I think it's a totally different issue. As far as the Google version goes, I have 6.3 now. I would have to test the beta, as I have signed up for it, but I can't really ask another user to sign up for a beta to make my app work, so that's where the problem lies for me. – Katakam Nikhil Sep 22 '16 at 19:01
  • If you discover it's a bug - any bug - by testing many permutations, you can't fix it; all you can do is report it. You can then decide how to deal with the users who suffer from the bug. Such bugs have been ongoing for years with Google's implementation. If you want to avoid this, use an alternative paid service. I'm afraid that is what it boils down to. All I can confirm is that with my many Samsung test devices I cannot replicate your issue running the latest Google Now beta, which will soon be the release version. – brandall Sep 22 '16 at 19:15
  • Thank you very much for your input @brandall, it really means a lot to me. I will update this thread with future developments. – Katakam Nikhil Sep 22 '16 at 19:17
  • I wish you luck :) – brandall Sep 22 '16 at 19:18
  • @brandall Another thing I noticed: speech recognition works perfectly in other applications, but once my app runs speech recognition, the problem appears, and it then occurs in the other applications as well. This would mean the settings are being altered by my application, and it happens ONLY on Samsung phones. Could you shed some light on this, please? – Katakam Nikhil Sep 22 '16 at 22:28
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/124062/discussion-between-brandall-and-katakam-nikhil). – brandall Sep 23 '16 at 17:40
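
For reference, the constructor brandall mentions is SpeechRecognizer.createSpeechRecognizer(Context, ComponentName), which binds to a specific recognition service instead of the device default. A minimal sketch follows; the Google component name below is an assumption for Google Now builds of this era and should be resolved through PackageManager rather than hard-coded:

// Sketch only: pin a specific RecognitionService instead of the device default.
// The component name below is an assumption; enumerate the installed services
// at runtime and pick the one you want instead of hard-coding it.
List<ResolveInfo> services = getPackageManager().queryIntentServices(
        new Intent(RecognitionService.SERVICE_INTERFACE), 0);
// ...inspect `services` for Google's recognizer, then:
ComponentName googleRecognizer = new ComponentName(
        "com.google.android.googlequicksearchbox",
        "com.google.android.voicesearch.serviceapi.GoogleRecognitionService");
SpeechRecognizer recognizer =
        SpeechRecognizer.createSpeechRecognizer(this, googleRecognizer);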
