Accuracy of MS System.Speech.Recognizer and the SpeechRecognitionEngine

Question

I am currently testing the SpeechRecognitionEngine by loading a fairly simple rule from an XML file. It is a simple choice between ("decrypt the email", "remove encryption") and ("encrypt the email", "add encryption").

I have trained my Windows 7 PC and additionally added the words "encrypt" and "decrypt", as I realize they sound very similar. Even so, the recognizer has trouble telling the two apart.

The issue I am having is that it recognizes things too often. I have set the confidence threshold to 0.93 because, with my voice in a quiet room, saying the exact words sometimes only reaches 0.93. But if I turn on the radio, the announcer's voice or a song can make the recognizer think it has heard "decrypt the email" with over 0.93 confidence.

Maybe Lady Gaga is backmasking Applause to secretly decrypt emails :-)

Can anyone help me work out how to make this recognizer workable?

In fact the recognizer is also picking up keyboard noise as "decrypt the email". I don't understand how this is possible.

Following up on an editor's note: there are at least two managed namespaces for MS speech recognition, Microsoft.Speech and System.Speech. For this question it is important to know that I am using System.Speech.

This is all rather normal. You didn't say anything about the microphone you used, it [can be critical](http://www.speechrecsolutions.com/microphone_selection_guide.htm) — Hans Passant, Sep 16 '13 at 12:14
I am using the mic from the Polycom cx100 http://www.polycom.com/products-services/products-for-microsoft/lync-optimized/cx100-desktop-phone.html. I trained the desktop engine and also did dictation on notepad of the words and my accuracy improved, but now it recognizes text when I am just typing. — darbid, Sep 16 '13 at 14:37
Switch to a headset microphone. Speakerphones are notorious for picking up extraneous noise. — Eric Brown, Sep 17 '13 at 05:31
ok noted. This is a cool device but I realize that whilst hands free is good for talking on the phone or communicator it might not be so good for speech recognition. — darbid, Sep 17 '13 at 12:32
@darbid - One of the fun things about SR is that engine confidence != accuracy. I.e., the engine can be very confident about a reco, but it will still be wrong. Conversely, the engine can have very low confidence in a reco, and it will still be correct. In practice, I never use the confidence values (aside from it being high enough to pass the rejection threshold). — Eric Brown, Sep 17 '13 at 15:39

score 13 · Accepted Answer · answered Sep 17 '13 at 05:30

If the only thing the System.Speech recognizer is listening for is "encrypt the email", it will generate lots of false positives, particularly in a noisy environment, because every sound it hears can only be matched against that one phrase. If you load a DictationGrammar (particularly a pronunciation grammar) in parallel, the DictationGrammar will pick up the noise, and you can check which grammar produced the result (for example, by its name) in the event handler to discard the bogus recognitions.

A (subset) example:

    using System;
    using System.Globalization;
    using System.Speech.Recognition;

    class Program
    {
        static void Main(string[] args)
        {
            // Command phrases the application actually cares about.
            Choices gb = new Choices();
            gb.Add("encrypt the document");
            gb.Add("decrypt the document");
            Grammar commands = new Grammar(gb);
            commands.Name = "commands";

            // Pronunciation dictation grammar that absorbs noise and unrelated speech.
            DictationGrammar dg = new DictationGrammar("grammar:dictation#pronunciation");
            dg.Name = "Random";

            using (SpeechRecognitionEngine recoEngine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
            {
                recoEngine.SetInputToDefaultAudioDevice();
                recoEngine.LoadGrammar(commands);
                recoEngine.LoadGrammar(dg);
                recoEngine.RecognizeCompleted += recoEngine_RecognizeCompleted;
                recoEngine.RecognizeAsync(); // performs a single recognition

                System.Console.ReadKey(true);
                recoEngine.RecognizeAsyncStop();
            }
        }

        static void recoEngine_RecognizeCompleted(object sender, RecognizeCompletedEventArgs e)
        {
            // Result is null when nothing was recognized or the operation was cancelled.
            if (e.Result != null && e.Result.Grammar.Name != "Random")
            {
                System.Console.WriteLine(e.Result.Text);
            }
        }
    }
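
For a recognizer that keeps listening rather than stopping after one phrase, the same idea might be sketched with the `SpeechRecognized` event and `RecognizeMode.Multiple`. This is a minimal sketch, not the answer author's code: the class name `CommandListener`, the helper `IsAcceptedCommand`, and the `RejectionThreshold` value of 0.5 are assumptions made for illustration, and the threshold in particular would need tuning for a given microphone and room.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

static class CommandListener
{
    // Hypothetical names and values, chosen for this sketch only.
    const string CommandGrammarName = "commands";
    const string NoiseGrammarName = "Random";
    const float RejectionThreshold = 0.5f; // assumption: tune for your setup

    // Pure decision logic, kept separate so it can be checked without audio:
    // accept only results from the command grammar that clear the threshold.
    public static bool IsAcceptedCommand(string grammarName, float confidence)
    {
        return grammarName == CommandGrammarName && confidence >= RejectionThreshold;
    }

    static void Main()
    {
        CultureInfo culture = new CultureInfo("en-US");

        // Command grammar; its culture must match the engine's culture.
        GrammarBuilder builder = new GrammarBuilder(
            new Choices("encrypt the email", "decrypt the email"));
        builder.Culture = culture;
        Grammar commands = new Grammar(builder);
        commands.Name = CommandGrammarName;

        // Dictation grammar loaded in parallel to soak up noise.
        DictationGrammar noise = new DictationGrammar("grammar:dictation#pronunciation");
        noise.Name = NoiseGrammarName;

        using (SpeechRecognitionEngine engine = new SpeechRecognitionEngine(culture))
        {
            engine.SetInputToDefaultAudioDevice();
            engine.LoadGrammar(commands);
            engine.LoadGrammar(noise);

            // Raised once per recognized phrase while the engine is running.
            engine.SpeechRecognized += (sender, e) =>
            {
                if (IsAcceptedCommand(e.Result.Grammar.Name, e.Result.Confidence))
                {
                    Console.WriteLine(e.Result.Text);
                }
            };

            // Keep recognizing until a key is pressed.
            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.ReadKey(true);
            engine.RecognizeAsyncStop();
        }
    }
}
```

Keeping the accept/reject decision in one small method makes it easy to change the rule later, for example to accept several command grammars, without touching the audio setup.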

Eric Brown


Thank you so much for the suggestion. I am going to try this out and then get back to you, but it sounds like a great idea. I am using an XML file and a rule with many more words or phrases; "encrypt the Document" was just one. But I still think your suggestion will work. – darbid Sep 17 '13 at 12:31
Yes, that vastly improves things. The recognition is now not guessing. To anyone coming here because of a similar situation, I think this is a must to get things working, or in my case, recognizing less. Thank you very much. I have come across your blog and will ask a new question on one of your articles. – darbid Sep 17 '13 at 15:51
I am using System.Speech.Recognition and followed your suggestion to reduce the false positives, it works great. However, now I am experiencing a TargetInvocationException after like 20 minutes of recognition. I want to try Microsoft.Speech.Recognition but there is no DictationGrammar class. Is there an equivalent to DictationGrammar within Microsoft.Speech.Recognition? – DiegoSahagun Apr 03 '14 at 16:06
@DiegoSahagun No. Microsoft.Speech.Recognition uses a different SR engine [that does not support dictation](http://stackoverflow.com/questions/2977338/what-is-the-difference-between-system-speech-recognition-and-microsoft-speech-re/2982910#2982910). – Eric Brown Apr 03 '14 at 16:18
Thanks @EricBrown, do you know if there is a way to reduce Microsoft.Speech.Recognition's false positives? Maybe I should create a question for my problem with the desktop version; I haven't been able to find anything related yet. – DiegoSahagun Apr 03 '14 at 22:07
You can try a [Garbage reference](http://msdn.microsoft.com/en-us/library/system.speech.recognition.srgsgrammar.srgsruleref.garbage(v=vs.110).aspx). I haven't tried this, myself. – Eric Brown Apr 03 '14 at 23:23
