Using System.Speech with Kinect

Question

I am developing a prototype speech-to-text captioning application for a university project. I am going to be using gesture recognition within my project later on, so I thought it would be a good idea to use the Kinect as the microphone source rather than an additional microphone. The idea of my application is to recognize spontaneous speech, such as long and complex sentences (I understand that the dictation will not be perfect, however). I have seen many Kinect speech samples that reference Microsoft.Speech, but not System.Speech. As I need to train the speech engine and load a DictationGrammar into the SpeechRecognitionEngine, System.Speech is the only option for me.

I have managed to get it working while using the Kinect as the direct microphone audio source, but since I am loading the Kinect for the video preview and gesture recognition, I am unable to access it as a direct microphone.

This is the code that accesses the microphone directly, without loading the Kinect hardware for gestures etc., and it works perfectly:

private void InitializeSpeech()
{
    // Uses System.Speech.Recognition (requires a reference to System.Speech).
    var speechRecognitionEngine = new SpeechRecognitionEngine();
    speechRecognitionEngine.SetInputToDefaultAudioDevice();
    speechRecognitionEngine.LoadGrammar(new DictationGrammar());
    // Attach the handler before starting recognition so no result can be missed.
    speechRecognitionEngine.SpeechRecognized += (s, args) => MessageBox.Show(args.Result.Text);
    speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
}

And this is where I need to access the audio source via the Kinect once it has been loaded. This is what I want to be doing, but it isn't doing anything at all:

using (var audioSource = new KinectAudioSource())
{
    audioSource.FeatureMode = true;
    audioSource.AutomaticGainControl = false;
    audioSource.SystemMode = SystemMode.OptibeamArrayOnly;

    var recognizerInfo = GetKinectRecognizer();
    var speechRecognitionEngine = new SpeechRecognitionEngine(recognizerInfo.Id);

    speechRecognitionEngine.LoadGrammar(new DictationGrammar());
    speechRecognitionEngine.SpeechRecognized += (s, args) => MessageBox.Show(args.Result.Text);

    using (var s = audioSource.Start())
    {
        speechRecognitionEngine.SetInputToAudioStream(s, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
        speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
    }
}

So the question is: is it even possible to use System.Speech instead of Microsoft.Speech with the current Kinect SDK, and what am I doing wrong in the second code sample?

GetKinectRecognizer Method

private static RecognizerInfo GetKinectRecognizer()
{
    Func<RecognizerInfo, bool> matchingFunc = r =>
    {
        string value;
        r.AdditionalInfo.TryGetValue("Kinect", out value);
        return "True".Equals(value, StringComparison.InvariantCultureIgnoreCase) && "en-US".Equals(r.Culture.Name, StringComparison.InvariantCultureIgnoreCase);
    };

    return SpeechRecognitionEngine.InstalledRecognizers().Where(matchingFunc).FirstOrDefault();
}
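Because this method comes from the Kinect samples, which use Microsoft.Speech, it is worth checking whether System.Speech can see a Kinect recognizer at all. If it cannot, GetKinectRecognizer() returns null and `recognizerInfo.Id` throws a NullReferenceException. A minimal diagnostic sketch, assuming a console project that references System.Speech; the class and method names are hypothetical and it applies the same "Kinect"/en-US test as the method above:

```csharp
using System;
using System.Linq;
using System.Speech.Recognition;

// Hypothetical diagnostic: lists every recognizer visible to System.Speech and
// flags those whose AdditionalInfo contains Kinect=True, the same test that
// GetKinectRecognizer() applies.
static class RecognizerDiagnostics
{
    public static bool IsKinectRecognizer(RecognizerInfo r)
    {
        string value;
        return r.AdditionalInfo.TryGetValue("Kinect", out value)
            && "True".Equals(value, StringComparison.InvariantCultureIgnoreCase);
    }

    static void Main()
    {
        var all = SpeechRecognitionEngine.InstalledRecognizers();
        foreach (RecognizerInfo r in all)
        {
            Console.WriteLine("{0} | {1} | Kinect: {2}", r.Id, r.Culture.Name, IsKinectRecognizer(r));
        }

        // GetKinectRecognizer() additionally requires the en-US culture.
        bool found = all.Any(r => IsKinectRecognizer(r)
            && "en-US".Equals(r.Culture.Name, StringComparison.InvariantCultureIgnoreCase));
        Console.WriteLine(found
            ? "An en-US Kinect recognizer is visible to System.Speech."
            : "No en-US Kinect recognizer is visible to System.Speech; GetKinectRecognizer() would return null.");
    }
}
```

If the last line reports that nothing was found, the desktop recognizer returned by `new SpeechRecognitionEngine()` still accepts a DictationGrammar and can be fed the Kinect audio stream instead.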

Windows recognises Kinect as a microphone input, so all speech libraries should be working fine. Have you been able to run the audio/speech samples provided with the Kinect SDK to verify the device is working OK? The above code looks fine to me, but could you post the GetKinectRecognizer method you are calling too? — LewisBenge, Dec 14 '11 at 05:44
Hi. Apologies for the late reply. Please refer to the edit above for the GetKinectRecognizer method I am using, which is basically the one from the Kinect samples. — Daniel Clark, Dec 17 '11 at 20:34

score 3 · Answer 1 · answered Jan 21 '12 at 17:38


From my own experimentation, I can tell you that you can in fact use both libraries simultaneously.

Try this code instead of your current code (make sure that you add a reference to System.Speech, obviously):

using (var audioSource = new KinectAudioSource())
{
    audioSource.FeatureMode = true;
    audioSource.AutomaticGainControl = false;
    audioSource.SystemMode = SystemMode.OptibeamArrayOnly;

    System.Speech.Recognition.RecognizerInfo ri = GetKinectRecognizer();
    var speechRecognitionEngine = new SpeechRecognitionEngine(ri.Id);

    speechRecognitionEngine.LoadGrammar(new DictationGrammar());
    speechRecognitionEngine.SpeechRecognized += (s, args) => MessageBox.Show(args.Result.Text);

    using (var s = audioSource.Start())
    {
        speechRecognitionEngine.SetInputToAudioStream(s, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
        speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
    }
}

Good Luck!!!
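A likely reason the question's second sample "isn't doing anything at all" (and the code above keeps the same structure) is object lifetime: RecognizeAsync returns immediately, so execution leaves both using blocks straight away, disposing the audio stream and the KinectAudioSource while the engine is still supposed to be reading from them. In a console program you would block inside the using block (for example with Console.ReadLine()) and then call RecognizeAsyncStop(); in a WPF or WinForms app, one option is to keep the objects in fields and dispose them on shutdown. A minimal, untested sketch, assuming the Kinect for Windows beta SDK (namespace Microsoft.Research.Kinect.Audio) and a reference to System.Speech; the class name KinectDictation is hypothetical:

```csharp
using System;
using System.IO;
using System.Speech.AudioFormat;
using System.Speech.Recognition;
using Microsoft.Research.Kinect.Audio; // assumption: Kinect for Windows SDK beta

// Hypothetical wrapper: owns the Kinect audio source, its stream and the recognizer
// for as long as recognition should run, instead of letting using blocks dispose
// them while RecognizeAsync is still in progress.
public sealed class KinectDictation : IDisposable
{
    private readonly KinectAudioSource audioSource;
    private readonly SpeechRecognitionEngine engine;
    private Stream audioStream;
    private bool started;

    public KinectDictation(RecognizerInfo recognizer, EventHandler<SpeechRecognizedEventArgs> onRecognized)
    {
        // Same audio settings as the question's second sample.
        audioSource = new KinectAudioSource();
        audioSource.FeatureMode = true;
        audioSource.AutomaticGainControl = false;
        audioSource.SystemMode = SystemMode.OptibeamArrayOnly;

        // If no Kinect-specific System.Speech recognizer was found, use the default one.
        engine = recognizer != null
            ? new SpeechRecognitionEngine(recognizer.Id)
            : new SpeechRecognitionEngine();
        engine.LoadGrammar(new DictationGrammar());
        engine.SpeechRecognized += onRecognized;
    }

    public void Start()
    {
        if (started) return;
        audioStream = audioSource.Start();
        // 16 kHz, 16-bit, mono PCM: the format used in the question's sample.
        engine.SetInputToAudioStream(audioStream,
            new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
        engine.RecognizeAsync(RecognizeMode.Multiple);
        started = true;
    }

    public void Dispose()
    {
        // Stop recognition before releasing the stream the engine reads from.
        if (started)
        {
            engine.RecognizeAsyncCancel();
        }
        engine.Dispose();
        if (audioStream != null)
        {
            audioStream.Dispose();
        }
        audioSource.Dispose();
    }
}
```

Usage: create it once, for example `dictation = new KinectDictation(GetKinectRecognizer(), (s, args) => MessageBox.Show(args.Result.Text)); dictation.Start();`, store it in a field of the window, and call `dictation.Dispose()` when the window closes.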

Matt Cashatt

I don't see a System.Speech, just Speech – George Birbilis Aug 27 '15 at 17:28
When trying the SpeechBasics-WPF sample from Kinect SDK 1.8 with Visual Studio 2015, the Quick Actions light bulb shown next to the using directives in MainWindow.xaml.cs suggests simplifying "using Microsoft.Speech.AudioFormat;" and "using Microsoft.Speech.Recognition;" to "using Speech.AudioFormat;" and "using Speech.Recognition;" respectively – George Birbilis Aug 27 '15 at 17:30
The code above is obviously from the Kinect Beta SDK (in Kinect SDK v1.8, for example, you can't instantiate the audio source class directly; you need to get it from a KinectSensor object). You can find code showing how to use either Microsoft.Speech or System.Speech at http://SpeechTurtle.codeplex.com (there is some conditional compilation code there to switch between the two). Note that the Kinect team suggests Microsoft.Speech as more appropriate for Kinect's mic array. Also, System.Speech supports fewer languages; however, apart from the voice commands that Microsoft.Speech supports, it also supports free dictation – George Birbilis Oct 24 '15 at 15:07
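As the comment above notes, in SDK v1.x the audio source comes from a KinectSensor rather than being constructed directly. A minimal, untested sketch of the same dictation setup under that SDK, assuming references to Microsoft.Kinect (SDK v1.x) and System.Speech; the class name KinectSensorDictation is hypothetical. It uses the default System.Speech recognizer because free dictation is a System.Speech feature, even though the Kinect team recommends Microsoft.Speech for the mic array:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Speech.AudioFormat;
using System.Speech.Recognition;
using Microsoft.Kinect; // assumption: Kinect for Windows SDK v1.x (e.g. 1.8)

// Hypothetical wrapper for SDK v1.x: the KinectAudioSource is taken from a started
// KinectSensor, and everything is kept alive until Dispose() is called.
public sealed class KinectSensorDictation : IDisposable
{
    private readonly KinectSensor sensor;
    private readonly SpeechRecognitionEngine engine;
    private Stream audioStream;
    private bool started;

    public KinectSensorDictation(EventHandler<SpeechRecognizedEventArgs> onRecognized)
    {
        sensor = KinectSensor.KinectSensors.FirstOrDefault(k => k.Status == KinectStatus.Connected);
        if (sensor == null)
        {
            throw new InvalidOperationException("No connected Kinect sensor was found.");
        }
        sensor.Start(); // the sensor is started here; its audio source is started in Start()

        // Default System.Speech recognizer, which accepts a DictationGrammar.
        engine = new SpeechRecognitionEngine();
        engine.LoadGrammar(new DictationGrammar());
        engine.SpeechRecognized += onRecognized;
    }

    public void Start()
    {
        if (started) return;
        sensor.AudioSource.AutomaticGainControlEnabled = false;
        audioStream = sensor.AudioSource.Start();
        engine.SetInputToAudioStream(audioStream,
            new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
        engine.RecognizeAsync(RecognizeMode.Multiple);
        started = true;
    }

    public void Dispose()
    {
        if (started)
        {
            engine.RecognizeAsyncCancel();
            sensor.AudioSource.Stop();
        }
        engine.Dispose();
        sensor.Stop();
    }
}
```

Because the sensor is kept in a field, the same KinectSensor instance should also be usable for the color and skeleton streams needed for the video preview and gesture recognition described in the question.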

score 0 · Answer 2 · answered May 27 '19 at 01:09

Try this code with a reference to System.Speech.

using (var audioSource = new KinectAudioSource())
{
    audioSource.FeatureMode = true;
    audioSource.AutomaticGainControl = false;
    audioSource.SystemMode = SystemMode.OptibeamArrayOnly;

    System.Speech.Recognition.RecognizerInfo ri = GetKinectRecognizer();
    var speechRecognitionEngine = new SpeechRecognitionEngine(ri.Id);

    speechRecognitionEngine.LoadGrammar(new DictationGrammar());
    speechRecognitionEngine.SpeechRecognized += (s, args) => MessageBox.Show(args.Result.Text);

    using (var s = audioSource.Start())
    {
        speechRecognitionEngine.SetInputToAudioStream(s, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
        speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
    }
}
