I am developing a prototype speech-to-text captioning application for a university project. I am going to be using gesture recognition within my project later on, so I thought it would be a good idea to use the Kinect as the microphone source, rather than using an additional microphone. The idea of my application is to recognize spontaneous speech, such as long and complex sentences (I understand that the dictation will not be perfect, however). I have seen many Kinect speech samples that reference Microsoft.Speech, but not System.Speech. As I need to train the speech engine and load a DictationGrammar into the speech recognition engine, System.Speech is the only option for me.

I have managed to get it working while using the Kinect as the direct microphone audio source, but since I am loading the Kinect for the video preview and gesture recognition, I am unable to access it as a direct microphone.

This is the code that accesses the microphone directly, without loading the Kinect hardware for gestures etc., and it works perfectly:

private void InitializeSpeech()
{
    var speechRecognitionEngine = new SpeechRecognitionEngine();

    // Use whatever Windows reports as the default recording device.
    speechRecognitionEngine.SetInputToDefaultAudioDevice();
    speechRecognitionEngine.LoadGrammar(new DictationGrammar());

    // Subscribe before starting so that no early result is missed.
    speechRecognitionEngine.SpeechRecognized += (s, args) => MessageBox.Show(args.Result.Text);
    speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
}

And this is where I need to access the audio source via the Kinect once it has been loaded, which isn't doing anything at all. This is what I want to be doing:

using (var audioSource = new KinectAudioSource())
{
    audioSource.FeatureMode = true;
    audioSource.AutomaticGainControl = false;
    audioSource.SystemMode = SystemMode.OptibeamArrayOnly;

    var recognizerInfo = GetKinectRecognizer();
    var speechRecognitionEngine = new SpeechRecognitionEngine(recognizerInfo.Id);

    speechRecognitionEngine.LoadGrammar(new DictationGrammar());
    speechRecognitionEngine.SpeechRecognized += (s, args) => MessageBox.Show(args.Result.Text);

    using (var s = audioSource.Start())
    {
        speechRecognitionEngine.SetInputToAudioStream(s, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
        speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
    }
}

So the question is: is it even possible to use System.Speech instead of Microsoft.Speech with the current Kinect SDK, and what am I doing wrong in the second code sample?

GetKinectRecognizer Method

private static RecognizerInfo GetKinectRecognizer()
{
    Func<RecognizerInfo, bool> matchingFunc = r =>
    {
        string value;
        r.AdditionalInfo.TryGetValue("Kinect", out value);
        return "True".Equals(value, StringComparison.InvariantCultureIgnoreCase) && "en-US".Equals(r.Culture.Name, StringComparison.InvariantCultureIgnoreCase);
    };

    return SpeechRecognitionEngine.InstalledRecognizers().Where(matchingFunc).FirstOrDefault();
}
Kolky
Daniel Clark
  • Windows recognises Kinect as a microphone input, so all speech libraries should be working fine. Have you been able to run the audio/speech samples provided with the Kinect SDK to verify the device is working OK? The above code looks fine to me, but could you post the GetKinectRecognizer method you are calling too? – LewisBenge Dec 14 '11 at 05:44
  • Hi. Apologies for the late reply. Please refer to the edit above to see the GetKinectRecognizer method I am using, which is basically the one from the Kinect samples. – Daniel Clark Dec 17 '11 at 20:34
  • @LewisBenge, did you see Dan Clark's reply? – Liam Dawson Jan 10 '12 at 01:25

2 Answers

3

From my own experimentation, I can tell you that you can in fact use both libraries simultaneously.

Try this code instead of your current code (make sure that you add a reference to System.Speech, obviously):

using (var audioSource = new KinectAudioSource())
{
    audioSource.FeatureMode = true;
    audioSource.AutomaticGainControl = false;
    audioSource.SystemMode = SystemMode.OptibeamArrayOnly;

    System.Speech.Recognition.RecognizerInfo ri = GetKinectRecognizer();
    var speechRecognitionEngine = new SpeechRecognitionEngine(ri.Id);

    speechRecognitionEngine.LoadGrammar(new DictationGrammar());
    speechRecognitionEngine.SpeechRecognized += (s, args) => MessageBox.Show(args.Result.Text);

    using (var s = audioSource.Start())
    {
        speechRecognitionEngine.SetInputToAudioStream(s, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
        speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
    }
}
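
One thing worth checking if nothing is recognized: RecognizeAsync returns immediately, so the using blocks dispose the stream and the audio source right after recognition has started. The sketch below is only an illustration of keeping them alive, not a tested fix. It assumes the Kinect beta SDK namespace Microsoft.Research.Kinect.Audio and an installed en-US System.Speech recognizer; the class name KinectDictation and its onText callback are hypothetical.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Speech.AudioFormat;
using System.Speech.Recognition;
using Microsoft.Research.Kinect.Audio; // assumption: Kinect beta SDK

// Hypothetical helper: holds the audio source, stream and engine in fields
// so they stay alive after Start() returns.
public sealed class KinectDictation : IDisposable
{
    private readonly Action<string> onText;
    private KinectAudioSource audioSource;
    private Stream audioStream;
    private SpeechRecognitionEngine engine;

    public KinectDictation(Action<string> onText)
    {
        this.onText = onText;
    }

    public void Start()
    {
        audioSource = new KinectAudioSource
        {
            FeatureMode = true,
            AutomaticGainControl = false,
            SystemMode = SystemMode.OptibeamArrayOnly
        };

        // Assumption: an en-US desktop (System.Speech) recognizer is installed;
        // this constructor throws if none is found for the culture.
        engine = new SpeechRecognitionEngine(new CultureInfo("en-US"));
        engine.LoadGrammar(new DictationGrammar());
        engine.SpeechRecognized += (s, args) => onText(args.Result.Text);

        // No using block here: the stream must stay open while recognition runs.
        audioStream = audioSource.Start();
        engine.SetInputToAudioStream(
            audioStream,
            new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
        engine.RecognizeAsync(RecognizeMode.Multiple);
    }

    public void Dispose()
    {
        // Stop recognition first, then release the stream and the device.
        if (engine != null)
        {
            engine.RecognizeAsyncCancel();
            engine.Dispose();
            engine = null;
        }
        if (audioStream != null)
        {
            audioStream.Dispose();
            audioStream = null;
        }
        if (audioSource != null)
        {
            audioSource.Dispose();
            audioSource = null;
        }
    }
}
```

A window could create one instance when it loads, for example new KinectDictation(text => MessageBox.Show(text)), call Start(), and call Dispose() when it closes.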

Good Luck!!!

Matt Cashatt
  • I don't see a System.Speech, just Speech – George Birbilis Aug 27 '15 at 17:28
  • When trying the SpeechBasics-WPF sample from Kinect SDK 1.8 with Visual Studio 2015, the Quick Actions light bulb shown next to the using clauses in MainWindows.xaml.cs suggests simplifying "using Microsoft.Speech.AudioFormat;" and "using Microsoft.Speech.Recognition;" to "using Speech.AudioFormat;" and "using Speech.Recognition;" respectively – George Birbilis Aug 27 '15 at 17:30
  • The code above is obviously from the Kinect Beta SDK (in Kinect SDK v1.8, for example, you can't instantiate the audio source class directly; you need to get it from a KinectSensor object). Code showing how to use either Microsoft.Speech or System.Speech can be found at http://SpeechTurtle.codeplex.com (it uses conditional compilation to switch between the two). Note that the Kinect team suggests Microsoft.Speech as more appropriate for the Kinect's mic array. System.Speech also covers fewer languages; however, it supports free dictation in addition to the voice commands that Microsoft.Speech supports – George Birbilis Oct 24 '15 at 15:07
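
For readers on Kinect SDK v1.x, the point in the last comment about getting the audio source from a KinectSensor object can be sketched roughly as follows. This is an untested illustration that assumes the Microsoft.Kinect assembly from SDK v1.x is referenced; the class KinectAudio and the method OpenAudioStream are hypothetical names.

```csharp
using System.IO;
using System.Linq;
using Microsoft.Kinect; // assumption: Kinect for Windows SDK v1.x

public static class KinectAudio
{
    // Hypothetical helper: starts the first connected sensor and returns its
    // audio stream, or null when no sensor is connected. The caller owns both
    // the sensor and the stream and must keep them alive while recognizing.
    public static Stream OpenAudioStream(out KinectSensor sensor)
    {
        sensor = KinectSensor.KinectSensors
            .FirstOrDefault(k => k.Status == KinectStatus.Connected);
        if (sensor == null)
        {
            return null;
        }

        // The sensor has to be started before its audio source is used.
        sensor.Start();

        // In v1.x the audio source comes from the sensor instead of being
        // constructed directly.
        KinectAudioSource source = sensor.AudioSource;
        source.AutomaticGainControlEnabled = false;

        return source.Start();
    }
}
```

The returned stream can be passed to SetInputToAudioStream with the same SpeechAudioFormatInfo used in the code above; when recognition is finished, dispose the stream and call sensor.Stop().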
0

Try this code with a reference to System.Speech.

using (var audioSource = new KinectAudioSource())
{
    audioSource.FeatureMode = true;
    audioSource.AutomaticGainControl = false;
    audioSource.SystemMode = SystemMode.OptibeamArrayOnly;

    System.Speech.Recognition.RecognizerInfo ri = GetKinectRecognizer();
    var speechRecognitionEngine = new SpeechRecognitionEngine(ri.Id);

    speechRecognitionEngine.LoadGrammar(new DictationGrammar());
    speechRecognitionEngine.SpeechRecognized += (s, args) => MessageBox.Show(args.Result.Text);

    using (var s = audioSource.Start())
    {
        speechRecognitionEngine.SetInputToAudioStream(s, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
        speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
    }
}