I would like to make a speech-to-text-to-translation utility that can recognize and render both English and Spanish "on the fly." To start with, I just need it to recognize both languages (I'll postpone the translation piece until later).
In other words, I want it to be able to process (through the device's microphone) conversations such as:
Spanish speaker's voice captured and rendered: "¿Qué estás haciendo?"
English speaker's voice captured and rendered: "I don't speak Spanish, or Italian, or whatever lingo that is. Speak English!"
Spanish speaker: "I asked you what you're doing."
English speaker: "Oh, not much really; I mean, none of your gol-durned business!"
(etc.)
I see here that I can set up a speech-to-text session like so:
using System;
using System.Globalization;   // needed for CultureInfo
using Microsoft.Speech.Recognition;
using Microsoft.Speech.Synthesis;

namespace ConsoleSpeech
{
    class ConsoleSpeechProgram
    {
        static SpeechSynthesizer ss = new SpeechSynthesizer();
        static SpeechRecognitionEngine sre;

        static void Main(string[] args)
        {
            try
            {
                // Recognition engine tied to a single culture/language.
                CultureInfo ci = new CultureInfo("en-us");
                sre = new SpeechRecognitionEngine(ci);
                sre.SetInputToDefaultAudioDevice();
                sre.SpeechRecognized += sre_SpeechRecognized;
                . . .

        static void sre_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            string txt = e.Result.Text;
            float confidence = e.Result.Confidence;
            Console.WriteLine("\nRecognized: " + txt);
            if (confidence < 0.60) return;
            . . .
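Picking up where the Main() excerpt leaves off, I assume the rest of the setup looks roughly like this (the grammar phrases below are placeholders I made up, not anything from the sample I'm working from):

// Back in Main(), after wiring up the event handler.
// Microsoft.Speech (the server SDK) wants an explicit grammar,
// so build a trivial one just to get recognition running.
Choices phrases = new Choices("hello", "goodbye", "what are you doing");
GrammarBuilder gb = new GrammarBuilder(phrases);
gb.Culture = ci;                       // grammar culture must match the engine's culture
sre.LoadGrammar(new Grammar(gb));

// Listen continuously until the user presses Enter.
sre.RecognizeAsync(RecognizeMode.Multiple);
Console.ReadLine();
sre.RecognizeAsyncStop();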
Since the recognition engine is instantiated with a specific CultureInfo (US English shown above), I'm guessing it would render "¿Qué estás haciendo?" as something like "Kay is toss hossy end, oh?" and therefore produce a very low Result.Confidence value.
Is there a way to respond to two languages simultaneously, such as by instantiating two CultureInfo instances:
CultureInfo ciEnglish = new CultureInfo("en-us");
CultureInfo ciSpanish = new CultureInfo("es-mx");
Even if that is doable, would the two recognition engines be "willing" to share the microphone, and be smart enough to cede to the other when they don't understand what is being spoken?
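What I have in mind is something like the following: two engines, one per culture, both pointed at the default audio device and both feeding the same handler. This is only a sketch of the idea; the names (sreEnglish, sreSpanish, AnyLanguageRecognized) and the grammar phrases are mine, and I don't know whether two engines will actually tolerate sharing the microphone this way:

CultureInfo ciEnglish = new CultureInfo("en-us");
CultureInfo ciSpanish = new CultureInfo("es-mx");

SpeechRecognitionEngine sreEnglish = new SpeechRecognitionEngine(ciEnglish);
SpeechRecognitionEngine sreSpanish = new SpeechRecognitionEngine(ciSpanish);

// Both engines listen to the same default microphone -- this is the part
// I'm not sure is even allowed.
sreEnglish.SetInputToDefaultAudioDevice();
sreSpanish.SetInputToDefaultAudioDevice();

// Each engine gets a grammar in its own language (placeholder phrases).
GrammarBuilder gbEn = new GrammarBuilder(new Choices("what are you doing", "speak English"));
gbEn.Culture = ciEnglish;
sreEnglish.LoadGrammar(new Grammar(gbEn));

GrammarBuilder gbEs = new GrammarBuilder(new Choices("qué estás haciendo", "no hablo inglés"));
gbEs.Culture = ciSpanish;
sreSpanish.LoadGrammar(new Grammar(gbEs));

// Same handler for both engines, so results can be compared in one place.
sreEnglish.SpeechRecognized += AnyLanguageRecognized;
sreSpanish.SpeechRecognized += AnyLanguageRecognized;

sreEnglish.RecognizeAsync(RecognizeMode.Multiple);
sreSpanish.RecognizeAsync(RecognizeMode.Multiple);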
I'm afraid this is going to be one of those "hard" (read: "pretty much impossible") challenges. If I'm wrong about that, though, please let me know.
In the answer by Bulltorious here, it looks as though the SpeechRecognized handler might be able to determine which language is being spoken, but not enough code is shown for me to tell whether that is really the case.
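If the two-engine arrangement above works at all, my guess is that the shared handler is where the language could be determined: each engine reports its own culture (via RecognizerInfo, assuming Microsoft.Speech exposes that the same way System.Speech does), and low-confidence results from the "wrong" engine could simply be ignored. Again, this is only a guess at how it might look (the 0.60 threshold is the same arbitrary cutoff from the sample above):

static void AnyLanguageRecognized(object sender, SpeechRecognizedEventArgs e)
{
    // The sender tells us which engine (and therefore which language) fired.
    SpeechRecognitionEngine engine = (SpeechRecognitionEngine)sender;
    string language = engine.RecognizerInfo.Culture.Name;   // "en-US" or "es-MX"

    // Presumably the engine for the "wrong" language reports low confidence.
    if (e.Result.Confidence < 0.60) return;

    Console.WriteLine("\n[{0}] Recognized: {1} (confidence {2:F2})",
                      language, e.Result.Text, e.Result.Confidence);
}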