Multi-level Speech Commands?

Question

I'm trying to implement multi-level speech commands, but I'm having some trouble with it.

I want to have master commands like "TV", "Light", "Water", etc.

When I say "TV", for example, I want to have sub-commands for the desired action, like:

When I say "TV": sub-commands -> "Volume up", "Volume down", "Power off", "Power on"

Only the TV commands should be recognized until I say "TV done". Then I get back to the master command list.

How can I do that?

My actual code is this:

    using System;
    using System.Collections.Generic;
    using System.Speech.Recognition;

    class Program
    {
        static Dictionary<string, string> listaCanais = new Dictionary<string, string>()
            {
                { "Fox News", "0 6 0" },
                { "The Weather Channel", "0 3 1"},
                { "Spike", "0 3 8"}
            };
        static void Main(string[] args)
        {
            using (var sre = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("pt-BR")))
            {
                sre.SetInputToDefaultAudioDevice();

                sre.LoadGrammarAsync(Comandos());

                sre.RequestRecognizerUpdate();
                sre.SpeechRecognitionRejected += sre_SpeechRecognitionRejected;
                sre.SpeechRecognized += sre_SpeechRecognized;

                sre.RecognizeAsync(RecognizeMode.Multiple);

                Console.ReadLine();
            }
        }

        public static Grammar Comandos()
        {
            Choices numerosTV = new Choices("zero", "um", "dois", "três", "quatro", "cinco", "seis", "sete", "oito", "nove", "10", "11", "12");

            GrammarBuilder fraseNumeroTV = new GrammarBuilder(numerosTV);

            GrammarBuilder fraseMudarCanal = new GrammarBuilder("TV, canal");
            fraseMudarCanal.Append(numerosTV);
            //fraseMudarCanal.Append(numerosTV);
            //fraseMudarCanal.Append(numerosTV);

            Choices nomeCanal = new Choices();

            foreach (string key in listaCanais.Keys)
            {
                nomeCanal.Add(key);
            }

            GrammarBuilder fraseNomeCanal = new GrammarBuilder("TV, canal");
            fraseNomeCanal.Append(nomeCanal);

            GrammarBuilder fraseMudo = new GrammarBuilder("TV, silencioso");
            GrammarBuilder fraseLigar = new GrammarBuilder("TV, ligar");
            GrammarBuilder fraseFecharApp = new GrammarBuilder("Controle, fechar aplicativo");
            GrammarBuilder frasePauseComandoVoz = new GrammarBuilder("Controle, pausar comando de voz");
            GrammarBuilder fraseIniciarComandoVoz = new GrammarBuilder("Controle, ativar comando de voz");

            Choices opcoesPrincipais = new Choices(new GrammarBuilder[] {fraseMudarCanal,
                                                                         fraseNomeCanal,
                                                                         fraseMudo,
                                                                         fraseLigar,
                                                                         fraseFecharApp,
                                                                         frasePauseComandoVoz,
                                                                         fraseIniciarComandoVoz});

            Grammar resultado = new Grammar((GrammarBuilder)opcoesPrincipais);
            return resultado;

        }

        static void sre_SpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
        {
            Console.WriteLine("Ignorado");

        }

        // Create a simple handler for the SpeechRecognized event.
        static void sre_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            Console.WriteLine("Ouvi");

            if (e.Result == null)
                return;

            Console.WriteLine("[ " + e.Result.Confidence + " ]" + "Reconhecido: " + e.Result.Text);
        }
    }

Thanks in advance.

score 0 · Accepted Answer · edited May 23 '17 at 10:26

Such things are easy to implement as a state machine. You keep track of the recognizer's current state and act based on it. The main work is done in the onRecognitionResult method:

states = {INPUT, TV, WATER};

inputGrammar = createInputGrammar();
tvGrammar = createTvGrammar();
waterGrammar = createWaterGrammar();

state = INPUT;

void onRecognitionResult(result) {

    if (state == INPUT) {
        // Master level: only the top-level commands are active
        if (result == "TV") {
            state = TV;
            recognizer.loadGrammar(tvGrammar);
        }
    }
    else if (state == TV) {
        // TV level: handle sub-commands until "TV done"
        if (result == "Volume UP") {
            raiseVolume();
        }
        if (result == "TV done") {
            state = INPUT;
            recognizer.loadGrammar(inputGrammar);
        }
    }
    // Restart recognition
    recognizer.recognizeAsync();
}

You can read "Simple state machine example in C#?" for more information.
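For System.Speech specifically, the pseudocode above could be sketched roughly as follows. This is a minimal sketch, not a tested implementation. It assumes an en-US recognizer is installed, and instead of reloading grammars it loads both once and toggles Grammar.Enabled, which is a swapped-in technique rather than what the pseudocode does. The names MultiLevelDemo, BuildGrammar and SwitchTo are made up for illustration, and only the "TV" master command gets a sub-level here.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

class MultiLevelDemo
{
    enum State { Input, Tv }

    static State state = State.Input;
    static Grammar inputGrammar;
    static Grammar tvGrammar;

    // Hypothetical helper: builds a grammar that accepts any one of the phrases.
    static Grammar BuildGrammar(string name, params string[] phrases)
    {
        var grammar = new Grammar(new GrammarBuilder(new Choices(phrases)));
        grammar.Name = name;
        return grammar;
    }

    static void Main()
    {
        // Assumption: an en-US recognizer is installed on this machine.
        using (var sre = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            inputGrammar = BuildGrammar("input", "TV", "Light", "Water");
            tvGrammar = BuildGrammar("tv",
                "Volume up", "Volume down", "Power on", "Power off", "TV done");

            // Start at the master level: the TV sub-commands are disabled.
            tvGrammar.Enabled = false;

            sre.LoadGrammar(inputGrammar);
            sre.LoadGrammar(tvGrammar);
            sre.SetInputToDefaultAudioDevice();
            sre.SpeechRecognized += OnSpeechRecognized;

            sre.RecognizeAsync(RecognizeMode.Multiple);
            Console.ReadLine();
        }
    }

    static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        string text = e.Result.Text;

        switch (state)
        {
            case State.Input:
                if (text.Equals("TV", StringComparison.OrdinalIgnoreCase))
                    SwitchTo(State.Tv);
                break;

            case State.Tv:
                if (text.Equals("TV done", StringComparison.OrdinalIgnoreCase))
                    SwitchTo(State.Input);
                else
                    Console.WriteLine("TV command: " + text);
                break;
        }
    }

    // Hypothetical helper: enables only the grammar that belongs to the new state.
    static void SwitchTo(State next)
    {
        state = next;
        inputGrammar.Enabled = next == State.Input;
        tvGrammar.Enabled = next == State.Tv;
    }
}
```

With RecognizeMode.Multiple there should be no need to restart recognition after each result. If toggling grammars directly inside the handler turns out to be unreliable, the documented way to apply such changes between recognition operations is RequestRecognizerUpdate together with the RecognizerUpdateReached event.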
