0

I'm building a speech-based application in C# on .NET Framework 4.0.

I want to use voice files (such as .wav) as the grammar instead of strings, because my app will be in a non-English language that is hard to transliterate into Latin characters. For example, there will be expressions like Khorooj or Taghire 'onvan, and transliteration causes many problems, such as differences in how a letter is pronounced. Using voice files as the reference would be much easier.

How do I get started? Thanks!

Mostafa Farzán

2 Answers

2

As an alternative, I suggest using Google Voice Search (GVS).
GVS takes FLAC as its input audio format, so you need something like CUETools to convert the WAV stream to FLAC:

    // Requires: using CUETools.Codecs; using CUETools.Codecs.FLAKE;
    // Converts a WAV file to FLAC and returns the sample rate of the audio.
    public static int Wav2Flac(string wavName, string flacName)
    {
        IAudioSource audioSource = new WAVReader(wavName, null);
        AudioBuffer buff = new AudioBuffer(audioSource, 0x10000);
        FlakeWriter audioDest = new FlakeWriter(flacName, audioSource.PCM);
        int sampleRate = audioSource.PCM.SampleRate;

        // Copy the audio block by block until the source is exhausted.
        while (audioSource.Read(buff, -1) != 0)
        {
            audioDest.Write(buff);
        }

        // Close the writer once; the original snippet closed it twice.
        audioDest.Close();
        return sampleRate;
    }
    // Requires: using System.IO; using System.Net;
    // Posts a FLAC file to the speech endpoint and returns the raw response text.
    public static string GoogleSpeechRequest(string flacName, int sampleRate)
    {
        WebRequest request = WebRequest.Create("https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=ru-RU");
        request.Method = "POST";

        byte[] byteArray = File.ReadAllBytes(flacName);

        // The content type carries the sample rate of the FLAC data, e.g. 16000.
        request.ContentType = "audio/x-flac; rate=" + sampleRate;
        request.ContentLength = byteArray.Length;

        // Write the FLAC data to the request stream.
        using (Stream requestStream = request.GetRequestStream())
        {
            requestStream.Write(byteArray, 0, byteArray.Length);
        }

        // Read the whole response body; the using blocks close the streams on exit.
        using (WebResponse response = request.GetResponse())
        using (Stream responseStream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(responseStream))
        {
            return reader.ReadToEnd();
        }
    }

Ruslan F.
  • Thanks for your answer! It's a valid approach, but unfortunately there are a few problems. First, it needs an internet connection. Second, even when internet is available, it needs a VPN, because Google has closed its APIs to Iran :( Any better ideas? – Mostafa Farzán Sep 05 '12 at 13:00
  • I believe the Google speech API is not intended to be used by custom applications. Today, Google has not published the API nor have they described any terms of service. It is used only by Chrome browsers and Android phones. It has been reverse engineered so people have used it, but it is not truly available for custom use. For more info see http://stackoverflow.com/questions/7879804/does-anyone-uses-google-speech-api-in-production. – Michael Levy Sep 05 '12 at 14:05
  • I understand that it has been a while since this post was made, but I have a question about one of the variables used in the code. The AudioBuffer variable uses 0x10000, how is that value determined? Using the code I get an AudioBuffer format mismatch, I guess it has something to do with that value? – Hespen Aug 31 '16 at 09:02
1

You cannot use voice files as grammars. The Microsoft speech recognition engine expects grammars in the format specified by the W3C open standards body (the Speech Recognition Grammar Specification, SRGS). A grammar is not a listing of all of the words the speech recognition engine should understand; it is a set of rules for the expected response to a specific dialog with the system. Put another way, a grammar does not define the language that the speech recognition system understands.

For that, you need to get language packs and install them for the specific speech vendor you want to use. For Microsoft, the available languages can also depend on the version of the OS you are using. Here are the languages supported on Vista. You may have to go with another speech recognition vendor, such as Nuance, to support the language you want.
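
To make the split between grammar and language concrete, here is a minimal sketch using the `System.Speech.Recognition` classes that ship with .NET Framework. It lists the installed recognizers, builds a small command grammar from text phrases, and runs it against a WAV file. The phrases, the file name `command.wav`, and the choice of the first installed recognizer are all assumptions for illustration; with a language pack for your language installed, the phrases would be written in that language's own script.

```csharp
using System;
using System.Speech.Recognition; // add a reference to System.Speech.dll

static class VoiceCommandSketch
{
    // Hypothetical command phrases, written in the recognizer's own language.
    public static readonly string[] CommandPhrases = { "exit", "change title" };

    static void Main()
    {
        // Each installed language pack shows up here as a recognizer.
        var installed = SpeechRecognitionEngine.InstalledRecognizers();
        foreach (RecognizerInfo info in installed)
            Console.WriteLine("{0}: {1}", info.Culture.Name, info.Name);

        if (installed.Count == 0)
        {
            Console.WriteLine("No speech recognizer is installed.");
            return;
        }

        // Assumption: the first installed recognizer is the one you want.
        using (var recognizer = new SpeechRecognitionEngine(installed[0]))
        {
            // The grammar only lists the expected responses; the language
            // itself comes from the recognizer's culture.
            var builder = new GrammarBuilder(new Choices(CommandPhrases));
            builder.Culture = recognizer.RecognizerInfo.Culture;
            recognizer.LoadGrammar(new Grammar(builder));

            // Hypothetical input file; audio is the input, never the grammar.
            recognizer.SetInputToWaveFile("command.wav");
            RecognitionResult result = recognizer.Recognize();
            Console.WriteLine(result == null ? "No match." : "Heard: " + result.Text);
        }
    }
}
```

Note where the voice file fits: it is what gets recognized, while the grammar stays a set of text rules. If no recognizer exists for the target language, this approach cannot work regardless of how the grammar is written.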

Kevin Junghans
  • Just adding some more info to Kevin's answer. Microsoft also offers the Microsoft Speech Platform (http://msdn.microsoft.com/en-us/library/hh361572.aspx) that can be used on servers or desktop OSes. Here is a list of supported language packs - http://www.microsoft.com/en-us/download/details.aspx?id=27224 – Michael Levy Sep 05 '12 at 14:07