
Can the Google Speech API be configured to only return numbers and letters, as opposed to full words?

The use case is transcribing Canadian postal codes, for example M 1 B 0 R 3. Google may return "Em 1 Be 0 Are 3".

We have tried:

  • Using speechContexts and feeding in the letters A-Z as individual phrases. This improved the accuracy for us. We did not have much success passing in individual digits (e.g. 1, 2, 3).
  • Specifying the codec and sample rate of our WAV file using the encoding and sampleRateHertz configuration options. We saw no improvement from doing this, as we believe Google already does a good job of auto-detecting the sample rate and encoding.

Our audio file is sampled at 8000 Hz and encoded as μ-law ("MULAW"). We have no flexibility to change the sample rate or encoding.
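For reference, the configuration described above can be sketched as a JSON request body for the v1 `speech:recognize` REST method. This is a minimal sketch, not our exact code: the helper name `build_recognize_request` is made up for illustration, the `en-CA` language code is an assumption, and the digits are included as phrases even though they did not help much in our tests.

```python
import base64
import string


def build_recognize_request(audio_bytes: bytes, language_code: str = "en-CA") -> dict:
    """Build a request body for the v1 speech:recognize REST method.

    Hypothetical helper; language_code defaults to "en-CA" as an assumption.
    """
    letters = list(string.ascii_uppercase)   # "A" .. "Z", one phrase per letter
    digits = [str(d) for d in range(10)]     # "0" .. "9", one phrase per digit
    return {
        "config": {
            "encoding": "MULAW",              # 8-bit mu-law audio
            "sampleRateHertz": 8000,
            "languageCode": language_code,
            "speechContexts": [{"phrases": letters + digits}],
        },
        # The REST API expects inline audio as a base64-encoded string.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }
```

The resulting dict can be serialized with `json.dumps` and posted to the API with whatever HTTP client and credentials are already in use.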

Is there a way to get a more accurate response from Google for this use case? Even ideas for better speechContexts phrases are welcome.
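One fallback we are considering is post-processing the transcript ourselves. The sketch below is only an illustration: the `SPOKEN_TO_CHAR` table is hypothetical and far from complete, `normalize_postal_code` is a made-up helper, and the pattern checks only the letter-digit alternation, not the full set of valid Canadian postal codes.

```python
import re

# Hypothetical, incomplete table of word-like transcriptions of letter and
# digit names. A real table would need to be built from observed API output.
SPOKEN_TO_CHAR = {
    "em": "M", "be": "B", "bee": "B", "are": "R", "see": "C", "why": "Y",
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

# Simplified shape check: letter, digit, letter, digit, letter, digit.
POSTAL_CODE = re.compile(r"[A-Z]\d[A-Z]\d[A-Z]\d")


def normalize_postal_code(transcript: str):
    """Map a transcript such as 'Em 1 Be 0 Are 3' to 'M1B0R3', or None."""
    chars = []
    for token in transcript.split():
        key = token.lower()
        if key in SPOKEN_TO_CHAR:
            chars.append(SPOKEN_TO_CHAR[key])
        elif len(token) == 1 and token.isalnum():
            # Already a single letter or digit; keep it as-is.
            chars.append(token.upper())
        else:
            # Unknown word: give up rather than guess.
            return None
    candidate = "".join(chars)
    return candidate if POSTAL_CODE.fullmatch(candidate) else None
```

This does not make the recognizer itself more accurate. It only cleans up results that are already phonetically right, so it would complement better speechContexts phrases rather than replace them.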

Thank you

Bobby Bruce
  • And what is your current accuracy? – Nikolay Shmyrev Jul 25 '17 at 21:09
  • You also asked https://stackoverflow.com/questions/45312110/can-microsoft-bing-speech-be-configured-to-return-only-numbers-letters – Nikolay Shmyrev Jul 26 '17 at 17:11
  • In such a case it is better to train an open source recognizer; it will be much more responsive too. – Nikolay Shmyrev Jul 26 '17 at 17:12
  • I presume you are referring to a tool such as CMUSphinx, which I see you are a developer for. I can give this a shot, as this is a greenfield project. – Bobby Bruce Jul 27 '17 at 13:29
  • Yes - I did ask the same question, as I've been testing with Bing Speech as well. That question is slightly different though, as I believe Microsoft offers more granular controls, or "scenarios", to interpret speech. My current accuracy is poor - about a 35% match rate. – Bobby Bruce Jul 27 '17 at 13:31
  • I gave an answer at https://stackoverflow.com/questions/45312110/can-microsoft-bing-speech-be-configured-to-return-only-numbers-letters/45360883#45360883 I'm going to flag this one as a duplicate. Go ahead and edit your first question to include more information if you want. – John Wiseman Jul 27 '17 at 22:41
  • Possible duplicate of [Can Microsoft Bing Speech be configured to return only numbers / letters?](https://stackoverflow.com/questions/45312110/can-microsoft-bing-speech-be-configured-to-return-only-numbers-letters) – John Wiseman Jul 27 '17 at 22:42
  • @JohnWiseman these are two similar questions, but discuss two very different APIs – Bobby Bruce Jul 28 '17 at 12:06
  • @BobbyBruce did you find a solution? – Ivan Fontalvo Apr 17 '20 at 02:52

1 Answer


We are seeing the same results. We would love to have a syntax-based "context" suggestion, or a parameter that forces digit-only output.

Changing the API version does not fix the way digits are recognised, and neither does using model: phone_call.

What did improve recognition for some kinds of numbers was switching to the en-US locale, which in turn made the recognition engine treat a list of numbers as a phone number. The result was returned in a phone-like format, +XXX-XXX-XXX-XXXX, and this made detection really good.

So I don't understand why Google has syntax matching behind the scenes but doesn't make it available through their API.