6

I started using the Google Speech API to transcribe audio.

The audio being transcribed contains many numbers spoken one after the other.

E.g. 273 298

But the transcription comes back as 270-3298.

My guess is that it is interpreting it as some sort of phone number.

What I want is unparsed output, e.g. "two seventy three two ninety eight", which I can then parse on my own.

Is there a setting or support for this kind of thing?

Thanks.

Michelle Wetzler
Moshe Rayman
  • Are you requesting more than one alternative? If so, do any of the others get the transcription correct? – brandall Oct 06 '16 at 19:28
  • I get 10 alternatives, and all of them have the number formatted as a phone number. – Moshe Rayman Oct 09 '16 at 13:34
  • I'm having a similar problem. My application asks users to enter a 9-digit card number. Google thinks the user is trying to say a phone number, so it pads the result with an extra digit at the end or even in the middle of the number. – Sam Aug 05 '18 at 17:24
  • Related https://stackoverflow.com/questions/55525503/api-or-sdk-to-make-speech-recognition-only-for-numbers-between-1-and-10000/ – Nikolay Shmyrev Apr 05 '19 at 08:31
  • Try IBM's speech recognition service, which provides a "smart_format" option that controls whether it returns the original transcript or a "formatted" one. – dy.octa Jun 12 '19 at 07:59

4 Answers

5

So I had this exact same problem and I think we found a solution. If you're using English as input, switch to en-PH just when working with numbers. Google will then not format the result as a U.S. phone number or try to stick an extra digit in there.
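As a rough illustration of the language switch, here is a minimal sketch that builds a request body for the v1 `speech:recognize` REST method with `languageCode` set to `en-PH`. The helper name `build_recognize_request`, the `LINEAR16` encoding, and the 16 kHz sample rate are assumptions for the example, not something from my actual setup; adjust them to match your audio.

```python
import base64
import json


def build_recognize_request(audio_bytes, language_code="en-PH", sample_rate_hz=16000):
    """Build a JSON-serializable body for a speech:recognize call (hypothetical helper)."""
    return {
        "config": {
            "encoding": "LINEAR16",            # assumed audio encoding
            "sampleRateHertz": sample_rate_hz,  # assumed sample rate
            "languageCode": language_code,      # en-PH instead of en-US for number-heavy audio
        },
        # The REST API expects the audio content as a base64 string.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


# Dummy bytes stand in for real audio here.
body = build_recognize_request(b"\x00\x01" * 8)
print(json.dumps(body["config"], indent=2))
```

You could call the helper with `language_code="en-US"` for ordinary speech and only fall back to `en-PH` for the segments where you expect digit strings.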

Sam
2

Try passing a speech context with some phrase hints. How to use it is documented here: https://cloud.google.com/speech/docs/basics#phrase-hints

Give it the spelled out numbers that you want recognized.

"speech_context": {
  "phrases": ["zero", "one", "two", ... "nine", "ten", "eleven", ... "twenty", "thirty", ..., "ninety"]
}

This isn't guaranteed to work, but it may help.
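If it helps, the phrase list can be generated rather than typed out. This is only a sketch: it assumes the v1 REST field name `speechContexts` (a list of objects with a `phrases` array), which may differ from the `speech_context` spelling above depending on the API version you use. The function names are made up for the example.

```python
import json


def number_phrases():
    """Return spelled-out number words from 'zero' through 'ninety'."""
    units = ["zero", "one", "two", "three", "four",
             "five", "six", "seven", "eight", "nine"]
    teens = ["ten", "eleven", "twelve", "thirteen", "fourteen",
             "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
    tens = ["twenty", "thirty", "forty", "fifty",
            "sixty", "seventy", "eighty", "ninety"]
    return units + teens + tens


def build_config_with_hints(language_code="en-US"):
    """Build a recognition config carrying the number words as phrase hints."""
    return {
        "languageCode": language_code,
        # Assumed v1 field name; check the docs for the version you call.
        "speechContexts": [{"phrases": number_phrases()}],
    }


config = build_config_with_hints()
print(json.dumps(config, indent=2))
```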

blambert
1

For the record, I tried blambert's solution above and unfortunately it doesn't work. I recently posted another question asking whether anyone has found a way to defeat this behavior, as it is preventing me from implementing a transcription service that I had planned.

justishar
  • For the record, Amazon's speech recognition software doesn't seem to format numbers into phone numbers. I may have to take a look at that again if I can't work around this with Google. – justishar May 14 '18 at 15:37
0

Have you tried Google Speech customClass?

There are class tokens you can use to tell the API that you are expecting a different type of number, not a phone number.

For instance, if you choose OOV_CLASS_AM_RADIO_FREQUENCY, you tell the API to interpret numbers like this:

  • "twelve twenty" --> 1220
  • "seven hundred and thirty" --> 730

The API is probably (I haven't verified this) using the FULLPHONENUM class by default for numbers:

  • "one eight hundred five five five four oh oh one" --> +1-800-555-4001
  • "seven one eight five five five six one oh one" --> 718-555-6101
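A minimal sketch of how a class token might be passed, assuming it goes into the `phrases` array of a `speechContexts` entry and is written with a leading `$` (that prefix is my recollection of the documentation, so please verify it). The helper name is hypothetical.

```python
import json


def build_config_with_class_token(class_token="OOV_CLASS_AM_RADIO_FREQUENCY",
                                  language_code="en-US"):
    """Build a recognition config that hints a class token (hypothetical helper)."""
    # Assumption: class tokens are written with a leading '$' inside phrases.
    if not class_token.startswith("$"):
        class_token = "$" + class_token
    return {
        "languageCode": language_code,
        "speechContexts": [{"phrases": [class_token]}],
    }


config = build_config_with_class_token()
print(json.dumps(config, indent=2))
```

Whether this actually stops the phone-number formatting is something you would have to test against your own audio.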
Giuseppe