
I'm using the REST API with cURL because I need to do something quick and simple, and I'm on a box where I can't start dumping garbage, such as some thick developer SDK.

I started out by base64-encoding FLAC files and calling speech.syncrecognize.

That eventually failed with:

{
  "error": {
    "code": 400,
    "message": "Request payload size exceeds the limit: 10485760.",
    "status": "INVALID_ARGUMENT"
  }
}
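Part of why the limit is hit so quickly is that inline audio travels base64-encoded inside the JSON body, which inflates it by about 4/3. A minimal Python sketch of that arithmetic follows. It assumes the 10,485,760-byte limit from the error applies to the whole request body, and it uses the v1beta1 REST field names (`encoding`, `sampleRate`, `languageCode`, `audio.content`) as an assumption; `sync_request_body`, `fits_inline` and the 1 KiB JSON overhead are hypothetical.

```python
import base64
import json

# Request payload limit quoted in the 400 error (bytes).
PAYLOAD_LIMIT = 10_485_760


def base64_len(n_bytes: int) -> int:
    """Length of padded standard base64 output for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


def fits_inline(n_audio_bytes: int, json_overhead: int = 1024) -> bool:
    """Rough check: encoded audio plus an assumed JSON overhead vs. the limit."""
    return base64_len(n_audio_bytes) + json_overhead <= PAYLOAD_LIMIT


def sync_request_body(audio: bytes, sample_rate: int, language: str) -> str:
    """Hypothetical syncrecognize body with the audio inlined as base64."""
    body = {
        "config": {
            "encoding": "FLAC",
            "sampleRate": sample_rate,
            "languageCode": language,
        },
        "audio": {"content": base64.b64encode(audio).decode("ascii")},
    }
    return json.dumps(body)


print(base64_len(31_284_578))   # 41712772
print(fits_inline(31_284_578))  # False
```

Whether 31,284,578 bytes is the raw FLAC size or the already-encoded size, it is roughly three to four times the limit, so inline content is out either way.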

Okay, so you can't send 31,284,578 bytes in one request; you have to use Cloud Storage. So I upload the FLAC audio file and try again, using the file now in Cloud Storage. That fails with:

{
  "error": {
    "code": 400,
    "message": "For audio inputs longer than 1 min, use the 'AsyncRecognize' method.",
    "status": "INVALID_ARGUMENT"
  }
}
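The error above sets a one-minute ceiling for syncrecognize, so the method can be picked before anything is uploaded if the duration is known. A sketch using Python's standard `wave` module for WAV-wrapped audio, plus plain arithmetic for headerless pcm_s16le; the base URL is an assumption about the v1beta1 REST path, and `recognize_url` is a hypothetical helper.

```python
import io
import wave

SYNC_MAX_SECONDS = 60  # from the error: longer audio needs AsyncRecognize
# Assumption: v1beta1 REST base path for the recognize methods.
BASE_URL = "https://speech.googleapis.com/v1beta1/speech"


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Duration of a WAV file held in memory, read from its header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.getnframes() / w.getframerate()


def pcm_s16le_duration_seconds(n_bytes: int, sample_rate: int, channels: int = 1) -> float:
    """Duration of headerless signed 16-bit little-endian PCM (2 bytes per sample)."""
    return n_bytes / (sample_rate * 2 * channels)


def recognize_url(duration_seconds: float) -> str:
    """Pick syncrecognize for clips up to a minute, asyncrecognize otherwise."""
    method = "asyncrecognize" if duration_seconds > SYNC_MAX_SECONDS else "syncrecognize"
    return f"{BASE_URL}:{method}"
```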

Great, so speech.syncrecognize doesn't like audio that long; try again with speech.asyncrecognize. That fails with:

{
  "error": {
    "code": 400,
    "message": "For audio inputs longer than 1 min, please use LINEAR16 encoding.",
    "status": "INVALID_ARGUMENT"
  }
}
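A request body that satisfies both constraints so far (audio referenced from Cloud Storage, LINEAR16 encoding) can be sketched as below. The field names follow the v1beta1 REST reference as discussed in the comments (`sampleRate`, not `sample_rate`); the bucket and object names are placeholders, and the range check only mirrors the documented "8000-48000".

```python
import json


def async_request_body(gcs_uri: str, sample_rate: int, language: str) -> str:
    """Build an asyncrecognize body for LINEAR16 audio stored in Cloud Storage."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError("expected a gs://bucket/object URI")
    # Mirrors the documented valid range "8000-48000".
    if not 8000 <= sample_rate <= 48000:
        raise ValueError("sampleRate must be between 8000 and 48000 Hz")
    body = {
        "config": {
            "encoding": "LINEAR16",     # required for audio longer than 1 min
            "sampleRate": sample_rate,  # REST spelling, not sample_rate
            "languageCode": language,
        },
        "audio": {"uri": gcs_uri},      # audio is read from Cloud Storage
    }
    return json.dumps(body)


# Placeholder bucket/object names.
print(async_request_body("gs://my-bucket/audio.raw", 16000, "pt-BR"))
```

The resulting JSON could be saved to a file and POSTed with cURL's `-d @file` option.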

Okay, so for audio this long speech.asyncrecognize only accepts LPCM; I upload the file in pcm_s16le format and try again. Finally, I get an operation handle:

{
  "name": "9174269756763138681"
}

I keep checking it, and eventually it's complete:

{
  "name": "9174269756763138681",
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"
  }
}
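In v1beta1 an AsyncRecognizeResponse carries a `results` list, and each result has `alternatives` holding a `transcript`. Because proto3's JSON mapping omits empty repeated fields, an operation that recognized nothing looks exactly like the one above: a `response` with only `@type`. A small sketch that pulls out the top transcript of each result (the function name is hypothetical):

```python
from typing import Any, Dict, List


def transcripts(operation: Dict[str, Any]) -> List[str]:
    """Top-alternative transcript of each result in a finished operation."""
    if not operation.get("done"):
        raise RuntimeError("operation has not finished yet")
    if "error" in operation:
        raise RuntimeError(operation["error"].get("message", "operation failed"))
    out = []
    # An absent "results" key means nothing was recognized.
    for result in operation.get("response", {}).get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            out.append(alternatives[0].get("transcript", ""))
    return out


finished = {
    "name": "9174269756763138681",
    "done": True,
    "response": {
        "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"
    },
}
print(transcripts(finished))  # []
```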

So wait, after all that, with the result now sitting in a queue, there is no REST method to request the result? Someone please tell me that I've missed something glaringly obvious staring me right in the face, and that Google didn't create a completely pointless, incomplete REST API.

tlum
  • In your case, the result simply seems to be empty. It might be due to an audio format mismatch; the audio must be 16 kHz, 16-bit, little-endian. – Nikolay Shmyrev Jul 31 '16 at 09:54
  • The audio is either 44,100 or 48,000 Hz. I'll try downsampling, though the documentation says: "Valid values are: 8000-48000" and advises, "use the native sample rate of the audio source (instead of re-sampling)." In the question I said I was using pcm_l16se, which should have read pcm_s16le, which is signed, 16-bit, little-endian. – tlum Jul 31 '16 at 15:23
  • For 48 kHz, I believe you have to specify the rate in asyncrecognize. – Nikolay Shmyrev Jul 31 '16 at 18:00
  • I did "config": { "encoding":"LINEAR16", "sample_rate": 48000, "language_code":"pt-BR" } ...but it didn't work apparently. – tlum Jul 31 '16 at 19:32
  • Related http://stackoverflow.com/questions/38906527/asyncrecognize-result-is-empty – Nikolay Shmyrev Aug 12 '16 at 02:47
  • I've successfully used REST calls to speech.syncrecognize with FLAC/16000 encoding, but failed with AMR_WB/16000 and AMR/8000. The docs say FLAC is recommended, but not that other encodings are unsupported. Also, the time to get a response is 2x the length of the audio (in seconds). – karpy47 Dec 07 '16 at 15:07

1 Answer


So the answer to the question is no: it is possible to use cURL with the Google Cloud Speech API to recognize 10- to 15-minute files... assuming you navigate and conform to a fairly tight set of constraints, at least in beta1.

What is not obvious from the documentation is that the result is returned by the operations.get method, which would have been obvious had any of my attempts actually returned something other than empty results.
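Polling with operations.get can be sketched as a loop that GETs the operation by name until `done` is set. The URL template is an assumption based on the v1beta1 REST layout, and `fetch` stands in for whatever performs the authenticated GET (for example, a wrapper around cURL); it is hypothetical and injected so the loop can be exercised without network access.

```python
import time
from typing import Any, Callable, Dict

# Assumption: v1beta1 REST path for operations.get.
OPERATIONS_URL = "https://speech.googleapis.com/v1beta1/operations/{name}"


def wait_for_operation(
    name: str,
    fetch: Callable[[str], Dict[str, Any]],
    interval: float = 10.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll operations.get until the operation reports done, then return it.

    `fetch` is a hypothetical callable that performs an authenticated GET
    and returns the parsed JSON body.
    """
    url = OPERATIONS_URL.format(name=name)
    deadline = time.monotonic() + timeout
    while True:
        operation = fetch(url)
        if operation.get("done"):
            return operation  # holds "response" (or "error") once finished
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation {name} not done after {timeout} s")
        sleep(interval)
```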

The source rate in my files is either 44,100 or 48,000 Hz, and I was setting sample_rate to the source's native rate. However, contrary to the documentation, which states:

Sample rate in Hertz of the audio data sent in all RecognitionAudio messages. Valid values are: 8000-48000. 16000 is optimal. For best results, set the sampling rate of the audio source to 16000 Hz. If that's not possible, use the native sample rate of the audio source (instead of re-sampling).

after re-sampling to 16,000 Hz I started to get results with operations.get.

I think it's worth noting that correlation does not imply causation. After re-sampling to 16,000 Hz, the files become significantly smaller. Thus, I can't prove it's a sample rate issue and not just the service choking on files over a certain size.
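The size effect is easy to quantify for headerless 16-bit LPCM: bytes = sample rate x 2 bytes x channels x seconds, so re-sampling 48,000 Hz audio to 16,000 Hz cuts it to a third, and 44,100 Hz audio to about 36%. A sketch of that arithmetic for a hypothetical 10-minute mono file:

```python
def lpcm_bytes(sample_rate: int, seconds: int, channels: int = 1, bytes_per_sample: int = 2) -> int:
    """Size of headerless PCM audio: rate x bytes per sample x channels x seconds."""
    return sample_rate * bytes_per_sample * channels * seconds


TEN_MINUTES = 600  # hypothetical file length in seconds
for rate in (48_000, 44_100, 16_000):
    print(rate, lpcm_bytes(rate, TEN_MINUTES))
# 48000 57600000
# 44100 52920000
# 16000 19200000
```

Since rate and size move together in this experiment, the two explanations above can't be told apart from these results alone.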

It's also worth noting that the documentation refers to the sample rate inconsistently. According to their respective detailed definitions, the gRPC API appears to expect sample_rate while the REST API appears to expect sampleRate, in which case the Quickstart may be giving an incorrect example for the REST API.
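If a config was written with the gRPC/Quickstart spelling, one workaround is to rename snake_case keys to the lowerCamelCase names the REST reference lists before sending. A minimal sketch (helper names hypothetical), shown on the config from the comments on the question:

```python
from typing import Any


def snake_to_camel(name: str) -> str:
    """sample_rate -> sampleRate, language_code -> languageCode."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def camelize_keys(obj: Any) -> Any:
    """Recursively rename dict keys from snake_case to lowerCamelCase."""
    if isinstance(obj, dict):
        return {snake_to_camel(k): camelize_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [camelize_keys(v) for v in obj]
    return obj


grpc_style = {"config": {"encoding": "LINEAR16", "sample_rate": 48000, "language_code": "pt-BR"}}
print(camelize_keys(grpc_style))
# {'config': {'encoding': 'LINEAR16', 'sampleRate': 48000, 'languageCode': 'pt-BR'}}
```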

tlum
  • According to the [documentation](https://cloud.google.com/speech/reference/rest/Shared.Types/RecognitionConfig) it's `sampleRate`, not `sample_rate`, so it must be that you just don't set the rate properly. – Nikolay Shmyrev Jul 31 '16 at 20:54
  • It depends on what documentation you're looking at. In the [Quick Start](https://cloud.google.com/speech/docs/getting-started) it's sample_rate. – tlum Jul 31 '16 at 21:43
  • Although Quick Start gives a REST example with cURL, [gRPC](https://cloud.google.com/speech/reference/rpc/google.cloud.speech.v1#google.cloud.speech.v1.InitialRecognizeRequest) uses sample_rate which may be the root of the discrepancy. In the [Best Practice](https://cloud.google.com/speech/docs/best-practices) it's also sample_rate, but there it's more obvious they're talking about gRPC... as long as you don't go and assume that REST is the same. That's what you get with a preview beta version. – tlum Jul 31 '16 at 21:43
  • It's also not all that required, since I never supplied sampleRate but it worked @ 16k... it should have thrown an exception. Thus, it's behaving more like an optional attribute that defaults to 16k, rather than being a required attribute with no default. – tlum Jul 31 '16 at 22:04