Streaming Audio in FLAC or AMR_WB to the Google Speech API

Question

I need to run the Google Speech API in fairly low-bandwidth environments.

Based on reading about best practices, it seems my best bet is to use the AMR_WB format.

However, the following code throws no exceptions and nothing arrives in the onError(t: Throwable) method, yet the API never returns any values in the onNext(value: StreamingRecognizeResponse) method.

If I change the format in .setEncoding() from FLAC or AMR_WB back to LINEAR16, everything works fine.

AudioEmitter.kt

fun start(
    encoding: Int = AudioFormat.ENCODING_PCM_16BIT,
    channel: Int = AudioFormat.CHANNEL_IN_MONO,
    sampleRate: Int = 16000,
    subscriber: (ByteString) -> Unit
)

MainActivity.kt

builder.streamingConfig = StreamingRecognitionConfig.newBuilder()
        .setConfig(RecognitionConfig.newBuilder()
                .setLanguageCode("en-US")
                .setEncoding(RecognitionConfig.AudioEncoding.AMR_WB)
                .setSampleRateHertz(16000)
                .build())
        .setInterimResults(true)
        .setSingleUtterance(false)
        .build()

I think the problem may come from your `sampleRate` of `AudioEmitter`. Try to set it to 44100, 22050 or 11025 when the encoding type in streaming recognition is `FLAC`. — aminography, Nov 01 '18 at 14:12
Maybe you can follow this official troubleshooting procedure? https://cloud.google.com/speech-to-text/docs/support#troubleshooting to define where the issue comes from. — Bsquare ℬℬ, Nov 05 '18 at 13:30
@aminography I've messed with those settings, unfortunately it didn't help. — Wesley, Nov 05 '18 at 17:34
@Bsquare Looked at those many times. Have tried every possible combination of settings I can find, and still no luck. It looks like both here and on the cloud-speech-discuss forum the team is completely disengaged. — Wesley, Nov 05 '18 at 17:42
Did you try converting your sound file to FLAC or something else, just to check whether that is the key to your issue? — Bsquare ℬℬ, Nov 07 '18 at 08:47

score 0 · Answer 1 · answered Nov 06 '18 at 03:58

Google won't recognize your data because you tell it the data is in FLAC or AMR_WB format, while you keep passing raw, uncompressed audio chunks that AudioRecord.read() produces.

To make it work, you have two choices. The first is to convert the data to the required format yourself, possibly using a third-party library. The second is to use MediaRecorder from the Android SDK. Unfortunately, it only supports writing to a file-like destination, so you cannot simply replace AudioRecord with it, but there is a workaround described in this answer.
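As a rough illustration of the second option, here is a minimal sketch that points MediaRecorder at the write end of a pipe and reads the encoded bytes from the other end. This is one possible workaround and not necessarily the one the linked answer describes. The class name AmrWbEmitter is hypothetical, the RECORD_AUDIO permission is assumed to be granted already, and I have not verified that the Speech API accepts the "#!AMR-WB\n" file header that MediaRecorder writes at the start of the stream.

```kotlin
import android.media.MediaRecorder
import android.os.ParcelFileDescriptor
import com.google.protobuf.ByteString
import java.io.IOException
import kotlin.concurrent.thread

// Hypothetical replacement for AudioEmitter; emits AMR-WB bytes instead of raw PCM.
class AmrWbEmitter {
    private var recorder: MediaRecorder? = null
    private var writeSide: ParcelFileDescriptor? = null

    fun start(subscriber: (ByteString) -> Unit) {
        // createPipe() returns two descriptors: [0] is the read end, [1] the write end.
        val pipe = ParcelFileDescriptor.createPipe()
        val readSide = pipe[0]
        writeSide = pipe[1]

        recorder = MediaRecorder().apply {
            setAudioSource(MediaRecorder.AudioSource.VOICE_RECOGNITION)
            setOutputFormat(MediaRecorder.OutputFormat.AMR_WB)
            setAudioEncoder(MediaRecorder.AudioEncoder.AMR_WB)
            setAudioSamplingRate(16000) // must match setSampleRateHertz(16000)
            setAudioChannels(1)
            // The recorder only needs a writable file descriptor, so a pipe works here.
            setOutputFile(pipe[1].fileDescriptor)
            prepare()
            start()
        }

        // Forward encoded chunks to the subscriber until the pipe is closed.
        thread(name = "amr-wb-reader") {
            val buffer = ByteArray(2048)
            try {
                ParcelFileDescriptor.AutoCloseInputStream(readSide).use { input ->
                    while (true) {
                        val n = input.read(buffer)
                        if (n < 0) break
                        if (n > 0) subscriber(ByteString.copyFrom(buffer, 0, n))
                    }
                }
            } catch (e: IOException) {
                // The pipe was torn down while reading; treat it as end of stream.
            }
        }
    }

    fun stop() {
        recorder?.apply {
            stop()
            release()
        }
        recorder = null
        // Closing the write end is intended to let the reader thread reach end of stream.
        writeSide?.close()
        writeSide = null
    }
}
```

With something like this in place, the RecognitionConfig from the question (AMR_WB at 16000 Hz) would describe the bytes that are actually sent. If the API returns nothing, the file header mentioned above is the first thing I would check.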
