I am currently trying to use the Google Speech API to do live speech-to-text transcription in a web application. To do that, I have to use the RPC streaming recognition (web sockets). I know there are multiple client libraries, but none of them makes it possible to stream the audio directly from the web app to the Google Speech API; there is no plain JavaScript library.

I also know it is probably possible to do this by setting up a web socket connection between the front end and the backend, and then, in my case, using the Node.js client library to stream to the Google Speech API. However, this seems unnecessarily complex.
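To make the backend relay idea concrete, here is a minimal sketch of the forwarding step only. It is not taken from any official sample. `relayAudio` is a hypothetical helper name. `socket` is assumed to expose `on('message')`, `on('close')` and `send()`, and `recognizeStream` is assumed to be a duplex stream that accepts raw audio chunks and emits `data` events with results, such as the stream the Node.js client library is expected to return for streaming recognition.

```javascript
// Hypothetical helper: forwards audio frames from a browser-facing socket
// to a recognition stream and sends each result back to the browser.
// Both arguments are assumptions (see the paragraph above); no real
// Google API call is made in this sketch.
function relayAudio(socket, recognizeStream) {
  // Every binary message from the browser is treated as one audio chunk.
  socket.on('message', (chunk) => {
    recognizeStream.write(chunk);
  });

  // When the browser disconnects, signal the end of the audio.
  socket.on('close', () => {
    recognizeStream.end();
  });

  // Each recognition result is serialized and pushed back to the browser.
  recognizeStream.on('data', (response) => {
    socket.send(JSON.stringify(response));
  });
}
```

Even in this reduced form, it shows the extra hop I would like to avoid: the audio goes from the browser to my server and only then to the speech service.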

Is there really no supported way of using the streaming recognition directly from a web app?

Does anyone know how this could be done?

EDIT: I haven't gotten as far as actually sending a stream to the service, which is the basis of my question. Let me rephrase: is there a way to send an audio stream to the Google Speech API directly from the browser/microphone? My app is written in JavaScript (Angular).
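Whichever transport ends up being used, the microphone samples first have to be turned into bytes the service understands. As a minimal sketch, assuming the browser delivers samples as a `Float32Array` in the range [-1, 1] (as the Web Audio API does) and that the request is configured for 16-bit little-endian linear PCM (`LINEAR16`), the conversion could look like this. `floatTo16BitPCM` is a hypothetical helper name.

```javascript
// Hypothetical helper: converts Web Audio style float samples in [-1, 1]
// into 16-bit signed little-endian PCM, returned as an ArrayBuffer.
function floatTo16BitPCM(samples) {
  const buffer = new ArrayBuffer(samples.length * 2); // 2 bytes per sample
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp so out-of-range input cannot wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Scale negatives to -32768 and positives to 32767; `true` = little-endian.
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

The resulting `ArrayBuffer` can be passed to `WebSocket.send()` as is. What I am missing is a supported endpoint on the Google side to send it to from the browser.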

I've used IBM Watson S2T before, and they provide a JavaScript SDK, available through Bower, that can send audio from the microphone directly to the service for transcription without passing it through a backend layer.

Regards,

Kjetil

kaamodt
  • What issues are you having streaming the audio data to the API? What does the API expect the data to be streamed as? An `ArrayBuffer`, `FormData` or `File` object representation of an audio file? An active `MediaStreamTrack`? – guest271314 Nov 01 '17 at 22:23
  • See [How can I extract the preceding audio (from microphone) as a buffer when silence is detected (JS)?](https://stackoverflow.com/questions/46543341/how-can-i-extract-the-preceding-audio-from-microphone-as-a-buffer-when-silence) – guest271314 Nov 01 '17 at 22:28
  • Thanks for your reply! I edited my original question to better reflect what I actually am looking for. – kaamodt Nov 02 '17 at 08:44
  • You still have not answered the questions at previous comment – guest271314 Nov 02 '17 at 13:54
  • Well, what I am asking for is how to use the RPC API for [StreamingRecognize](https://cloud.google.com/speech/reference/rpc/google.cloud.speech.v1#google.cloud.speech.v1.Speech.StreamingRecognize). The request seems to be defined [here](https://cloud.google.com/speech/reference/rpc/google.cloud.speech.v1#google.cloud.speech.v1.StreamingRecognizeRequest). However, I was hoping there was an existing wrapper/library (for Angular) out there so that I do not have to implement this integration code myself. I am not sure if this answers your question, but it's the best I can do :( – kaamodt Nov 02 '17 at 17:30
  • You can send the audio data as an `ArrayBuffer`. I have not tried Angular and am not sure how Angular is related to the question. – guest271314 Nov 02 '17 at 17:33
  • I do not believe it is relevant either, but it does not hurt to be specific :) So, basically you are saying I have to create the integration code myself, and there is no existing library or similar that wraps this code and exposes the methods more conveniently. If that is true: what is the URI for calling the Google Speech API RPC for StreamingRecognize? – kaamodt Nov 02 '17 at 17:41
  • What exactly is the issue? – guest271314 Nov 02 '17 at 17:41
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/158097/discussion-between-kaamodt-and-guest271314). – kaamodt Nov 02 '17 at 17:53

0 Answers