Piping a MediaStream to a Node.js socket.io stream, on to the Google Speech API, and streaming back the responses

Question

I want to implement speech-to-text using the Google Speech API, but I don't quite understand what I should do in my frontend. I am using socket.io-stream in both the backend and the frontend.

Frontend (JavaScript)

bindSendAudioMessage() {
    let me = this;

    me.sendAudioMessageButton = me.ele.find('#send-audio-message-btn');

    me.sendAudioMessageButton.off('click').one('click', async function () {
        let stream = await navigator.mediaDevices.getUserMedia({ audio : true});
        me.recordingStarted(stream);
    });
},
recordingStarted: function (inputStream) {
    let serverStream = ss.createStream();
    ss(chatBox.socketIO).emit('speech-to-text', serverStream);
    inputStream.pipe(serverStream);
    ss(chatBox.socketIO).on('speech-text', function (stream) {
        console.log('receiving something');
        console.log(stream);
        stream.on('data', data => {
            console.log(data);
        })
    })
},

Backend (Node.js)

// Imports the Google Cloud client library
const speech = require('@google-cloud/speech');

// Creates a client
const client = new speech.SpeechClient();
SocketStream(socket).on('speech-to-text', function (inputStream) {
    console.log(inputStream);
    const request = {
        config: {
            encoding: 'LINEAR16',
            sampleRateHertz: 16000,
            languageCode: 'en-US',
        },
        interimResults: false, // If you want interim results, set this to true
        singleUtterance: true, // The Node.js client expects camelCase field names
    };

    // Create a recognize stream
    const recognizeStream = client
        .streamingRecognize(request)
        .on('error', console.error)
        .on('data', data =>
            process.stdout.write(
                data.results[0] && data.results[0].alternatives[0]
                    ? `Transcript: ${data.results[0].alternatives[0].transcript}\n`
                    : `\n\nReached transcription time limit, press Ctrl+C\n`
            )
        );

    let outputStream = SocketStream.createStream();
    SocketStream(socket).emit('speech-text', outputStream);

    // Pipe inputStream to recognizeStream then to outputStream
    inputStream.pipe(recognizeStream).pipe(outputStream);
})

I am sure I'm missing something about the stream API. One problem I am aware of is that navigator.mediaDevices.getUserMedia({ audio: true }) returns a MediaStream, which is not the same as a socket.io-stream stream.
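
To make the mismatch concrete: a MediaStream has no pipe() method, so the raw samples would have to be pulled out some other way (for example through the Web Audio API) and converted to the 16-bit PCM that encoding: 'LINEAR16' implies. Below is a minimal sketch of just that conversion step. The helper names floatTo16BitPCM and downsample are hypothetical, and the averaging downsampler is a naive stand-in rather than a proper resampling filter.

```javascript
// Convert Web Audio samples (Float32, nominal range -1..1) to 16-bit signed PCM.
// Hypothetical helper, not part of socket.io-stream or the Speech client.
function floatTo16BitPCM(float32Samples) {
    const pcm = new Int16Array(float32Samples.length);
    for (let i = 0; i < float32Samples.length; i++) {
        // Clamp so out-of-range samples saturate instead of wrapping around
        const s = Math.max(-1, Math.min(1, float32Samples[i]));
        pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
    }
    return pcm;
}

// Naive downsampling: average each group of input samples that maps to one
// output sample. Assumption: good enough for illustration only.
function downsample(samples, inputRate, outputRate) {
    if (outputRate >= inputRate) return samples;
    const ratio = inputRate / outputRate;
    const out = new Float32Array(Math.floor(samples.length / ratio));
    for (let i = 0; i < out.length; i++) {
        const start = Math.floor(i * ratio);
        const end = Math.min(samples.length, Math.floor((i + 1) * ratio));
        let sum = 0;
        for (let j = start; j < end; j++) sum += samples[j];
        out[i] = sum / (end - start);
    }
    return out;
}
```

In the browser, the idea would be to call these from an audio processing callback fed by the MediaStream, using the AudioContext's actual sample rate as inputRate and 16000 as outputRate to match sampleRateHertz, and then write the resulting bytes into serverStream instead of calling inputStream.pipe(serverStream).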

How can I prepare the audio MediaStream so that I can stream it to a socket.io-stream stream?
How can I stream the responses back as I receive them from the Google API?
Does the line inputStream.pipe(recognizeStream).pipe(outputStream); make any sense?
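
On the last two points, one thing I suspect is that recognizeStream emits response objects (my 'data' handler reads data.results[0].alternatives[0].transcript), while outputStream carries bytes, so piping one straight into the other may not work. A minimal sketch of a converter that could sit between them, assuming that response shape; createTranscriptStream is a hypothetical name, not an API of either library:

```javascript
const { Transform } = require('stream');

// Accepts response objects on the writable side and emits plain transcript
// text on the readable side, one line per response that has a transcript.
function createTranscriptStream() {
    return new Transform({
        writableObjectMode: true, // input is objects, output is strings/bytes
        transform(response, _encoding, callback) {
            const result = response.results && response.results[0];
            const alternative = result && result.alternatives && result.alternatives[0];
            if (alternative && alternative.transcript) {
                callback(null, alternative.transcript + '\n');
            } else {
                // Nothing to forward for this response
                callback();
            }
        },
    });
}
```

With this, the pipeline would read inputStream.pipe(recognizeStream).pipe(createTranscriptStream()).pipe(outputStream); and the frontend 'data' handler would receive transcript text rather than raw objects.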

I saw your response to my socket-io question on GitHub and thought you might appreciate this writeup as well: https://stackoverflow.com/questions/50976084/how-do-i-stream-live-audio-from-the-browser-to-google-cloud-speech-via-socket-io/50976085#50976085 — Amber B., Aug 05 '21 at 14:20
