I want to implement speech-to-text using the Google Speech API, but I don't quite understand what I should do on the frontend. I am using Socket.IO streams on both the backend and the frontend.
Frontend (JavaScript)
bindSendAudioMessage() {
    let me = this;
    me.sendAudioMessageButton = me.ele.find('#send-audio-message-btn');
    me.sendAudioMessageButton.off('click').one('click', async function () {
        let stream = await navigator.mediaDevices.getUserMedia({ audio: true });
        me.recordingStarted(stream);
    });
},
recordingStarted: function (inputStream) {
    let serverStream = ss.createStream();
    ss(chatBox.socketIO).emit('speech-to-text', serverStream);
    inputStream.pipe(serverStream);
    ss(chatBox.socketIO).on('speech-text', function (stream) {
        console.log('receiving something');
        console.log(stream);
        stream.on('data', data => {
            console.log(data);
        });
    });
},
Backend (Node.js)
// Imports the Google Cloud client library
const speech = require('@google-cloud/speech');
// Creates a client
const client = new speech.SpeechClient();

SocketStream(socket).on('speech-to-text', function (inputStream) {
    console.log(inputStream);
    const request = {
        config: {
            encoding: 'LINEAR16',
            sampleRateHertz: 16000,
            languageCode: 'en-US',
        },
        interimResults: false, // If you want interim results, set this to true
        single_utterance: true,
    };
    // Create a recognize stream
    const recognizeStream = client
        .streamingRecognize(request)
        .on('error', console.error)
        .on('data', data =>
            process.stdout.write(
                data.results[0] && data.results[0].alternatives[0]
                    ? `Transcript: ${data.results[0].alternatives[0].transcript}\n`
                    : `\n\nReached transcription time limit, press Ctrl+C\n`
            )
        );
    let outputStream = SocketStream.createStream();
    SocketStream(socket).emit('speech-text', outputStream);
    // Pipe inputStream to recognizeStream then to outputStream
    inputStream.pipe(recognizeStream).pipe(outputStream);
});
I am sure there is something I'm missing in the stream API. One problem I am aware of is that navigator.mediaDevices.getUserMedia({ audio: true }) gives me a MediaStream, which is not the same as a Socket.IO stream (for one thing, a MediaStream has no pipe method).
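To make the format mismatch concrete: the backend config asks for LINEAR16, which as far as I understand means 16-bit signed little-endian PCM, while the Web Audio API exposes samples as Float32 values in the range [-1, 1]. So I assume some conversion step like the one below would be needed before anything is written to the Socket.IO stream. This is only a sketch; floatTo16BitPCM is a hypothetical helper name of my own, not part of any library, and it does not deal with resampling to the 16000 Hz that sampleRateHertz expects.

```javascript
// Hypothetical helper (my own name, not a library function).
// Converts Web Audio Float32 samples in [-1, 1] to 16-bit signed
// little-endian PCM, which is my understanding of LINEAR16.
function floatTo16BitPCM(float32Samples) {
    const buffer = new ArrayBuffer(float32Samples.length * 2); // 2 bytes per sample
    const view = new DataView(buffer);
    for (let i = 0; i < float32Samples.length; i++) {
        // Clamp out-of-range samples so they cannot wrap around.
        const s = Math.max(-1, Math.min(1, float32Samples[i]));
        // Negative samples scale by 0x8000, positive ones by 0x7FFF.
        view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // true = little-endian
    }
    return buffer;
}
```

The returned ArrayBuffer would then presumably be written to serverStream chunk by chunk, in place of the inputStream.pipe(serverStream) call.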
How can I prepare the audio MediaStream so that I can stream it to a Socket.IO stream?
How can I stream back the responses as I get them from the Google API?
Does this line make any sense?
inputStream.pipe(recognizeStream).pipe(outputStream);
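My doubt about that last line is that recognizeStream seems to emit response objects (the 'data' handler above reads data.results[0].alternatives[0].transcript), whereas I assume outputStream expects strings or Buffers. If that assumption is right, something would have to sit between the two. Below is a rough sketch of what I have in mind, using Node's built-in stream.Transform; createTranscriptStream is a hypothetical helper of mine, and the response shape is taken only from the handler above.

```javascript
const { Transform } = require('stream');

// Hypothetical helper (my own name): accepts recognition response objects
// and emits only the first transcript of each one as text.
function createTranscriptStream() {
    return new Transform({
        writableObjectMode: true,  // input side: response objects
        readableObjectMode: false, // output side: strings / Buffers
        transform(response, _encoding, callback) {
            const result = response.results && response.results[0];
            const alternative = result && result.alternatives && result.alternatives[0];
            // Responses without a transcript are skipped (nothing is pushed).
            callback(null, alternative ? `${alternative.transcript}\n` : undefined);
        },
    });
}
```

With that in place, the chain would presumably become inputStream.pipe(recognizeStream).pipe(createTranscriptStream()).pipe(outputStream), but I am not sure this is the intended way to do it.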