I've been chasing my tail for two days figuring out how best to approach sending the <Buffer ... > object generated by Google's Text-To-Speech service from my express-api to my React app. I've come across tons of different opinionated resources that point me in different directions and only potentially "solve" isolated parts of the bigger process. At the end of all of this, I've learned a lot more about ArrayBuffer, Buffer, binary arrays, etc., yet I still feel just as lost as before when it comes to implementation.
At its simplest, all I aim to do is provide one or more strings of text to TTS, generate the audio files, send them from my express-api to my React client, and then automatically play the audio in the background in the browser when appropriate.
I am successfully sending the text and triggering Google's TTS to generate the audio files. It responds with a <Buffer ...> representing the binary data of the file. That buffer arrives in my express-api endpoint, and from there I'm not sure whether I should:
- convert the Buffer to a string and send it to the browser?
- send it as a Buffer object to the browser?
- set up a websocket using socket.io and stream it?

then, once it's on the browser:

- do I use an <audio /> tag?
- should I convert it to something else?
I suppose the problem I'm having is that trying to find answers to this results in information overload: various answers written over the past 10 years using different approaches and technologies. I really don't know where one starts and the next ends, what's bad practice, what's best practice, and moreover what's actually suitable for my case. I could really use some guidance here.
Synthesise function from Google
const textToSpeech = require("@google-cloud/text-to-speech");
const client = new textToSpeech.TextToSpeechClient();

// returns: <Buffer ff f3 44 c4 ...>
const synthesizeSentence = async (sentence) => {
  const request = {
    input: { text: sentence },
    voice: { languageCode: "en-US", ssmlGender: "NEUTRAL" },
    audioConfig: { audioEncoding: "MP3" },
  };
  const response = await client.synthesizeSpeech(request);
  return response[0].audioContent;
};
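(If it helps as a sanity check: my assumption is that this buffer is the complete MP3 file, so something along these lines should dump it to a playable file on disk. I haven't built my flow around this; it's just a quick way to confirm the synthesis side is fine before worrying about transport.)

const fs = require("fs");

// assumption: audioContent is the entire MP3 file as a Node Buffer,
// so writing it straight to disk should give a playable file
synthesizeSentence("Hello world").then((audio) => {
  fs.writeFileSync("sanity-check.mp3", audio);
});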
Current shape of my express-api POST endpoint
app.post("/generate-story-support", async (req, res) => {
  try {
    // ? generating the post here for simplicity, eventually the client
    // ? would dictate the sentences to send ...
    const ttsResponse: any = await axios.post("http://localhost:8060/", {
      sentences: SAMPLE_SENTENCES,
    });
    // a resource said to send the response as a string and then convert
    // it on the client to an ArrayBuffer? -- no idea if this is a good practice
    return res.status(201).send(ttsResponse.data[0].data.toString());
  } catch (error) {
    console.log("error", error);
    return res.status(400).send(`Error: ${error}`);
  }
});
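One alternative I've been wondering about (just a sketch, I haven't confirmed it's the right approach) is skipping the string conversion entirely and sending the raw bytes with an audio Content-Type. This assumes ttsResponse.data[0].data is the plain array of byte values that a Node Buffer serializes to in JSON:

app.post("/generate-story-support", async (req, res) => {
  try {
    const ttsResponse: any = await axios.post("http://localhost:8060/", {
      sentences: SAMPLE_SENTENCES,
    });
    // rebuild a Buffer from the serialized byte array (assumption: data[0].data
    // is the array of byte values) and send it as-is, telling the browser it's MP3
    const audioBuffer = Buffer.from(ttsResponse.data[0].data);
    res.set("Content-Type", "audio/mpeg");
    return res.status(200).send(audioBuffer);
  } catch (error) {
    console.log("error", error);
    return res.status(500).send(`Error: ${error}`);
  }
});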
React client
useEffect(() => {
  const fetchData = async () => {
    const data = await axios.post(
      "http://localhost:8000/generate-story-support"
    );
    // converting it to an ArrayBuffer per another SO post
    const encoder = new TextEncoder();
    const encodedData = encoder.encode(data.data);
    setAudio(encodedData);
    return data.data;
  };
  fetchData();
}, []);
// no idea what to do from here, if this is even the right path :/
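The closest I've come to a concrete plan for the client is asking axios for an ArrayBuffer, wrapping it in a Blob, and pointing an audio element at an object URL. Again, this is only a sketch based on my reading, not something I know to be the idiomatic way (and I'm aware browsers may block autoplay without a user interaction):

useEffect(() => {
  const fetchData = async () => {
    // ask axios for raw bytes instead of a string
    const response = await axios.post(
      "http://localhost:8000/generate-story-support",
      null,
      { responseType: "arraybuffer" }
    );
    // wrap the bytes in a Blob so the browser treats them as an MP3
    const blob = new Blob([response.data], { type: "audio/mpeg" });
    const url = URL.createObjectURL(blob);
    // setAudio(url); // could store the URL in state for an <audio src={url} /> tag
    // or play it directly (may require a prior user gesture because of autoplay rules)
    const audio = new Audio(url);
    await audio.play();
  };
  fetchData();
}, []);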