Connecting AnalyserNode to SpeechSynthesisUtterance

Question

I'm using "SpeechSynthesisUtterance" to generate speech from text. Is it possible to connect an "AnalyserNode" to the output audio stream, so that I can build a voice visualization from the output of the text-to-speech Web API in JavaScript or TypeScript?

score 3 · Answer 1 · answered Mar 18 '23 at 19:14

Unfortunately there is no easy way to do this.

The only possible workaround right now is to record the audio of the current tab with getDisplayMedia(), but that requires a user interaction, and you have to rely on the user to pick the correct tab.
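A minimal sketch of that workaround, assuming the browser offers tab audio in its sharing dialog and includes the synthesized speech in it (this may vary by browser and platform). The names `analyseTabAudio`, `rmsLevel` and `onLevel` are made up for this illustration:

```typescript
// Hypothetical helper: RMS level (0..1) of byte time-domain samples,
// where 128 means silence, as filled in by getByteTimeDomainData().
function rmsLevel(samples: Uint8Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    const centered = (samples[i] - 128) / 128;
    sumSquares += centered * centered;
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Hypothetical entry point. Call it from a user gesture (e.g. a click handler);
// the user still has to pick the current tab and allow audio sharing.
async function analyseTabAudio(onLevel: (level: number) => void): Promise<() => void> {
  const stream = await navigator.mediaDevices.getDisplayMedia({ video: true, audio: true });
  if (stream.getAudioTracks().length === 0) {
    stream.getTracks().forEach(track => track.stop());
    throw new Error("The shared surface has no audio track");
  }

  const context = new AudioContext();
  const source = context.createMediaStreamSource(stream);
  const analyser = context.createAnalyser();
  analyser.fftSize = 2048;
  // Only source -> analyser; not connected to the destination, so nothing is played twice.
  source.connect(analyser);

  const buffer = new Uint8Array(analyser.fftSize);
  let frame = 0;
  const tick = () => {
    analyser.getByteTimeDomainData(buffer);
    onLevel(rmsLevel(buffer));
    frame = requestAnimationFrame(tick);
  };
  tick();

  // The returned function stops the loop and releases the capture.
  return () => {
    cancelAnimationFrame(frame);
    stream.getTracks().forEach(track => track.stop());
    void context.close();
  };
}
```

You would start it right before calling speechSynthesis.speak(utterance) and call the returned function in utterance.onend.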

There was once an issue for the Web Audio API to enable this, but it has since been closed.

There are still two open issues on the Web Speech API repo about this problem. They are about getting the synthesized speech as audio data or as a MediaStreamTrack. It may be a good idea to add your use case to one of those issues. Hopefully that will get the discussion going again.

Thanks for your comment. I was hoping that in 2023, with the ChatGPT revolution, this would already be a built-in feature of the Web Audio API, but what can you say. I'll follow the other thread in case they come up with new updates. — mustafa.salaheldin, Mar 19 '23 at 06:01

score 0 · Answer 2 · answered Mar 23 '23 at 09:27

Thanks @chrisguttandin I've created something like this:

        utterance.onstart = (event) => {
            console.log(event.currentTarget);

            // Look up the default "audiooutput" device and pass its ID to `getUserMedia()`, where available
            // see https://bugzilla.mozilla.org/show_bug.cgi?id=934425, https://stackoverflow.com/q/33761770
            navigator.mediaDevices.enumerateDevices()
                .then(devices => {
                    const audiooutput = devices.find(device => device.kind === "audiooutput" && device.deviceId === "default");

                    if (audiooutput) {
                        // Constraints for the audio track: `deviceId` goes directly in here
                        const constraints: MediaTrackConstraints = {
                            deviceId: {
                                exact: audiooutput.deviceId
                            }
                        };

                        return navigator.mediaDevices.getUserMedia({
                            audio: constraints
                        }).then((stream: MediaStream) => {
                            // Equalizer is a custom visualizer class (not shown here)
                            const equalizer = new Equalizer(stream);

                            console.log('audio track count: ', stream.getAudioTracks().length);
                        });
                    }
                })
                // Log rejections, e.g. when the constraints cannot be satisfied
                .catch(error => console.error(error));
        };

but the stream is empty. Any advice? What exactly is the stream returning here? My goal is to capture the sound that is going out of the speakers. I don't want to use MediaDevices.getDisplayMedia().

Update: I found out that when I retrieve the deviceId, it returns ‘default’. When I use this value in getUserMedia(), it picks the input device that also has the id ‘default’. — mustafa.salaheldin, Mar 24 '23 at 20:48
I forced the code to return the deviceId of my headset, but when I used it in the constraints for getUserMedia(), the code broke. — mustafa.salaheldin, Mar 24 '23 at 20:50
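
The behaviour described in these two comments is consistent with getUserMedia() only capturing from input devices: a deviceId is matched against "audioinput" entries, never against "audiooutput" ones. A rough sketch of that matching with made-up device data (`DeviceLike`, `resolveCaptureDevice` and all ids and labels are hypothetical, and this only models the lookup, it is not the browser's actual implementation):

```typescript
// Hypothetical, simplified shape of the entries returned by enumerateDevices().
interface DeviceLike {
  kind: "audioinput" | "audiooutput" | "videoinput";
  deviceId: string;
  label: string;
}

// Models which device an `exact` deviceId constraint can resolve to:
// only capture sources ("audioinput") are candidates for an audio track.
function resolveCaptureDevice(devices: DeviceLike[], deviceId: string): DeviceLike | undefined {
  return devices.find(device => device.kind === "audioinput" && device.deviceId === deviceId);
}

// Made-up device list: "default" exists once per kind.
const exampleDevices: DeviceLike[] = [
  { kind: "audioinput", deviceId: "default", label: "Microphone" },
  { kind: "audiooutput", deviceId: "default", label: "Speakers" },
  { kind: "audiooutput", deviceId: "headset-out", label: "Headset" },
];

// "default" taken from the output entry resolves to the microphone.
const fromDefault = resolveCaptureDevice(exampleDevices, "default");

// An output-only id has no capture source; with `exact`, getUserMedia()
// would be expected to reject (typically with an OverconstrainedError).
const fromHeadset = resolveCaptureDevice(exampleDevices, "headset-out");
```

So the stream from the snippet above most likely contains microphone input rather than the speaker output.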
