How can I make my web browser speak programmatically?

Question

Is it possible to have a website speak a welcome message to users programmatically?

Suppose I wanted to greet users with an audio message upon successful login to my website. I know that I could record a greeting message (e.g. as an MP3) and play that, but I want to do this programmatically, since every user's name is different.

For example, I might want to say Welcome, John Doe when John Doe logs in.

How could I do this with plain JavaScript?

NOTE: This is not intended for use in a production system, but rather intended to be used as a smaller portion of a bigger UX experiment.

If you just want to do this for yourself or for fun, knock yourself out, but if you are building something which faces the public, then avoid this as it is not a good user experience — Huangism, Jan 29 '18 at 17:29
Would suggest that for a canonical Question/Answer, the Answer should contain substantial relevant details as to the subject matter, see [What is a canonical question/answer, and what is their purpose?](https://meta.stackoverflow.com/q/291992/). This [Answer](https://stackoverflow.com/a/48504229/) currently omits critical details as to how the Web Speech API is actually implemented at different browsers. `window.speechSynthesis.getVoices()` and `onvoiceschanged` event usage cannot be ignored for interop. — guest271314, Jan 29 '18 at 19:00
@MartinBean No, it is not "awful UX", either from an accessibility [using multiple variables to open different links](https://stackoverflow.com/questions/47113870/using-multiple-variables-to-open-different-links) or automation perspective. Corp.s are not investing resources into speech synthesis technologies for no reason. — guest271314, Jan 29 '18 at 19:09
@MartinBean [CES 2017: The Year of Voice Recognition](https://spectrum.ieee.org/tech-talk/consumer-electronics/gadgets/ces-2017-the-year-of-voice-recognition) — guest271314, Jan 29 '18 at 19:15
"greet users with an audio message upon successful login" is surely "awful ux". Although there may be practical reasons to use audio this way, that is not one of them under normal circumstances. — Wesley Murch, Jan 29 '18 at 19:17
@WesleyMurch What are "normal circumstances"? Consider an individual browsing the web who might happen to be vision impaired. How would you suggest to provide a meaningful notification to that user as to a particular functionality, feature or completion of a procedure? — guest271314, Jan 29 '18 at 19:23
@Bakuriu Then you do not visit *outube for the same reason, yes? — guest271314, Jan 29 '18 at 19:27
Related: [Should I add sound effects to my web site?](https://ux.stackexchange.com/q/53641) [Why is sound sparingly used on websites?](https://ux.stackexchange.com/q/88252) [Should we use a sound/jingle when users arrive on our site or open our app?](https://ux.stackexchange.com/q/30359) — Bernhard Barker, Jan 29 '18 at 20:31
Consider this; Good UX is when the user expects what your site does. Practically no site greets you upon logging in, and users will not be expecting it. It will be jarring, especially in a public space. — BooleanCheese, Jan 29 '18 at 20:57
@grizzthedj _"NOTE: This is not intended for use in a production system, but rather intended to be used as a smaller portion of a bigger UX experiment."_ Why do you attempt to qualify the premise of your Question in response to comments where you posted an Answer to your own Question? What is the purpose of posting this Question? — guest271314, Jan 29 '18 at 21:03
@Dukeling The opinion as to "user experience" is an entirely different question than the technical capability of a browser to perform a given task. — guest271314, Jan 29 '18 at 21:07
@BooleanCheese Where does OP ask for opinions as to what "Good UX" is at the original Question? — guest271314, Jan 29 '18 at 21:11
@guest271314 I didn't post that comment as an answer. I posted it as a response to you arguing about whether it was bad UX. — BooleanCheese, Jan 29 '18 at 21:13
@BooleanCheese There is no "argument". Could care less about others' opinion. The topic of "UX" was initiated by a comment, then OP, for a reason known only to themselves, edited the original Question to mention "UX". The actual Question _"How can I make my web browser talk?"_ has absolutely nothing to do with opinion as to "UX". — guest271314, Jan 29 '18 at 21:14
@Dukeling No, those Questions are not "related". OP does not ask for opinions as to "UX" at the original Question. You cannot massage a topic into the Question that does not appear at the original Question. Will note that, from perspective here, the edit of the original Question to mention "UX" does bring into question the legitimacy of the premise of the Question itself; especially as the Answer by lacks essential components necessary for interop. — guest271314, Jan 29 '18 at 21:17
@guest271314 You might want to argue with the people who posted the 9 comments above mine arguing about whether it's good UX - a single comment linking to posts where that is answered is a much cleaner solution and helps to avoid discussion in the comments (at least usually). Although, considering that 5 of those comments are yours, I do find myself a bit confused about your comment. — Bernhard Barker, Jan 29 '18 at 21:21
@Dukeling Again, there is no "argument". We are dealing with facts. Already commented directly to those users as well. Not sure why or how the topic of "UX" is relevant at all (in their opinion) to the actual Question? The fact is that OP did not post the Question at ux . stackexchange, nor ask for opinion as to "UX" at the present Question either. — guest271314, Jan 29 '18 at 21:24

guest271314 · Accepted Answer · 2018-01-29T17:47:06.323

For window.speechSynthesis.speak() to render audio output at Chromium browser on *nix, the user needs to have speech-dispatcher installed and launch the browser with the --enable-speech-dispatcher flag.

How to use Web Speech API at chromium?

The onvoiceschanged event handler and window.speechSynthesis.getVoices() need to be used to populate the list of available voices. The API is not straightforward; .getVoices() may need to be called twice for the SpeechSynthesisVoice objects to populate the array returned by .getVoices().
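A minimal sketch of waiting for the voice list, assuming the browser either returns voices immediately or fires voiceschanged later. The helper name loadVoices, its synth parameter and the timeout value are illustrative choices, not part of the Web Speech API.

```javascript
// Resolve with the voice list once the browser has populated it.
// `synth` defaults to the page's speechSynthesis object; passing it in is
// only a convenience for testing (an assumption of this sketch).
function loadVoices(synth = globalThis.speechSynthesis, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const voices = synth.getVoices();
    if (voices.length > 0) {
      resolve(voices);
      return;
    }
    // If voiceschanged never fires, settle for whatever getVoices() returns.
    const timer = setTimeout(() => resolve(synth.getVoices()), timeoutMs);
    synth.onvoiceschanged = () => {
      clearTimeout(timer);
      // Second call: by now the array should hold SpeechSynthesisVoice objects.
      resolve(synth.getVoices());
    };
  });
}
```

Usage would look something like loadVoices().then(voices => console.log(voices.map(v => v.name))). The list may still be empty on systems with no speech engine installed.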

Note that there is a potential for the calls to .speak() to be placed in a queue and not be rendered as audio output, which is not immediately evident; calling window.speechSynthesis.cancel() clears the queue, where the audio output could then be rendered unexpectedly.
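One way to guard against a stuck queue is to clear it before speaking. This is a sketch only: speakNow is a hypothetical helper, and cancelling unconditionally would also cut off any speech the page intentionally queued.

```javascript
// Speak `text`, first discarding anything still queued or being spoken.
// `synth` defaults to the page's speechSynthesis object (the parameter is an
// assumption of this sketch, added so the function can be exercised in isolation).
function speakNow(text, synth = globalThis.speechSynthesis) {
  // speaking and pending are standard read-only flags on SpeechSynthesis.
  if (synth.speaking || synth.pending) {
    synth.cancel();
  }
  const utterance = new SpeechSynthesisUtterance(text);
  synth.speak(utterance);
  return utterance;
}
```

The utterance is returned so the caller can attach onend or onerror handlers to find out whether audio was actually rendered.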

speechSynthesis.getVoices() is empty array in Chromium Fedora

You can then use window.speechSynthesis.speak().

Have been trying for some time now to get SSML parsing enabled by default at Chromium browser for *nix; without using an external web service which requires either some form of EUA or is not free as in beer.

The list of entities that I have contacted, and the questions asked to achieve this, is quite lengthy, for example

Firefox at *nix also does not parse SSML.

Perhaps with more interest by users at large we can finally get this feature enabled by default.

There are workarounds for SSML parsing without using an external web service. The first link below is still unanswered, though it includes PHP code that calls the binary using shell_exec() following a $_POST to a local server

Note that there are several bugs with the current Web Speech API implementation, notably that changing the volume property of SpeechSynthesisUtterance has no effect on audio output at both Chromium and Firefox

There is also a bug when using .pause() and .resume(), which I encountered when trying to programmatically parse the <break> element of SSML

"speak speak slash" is audio output of .speak() following two calls to .speak(), .pause() and .resume()

An alternative to using the apparently dead Web Speech API is speak.js, which was created by porting espeak to JavaScript, or meSpeak.js, which is a fork of speak.js. espeak-ng is now actively maintained, for example using a modified version of meSpeak.js

generate audio file with W3C Web Speech API

or using online dictionaries which serve voice files reflecting the word

How to create or convert text to audio at chromium browser?

Interestingly, after posting that Answer the "gstatic" "dictionary" no longer served the audio files.

Fortunately, we have

mozilla/voice-web

This is a web, Android and iOS app for collecting speech donations for the Common Voice project.

which is quite active.

We can also use Native Messaging at both Chromium/Chrome and Firefox to interact with the native shell and call the binary itself

this code achieves expected result with minimal modification using Native Messaging

Chrome Native messaging with PHP

or as a drastic measure, change the binary

How to set options of commands called by browser?

(opinion, supported by facts, follows)

There is a substantial web service market for speech synthesis technologies, both in the generation thereof ("[L]yrebird") and the recognition thereof, for profit, e.g., "*lexa"; "*olly"; (*bm) "*atson *luemix"; (*oogle) "*ctions"; etc.

It is up to open source developers to continue efforts directed towards maintaining open source (FOSS; FLOSS) speech synthesis technologies at open source browsers. If we want these technologies to be implemented in browsers by default, open source developers have to compose the code to make that happen.

Very thorough. I'd like to add that Safari supports the API everywhere it runs (so yeah, "only" iOS and macOS) and unlike Chromium, does the speech synthesis offline (macOS has had the capability for decades) which enables a couple of time-sensitive features not available with Google's synthesis. — Touffy, Jan 29 '18 at 16:06
@Touffy Have not tried macOS/safari. Yes, noticed that Chromium source code has *pple copyright for several of the files related to tts — guest271314, Jan 29 '18 at 16:10
@Touffy macOS/safari does not support SSML parsing by default, correct? — guest271314, Jan 29 '18 at 16:38
Indeed, as far as I know Safari doesn't support SSML (probably because the underlying MacOS API doesn't either) but that shouldn't be a problem for the OP's simple needs. Plain text should work fine. — Touffy, Jan 29 '18 at 21:47
@Touffy Following the edit to the original Question am not certain what the purpose of the Question is, nor what the requirement is. Asked you about SSML parsing at macOS/safari to confirm that the functionality is still absent at macOS/safari. — guest271314, Jan 29 '18 at 21:56

grizzthedj · Answer 2 · 2019-04-02T15:54:20.437

6

This is possible with the SpeechSynthesisUtterance interface of the Web Speech API. More info on this here.

The JavaScript below will say "Welcome, John Doe" when executed in Chrome. Make sure the volume is up!

const message = new SpeechSynthesisUtterance('Welcome, John Doe'); 
window.speechSynthesis.speak(message);
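Since each user's name differs, the greeting can be wrapped in a small function with a feature check. This is a sketch: greetUser and its scope parameter are hypothetical names introduced here, and the fallback behaviour is an assumption.

```javascript
// Hypothetical helper: speak a personalised welcome if the browser can.
// Returns false when speech synthesis is unavailable, so the caller can
// fall back to a visual greeting instead.
function greetUser(name, scope = globalThis) {
  if (!('speechSynthesis' in scope) || !('SpeechSynthesisUtterance' in scope)) {
    return false;
  }
  const utterance = new scope.SpeechSynthesisUtterance(`Welcome, ${name}`);
  scope.speechSynthesis.speak(utterance);
  return true;
}
```

Some browsers appear to block speech that is not triggered by a user gesture, so calling greetUser('John Doe') from the login button's click or submit handler is likely to be more reliable than calling it on page load.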

The Web Speech API also provides a speech recognition interface. The following code will print spoken words to the browser's console.

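// Chrome exposes the constructor with a webkit prefix
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.onresult = function(event) {
  // Log the most likely transcript of each new result
  for (let i = event.resultIndex; i < event.results.length; ++i) {
    console.log(event.results[i][0].transcript);
  }
};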

To start capturing speech, run recognition.start();
To stop capturing speech, run recognition.stop();

Given this is experimental technology, it is not going to be perfect, and it is not supported in all browsers and versions. Check the browser compatibility table for supported browsers and versions.

edited Apr 02 '19 at 15:54

answered Jan 29 '18 at 15:04

grizzthedj


The example should include `.getVoices()` call, which is not that straightforward to use. – guest271314 Jan 29 '18 at 15:24
Note, voices are loaded asynchronously. The call to `window.speechSynthesis.speak()` could occur before the `SpeechSynthesisVoice` objects have populated the array returned by `.getVoices()` – guest271314 Jan 29 '18 at 16:18
Are recognition and synthesis done locally or they use the internet ? – beppe9000 Jan 29 '18 at 19:21
@beppe9000 Locally. Chrome is shipped with their own version of voices capable of being set and used by `SpeechSynthesisUtterance`. – guest271314 Jan 29 '18 at 19:41
@guest271314 nice – beppe9000 Jan 29 '18 at 21:21
Well, just spent a few hours trying to get a speech working for the happy new year countdown using string manipulation and hard-coded audios; feeling angry I didn't notice this answer earlier. – Dec 31 '20 at 11:05

score 0 · Answer 3 · answered May 28 '22 at 18:13

I made a function that makes life easier. You only have to call the function with a language code, for example speak('hello world', 'en') for English; see other codes

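function speak(text, language) {
    const synth = window.speechSynthesis;
    const utterance = new SpeechSynthesisUtterance(text);
    // Set the language as well, so the browser can choose a voice if none is found below
    utterance.lang = language;
    const primary = language.split('-')[0].toLowerCase();
    // getVoices() may return an empty array until the voices have loaded
    const voice = synth.getVoices().find(v => v.lang.split('-')[0].toLowerCase() === primary);
    if (voice) utterance.voice = voice;
    synth.speak(utterance);
}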
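The voice lookup can also be split out into a pure function, which makes the matching rule easy to test without a browser. This is a sketch: pickVoice is a hypothetical name, and the underscore normalisation is an assumption, based on reports that some platforms return tags such as en_US.

```javascript
// Choose a voice for `language` from a list of SpeechSynthesisVoice-like
// objects: prefer an exact tag match (en-US), then any voice sharing the
// primary subtag (en), otherwise return null.
function pickVoice(voices, language) {
  const wanted = language.toLowerCase();
  const primary = wanted.split('-')[0];
  // Assumption: treat en_US the same as en-US.
  const tag = (voice) => voice.lang.replace('_', '-').toLowerCase();
  return (
    voices.find((v) => tag(v) === wanted) ||
    voices.find((v) => tag(v).split('-')[0] === primary) ||
    null
  );
}
```

With the result in hand, something like utterance.voice = pickVoice(speechSynthesis.getVoices(), 'en-GB') would apply it. A null result leaves the browser's default voice in place.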

Check the Web Speech API documentation
