9

Is there a JavaScript library or product that provides text-to-speech for animated, speaking avatars without using Flash or any other plug-in? The idea is that I type in text and the avatar's mouth moves as the audio is played.

The aim is a cross-browser, cross device, no-plugins, web-based talking chat avatar.

I looked at CrazyTalk, which seemed perfect, but sadly it turns out that it relies on the Unity engine.

I then started to think about rolling my own by combining existing text-to-speech services, pulling phonemes out of the audio waveform, and building my own dictionary that maps phonemes to canvas shapes. That doesn't really seem to exist either (and even if it did, I'm not sure how I would time the mouth movement to the audio).

It's 2015. I feel like something like this should already exist and I shouldn't be trying to invent it.

Edit: Now I'm looking into Microsoft.Speech. I really need something that spits out something like IPA in syllables, and I'm not sure if MS.Speech does that. TTS wave creation is the easy part. I could send text to the server and match phonetic syllables to mouth point coordinates... if I could just get those syllables broken out. What breaks text into phonetic syllables?

user2245759
  • 477
  • 6
  • 17
  • Since you couldn't find this app...Now is your time to shine! Build this app and post it as a Github repository. I'd support and make use of it. Good luck with your project. – markE Mar 05 '15 at 18:51
  • I listed some resources in [Make a realtime realistic 3D avatar with text-to-speech, Viseme Lip-sync, and emotions/gestures](https://stackoverflow.com/questions/73806104) – trusktr Jul 10 '23 at 21:39

3 Answers

4

You want to look at the Speech Synthesis API. The most basic use is:

var msg = new SpeechSynthesisUtterance('Hello World');
window.speechSynthesis.speak(msg);

http://updates.html5rocks.com/2014/01/Web-apps-that-talk---Introduction-to-the-Speech-Synthesis-API

https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html#tts-section

Here is browser support: http://caniuse.com/web-speech. At the moment only Chrome & Safari support it.
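
Building on the basic call above, here is a minimal sketch of listening for word boundaries while the utterance is spoken. The `onboundary` handler and its `name` and `charIndex` fields are part of the Speech Synthesis API; the helper names `wordAt` and `speakWithWordEvents` are made up for this example. Boundary events may not fire for every voice or browser, so treat this as something to experiment with rather than rely on.

```javascript
// Pure helper (hypothetical name): return the word starting at charIndex in text.
function wordAt(text, charIndex) {
  const match = text.slice(charIndex).match(/^\S+/);
  return match ? match[0] : '';
}

// Speak text and report each word as the engine reaches it.
// Returns false when the Speech Synthesis API is not available.
function speakWithWordEvents(text, onWord) {
  if (typeof speechSynthesis === 'undefined' ||
      typeof SpeechSynthesisUtterance === 'undefined') {
    return false;
  }
  const msg = new SpeechSynthesisUtterance(text);
  msg.onboundary = (event) => {
    // Boundary events can also be reported for sentences; keep only words.
    if (event.name === 'word') {
      onWord(wordAt(text, event.charIndex));
    }
  };
  speechSynthesis.speak(msg);
  return true;
}
```

Usage would look like `speakWithWordEvents('Hello World', (w) => console.log(w));`. Word-level events are the finest granularity on offer here, so this alone is not enough for phoneme-accurate lip sync.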

rickyduck
  • 4,030
  • 14
  • 58
  • 93
  • Also, consider that many people have tried to invent it before you. Unfortunately cross browser is never going to be an option with such niche, modern technologies. Hell, we can't even do Web Components yet without plethoras of polyfills, and that's the most eagerly anticipated spec. You wouldn't be able to use the Web Audio API to manipulate the audio yourself, either, as it isn't supported in IE11. – rickyduck Mar 05 '15 at 18:01
  • Currently, the Speech Synthesis API does not give any phonemes that would aid you in lip syncing. It works solely on strings, and you can only receive an event when a word has been mentioned, not even when syllables have been pronounced. – Design by Adrian Jul 09 '18 at 08:17
1

I think I have an approach. In short, no, there does not appear to be an existing utility... Yet ;-)

I've decided to go with the Microsoft Speech Platform. It does better than returning phonemes: it provides the accompanying viseme IDs along with the audio position at which each one occurs. So I can generate a WAV file and a list of viseme metadata server-side and retrieve both. Now to figure out how to synchronize them.
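
One possible way to do the synchronization on the client, as a rough sketch only. It assumes the server returns one entry per viseme with an id and a duration in milliseconds; the field names (`id`, `durationMs`), the sample values, and the `drawMouth` callback are all hypothetical. Start times are rebuilt as a running sum of the preceding durations, and the active viseme is looked up from the audio element's `currentTime`.

```javascript
// Hypothetical metadata shape: one entry per viseme, duration in milliseconds.
const visemes = [
  { id: 0, durationMs: 50 },
  { id: 12, durationMs: 100 },
  { id: 4, durationMs: 80 },
];

// Rebuild each viseme's start time as the running sum of the durations before it.
function buildTimeline(list) {
  let startMs = 0;
  return list.map((v) => {
    const entry = { id: v.id, startMs: startMs, endMs: startMs + v.durationMs };
    startMs = entry.endMs;
    return entry;
  });
}

// Find the viseme active at a playback position (linear scan, for clarity).
function visemeAt(timeline, positionMs) {
  const hit = timeline.find(
    (e) => positionMs >= e.startMs && positionMs < e.endMs
  );
  return hit ? hit.id : null; // null: before the start or past the end
}

// Browser-only wiring: poll the audio clock every frame and draw the mouth.
// `drawMouth` is a hypothetical callback that renders a single viseme id.
function syncMouth(audio, timeline, drawMouth) {
  function tick() {
    drawMouth(visemeAt(timeline, audio.currentTime * 1000));
    if (!audio.paused && !audio.ended) {
      requestAnimationFrame(tick);
    }
  }
  audio.addEventListener('play', () => requestAnimationFrame(tick));
}
```

Reading the position from the audio element on every frame, instead of scheduling timers up front, keeps the mouth aligned with the sound even if playback stalls or is paused.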

user2245759
  • 477
  • 6
  • 17
  • This works great. There was a little extra work getting the viseme sync'd. For whatever reason the AudioPosition that the speech platform assigns is off, way off, from what I am thinking it should be. I recreated the AudioPosition from a running sum of the duration of each preceding viseme. Now to replace all those viseme images with lots of webgl work. – user2245759 Apr 02 '15 at 01:08
  • 1
    can you share a working example of this if you got it working? – Pixelomo Oct 31 '17 at 03:10
0

I am facing a similar problem.

First, have you looked at www.haptek.com? It is exactly what you want... but it seems to be dead and only works on Windows XP...

Second, it is possible to use the Microsoft Speech API directly from script in the browser... but I think the Chrome TTS is a better option.

Micho
  • 3,929
  • 13
  • 37
  • 40