
I'd like to create animated heads in my web apps. CSS3 transitions, animations and backgrounds, with a little help from the JavaScript Web APIs, seem to be all I need. Using xface looks like overkill to me; a simple cartoon-style solution would be enough.

I've already made some progress (I was able to create a voice-controlled web app), but this time I need MP3/WAV input, not live speech from the microphone sent to Google's servers through x-webkit-speech.

I am considering this approach:

  1. record the speech into an MP3 or WAV file and write down its text as a string
  2. play the MP3 in the browser and detect the ends of words using AnalyserNode to synchronize the position in the string (I use Czech, which, unlike English, has an almost constant speech rate)
  3. display the cartoon heads (see the link above) according to the letter currently being spoken
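
Step 2 could be sketched roughly as follows. This is only a minimal illustration, not a tested lip-sync solution: `detectWordEnds` and `rms` are hypothetical helper names, and the silence threshold and minimum pause length are assumed values that would need tuning for real recordings.

```typescript
// Root mean square (loudness) of one block of time-domain samples,
// e.g. a buffer filled by AnalyserNode.getFloatTimeDomainData().
function rms(samples: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < samples.length; i++) {
    sum += samples[i] * samples[i];
  }
  return samples.length > 0 ? Math.sqrt(sum / samples.length) : 0;
}

// Hypothetical helper: given one loudness value per analysis frame,
// return the times (in seconds) at which a word appears to end.
// A word "ends" when the level stays below `threshold` for
// `minSilentFrames` consecutive frames (both values are assumptions).
// A word that runs to the very end without a pause is not reported.
function detectWordEnds(
  levels: number[],
  frameSeconds: number,
  threshold: number = 0.05,
  minSilentFrames: number = 3
): number[] {
  const ends: number[] = [];
  let silentRun = 0;
  let inWord = false;
  for (let i = 0; i < levels.length; i++) {
    if (levels[i] >= threshold) {
      inWord = true;
      silentRun = 0;
    } else if (inWord) {
      silentRun += 1;
      if (silentRun === minSilentFrames) {
        // The word ended at the first silent frame of this run.
        ends.push((i - minSilentFrames + 1) * frameSeconds);
        inWord = false;
      }
    }
  }
  return ends;
}
```

In the browser, one would call `analyser.getFloatTimeDomainData(buffer)` once per animation frame and push `rms(buffer)` into `levels`; each detected word end then advances the position in the string by one word.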

The question: is there any lower-effort approach (shorter development time for coder and designer)? Step 2 in particular, and supporting English in the future, worry me. Maybe some karaoke tool could produce a speech sync file (which I could parse into CSS3 keyframes)? I am not aware of any such tool.
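
To illustrate what I mean by parsing a sync file into CSS3 keyframes, here is a minimal sketch. The cue format (`MouthCue`), the function name `cuesToKeyframes`, and the assumption that all mouth poses sit side by side in one horizontal sprite sheet are all made up for this example; no real tool's file format is implied.

```typescript
// Hypothetical cue: at `time` seconds, show sprite number `frame`.
interface MouthCue {
  time: number;
  frame: number;
}

// Build an @keyframes rule that moves the background of a sprite sheet
// (assumed: frames laid out horizontally, each `frameWidth` px wide).
function cuesToKeyframes(
  name: string,
  cues: MouthCue[],
  duration: number,
  frameWidth: number
): string {
  const rules = cues.map((cue) => {
    // Position of the cue as a percentage of the whole clip.
    const percent = ((cue.time / duration) * 100).toFixed(2);
    const offset = -cue.frame * frameWidth;
    return `  ${percent}% { background-position: ${offset}px 0; }`;
  });
  return `@keyframes ${name} {\n${rules.join("\n")}\n}`;
}
```

The resulting rule would be used with a stepped timing function, for example `animation: talk 2s step-end;`, so that each pose holds until the next cue instead of sliding between sprites.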

Jan Turoň
  • I listed some resources in [Make a realtime realistic 3D avatar with text-to-speech, Viseme Lip-sync, and emotions/gestures](https://stackoverflow.com/questions/73806104) – trusktr Jul 10 '23 at 21:38

2 Answers

1

For something more involved you might try:

Step 1. Use the Web Speech API for text to voice...

http://updates.html5rocks.com/2013/01/Voice-Driven-Web-Apps-Introduction-to-the-Web-Speech-API

Step 2. Try porting "Papagayo" to JS (I believe it uses a dictionary to relate words to phonemes to mouth poses)

http://anime.smithmicro.com/papagayo.html

The GNU source is available here: http://anime.smithmicro.com/update_files/papagayo/papagayo_1.2_source.zip

You might also refer to http://www.adobe.com/devnet/flash/articles/lip-sync-smartmouth.html for an overview of what you're trying to achieve.
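
The dictionary idea from Step 2 might look roughly like the sketch below. This is not Papagayo's code or data: the two tables, the pose names and the function `wordsToMouthPoses` are invented for illustration, and a real port would need a full pronunciation dictionary.

```typescript
// Toy pronunciation dictionary (invented entries, for illustration only).
const PRONUNCIATIONS: Record<string, string[]> = {
  hello: ["HH", "AH", "L", "OW"],
  web: ["W", "EH", "B"],
};

// Toy phoneme-to-pose table; the pose names are hypothetical CSS class names.
const MOUTH_FOR_PHONEME: Record<string, string> = {
  HH: "open-small",
  AH: "open-wide",
  EH: "open-wide",
  L: "tongue-up",
  OW: "round",
  W: "round",
  B: "closed",
};

// Map a sentence to a sequence of mouth poses: word -> phonemes -> poses.
// Words missing from the dictionary fall back to a single "rest" pose.
function wordsToMouthPoses(text: string): string[] {
  const poses: string[] = [];
  const words = text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  for (const word of words) {
    const phonemes = PRONUNCIATIONS[word];
    if (!phonemes) {
      poses.push("rest");
      continue;
    }
    for (const phoneme of phonemes) {
      poses.push(MOUTH_FOR_PHONEME[phoneme] || "rest");
    }
  }
  return poses;
}
```

For example, `wordsToMouthPoses("Hello web")` yields one pose per phoneme, which could then be spread across the duration of each spoken word.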

acheo
  • Very interesting links, thanks. It will take me a while to read through them, since I'll be quite busy for the next few months. – Jan Turoň Oct 23 '13 at 21:23
0

Maybe you could do something really quick and dirty with spectrum analysis: http://0xfe.muthanna.com/wavebox/

acheo
  • That's just what I tried, but I was only able to detect 'c' and the ends of words; I couldn't find any other pattern in the speech. Google's analyzer must be far more sophisticated. I guess I'll wait for some tool... – Jan Turoň Oct 22 '13 at 13:01
  • Is a continuously moving mouth animation, played whenever the volume is above a threshold, too tacky for your purposes? – acheo Oct 23 '13 at 14:42
  • I'm now getting "This site can’t be reached" for above link. – Philipp Lenssen Sep 06 '22 at 10:10