How can I generate timed-text (e.g. for subtitles) synchronised with Text-to-Speech (TTS) word-by-word?
I'd like to do this using the high-quality SAPI5 voices (e.g. those available from IVONA here) that I have used on Windows 10.
On Windows we already have some good free TTS programs:
- Read4Me - open source
- Balabolka - closed source
- TTSApp - Microsoft's own very basic GUI, currently available here; it seems to date from 2001.
TTSApp can produce WAV audio files. Balabolka creates MP3 files along with synchronised timed text as LRC files (the format used for karaoke), BUT only on a line-by-line basis, NOT word-by-word.
However, both highlight each word on screen, in real time, as they speak it aloud.
If I had some TTS/SAPI5 source code, I could simply check the clock every time a new word starts to be generated and write the time and that word to a file. Does anyone know of any project that exposes that level of programming, so that I might start from there?
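To make the idea concrete, here is a minimal Python sketch of the "check the clock on every word" approach. It only covers the logging side: `WordTimingLogger`, `on_word` and the tab-separated output format are hypothetical names chosen for illustration, not part of SAPI5 or of any existing project, and hooking `on_word` up to a real TTS engine's word event is deliberately left out.

```python
import time
from typing import Callable, List, Optional, Tuple


class WordTimingLogger:
    """Records the elapsed time in ms each time a word callback fires.

    Hypothetical helper: the TTS engine is expected to call on_word()
    whenever it starts speaking a new word.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock                  # injectable so it can be tested
        self._start: Optional[float] = None
        self.entries: List[Tuple[int, str]] = []

    def start(self) -> None:
        """Mark time zero, ideally just before speech begins."""
        self._start = self._clock()

    def on_word(self, word: str) -> None:
        """Store (elapsed ms, word) for the word that is starting now."""
        if self._start is None:
            self.start()
        elapsed_ms = round((self._clock() - self._start) * 1000)
        self.entries.append((elapsed_ms, word))

    def save(self, path: str) -> None:
        """Write one 'ms<TAB>word' line per recorded word."""
        with open(path, "w", encoding="utf-8") as f:
            for ms, word in self.entries:
                f.write(f"{ms}\t{word}\n")
```

Because the clock is read in real time, this sketch is only as fast as the speech itself; it says nothing about how a particular engine delivers its word events.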
UPDATE SEPT 2016
I've since discovered that TTSApp was reimplemented in AutoHotkey by a certain jballi in 2012.
I've adapted that code to append the time in ms to a text file every time the onWord event handler fires. I still need to make two passes, though:
- a rapid automated pass to save the WAV file and
- a slow (realtime) pass that creates the timing file.
I am still hoping to find a way to speed up the second (realtime) pass.
BTW, the Visual Basic source appears to be archived here.
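For turning per-word timings into word-by-word timed text, a minimal Python sketch follows. It assumes the timings for one line of text are already available as (milliseconds, word) pairs, and it writes them in the style of the "enhanced" LRC variant that puts a `<mm:ss.xx>` stamp before each word; whether a given player accepts that variant is an assumption to check. The function names `lrc_stamp` and `word_level_lrc` are hypothetical.

```python
from typing import Iterable, Tuple


def lrc_stamp(ms: int) -> str:
    """Format milliseconds as mm:ss.xx (hundredths of a second)."""
    minutes, rem = divmod(ms, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{minutes:02d}:{seconds:02d}.{millis // 10:02d}"


def word_level_lrc(line: Iterable[Tuple[int, str]]) -> str:
    """Build one LRC line with a timestamp in front of every word.

    The line-level [mm:ss.xx] tag reuses the time of the first word.
    """
    words = list(line)
    if not words:
        return ""
    head = f"[{lrc_stamp(words[0][0])}]"
    body = " ".join(f"<{lrc_stamp(ms)}> {word}" for ms, word in words)
    return f"{head} {body}"


print(word_level_lrc([(0, "Hello"), (250, "brave"), (61500, "world")]))
# prints: [00:00.00] <00:00.00> Hello <00:00.25> brave <01:01.50> world
```

Timestamps are truncated to hundredths of a second, since that is the resolution the mm:ss.xx notation carries.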