2

How do you download the audio that this site's text-to-speech plays (HTML5 audio, or however it works)? https://ttsreader.com/

I'm trying to automate some testing with real audio to test on Amazon Alexa, and so I need a huge sample set of audio files. So I have all the permutations of the phrases I want to try, but I need different voices for each one.

I found https://ttsreader.com/ and I like the variety and natural voices they have, but I can't figure out how to programmatically download the text-to-speech audio when the voice plays.

I'm planning on downloading like 6k audio files between all the different voices so I definitely need to script this somehow, as their suggested way through Audacity would be far too time consuming.

guest271314
joshbenner851
  • Is the requirement to download the audio output of text to speech? – guest271314 Jul 20 '17 at 05:20
  • Yes, updated the question to reflect that – joshbenner851 Jul 20 '17 at 05:27
  • These voices are yours (or at least the ones of your system, plus maybe a few of your browser's). So you might consider using a tool other than the browser for this job? I guess there is software out there that can also use these speech voices, maybe even at a rate > 1x or with multiple voices at the same time. – Kaiido Jul 20 '17 at 06:20
  • @Kaiido Voice can be installed or created for `window.speechSynthesis` to list and use. OP is trying to return a static file of, or for lack of a viable alternative, record the output of text-to-speech at computer using JavaScript. Yes "other tool" could achieve the same, the present inquiry is related to using JavaScript. – guest271314 Jul 20 '17 at 06:23
  • @guest271314 but this website didn't install other voices. `window.speechVoices` (used to make the list) is exactly the same as `speechSynthesis.getVoices()` – Kaiido Jul 20 '17 at 06:26

2 Answers

3

Soooo this only applies if you have a Mac and you're happy with the voices Apple provides, but I was enlightened to the `say` command, which lets you save spoken audio to files in different voices.

Just run `man say` to see all your options for exporting etc., and `say -v '?'` to see all the voices (the quotes keep shells like zsh from treating `?` as a glob).
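To turn that voice listing into something a script can loop over, a small filter along these lines might help. It is only a sketch: `voices_for_locale` is a hypothetical helper name, and it assumes each line of the listing looks like `Name   locale   # sample sentence`, which may differ between macOS versions.

```shell
# Print the names of voices whose locale starts with the given prefix.
# Reads `say -v '?'`-style lines on stdin; the column layout
# (name, locale, "# sample text") is an assumption about the output.
voices_for_locale() {
  local prefix=$1
  sed -E -n "s/^(.*[^[:space:]])[[:space:]]+(${prefix}[A-Za-z0-9_-]*)[[:space:]]+#.*$/\1/p"
}

# Intended use on a Mac (not run here):
#   say -v '?' | voices_for_locale en_
```

Voice names can contain spaces, so keep them quoted when passing them on to `say -v`.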

This guide tells you how to download more voices

Break out a quick bash script and you're all set to go:

# A = items you want Alexa to log, B = voices to use
A=(Potatoes Steak Carrots)
B=(Fiona Serena Daniel)
nameLength=${#A[@]}
voiceLength=${#B[@]}

for ((i = 0; i < nameLength; i++)); do
   for ((x = 0; x < voiceLength; x++)); do
      # Options go before the text; one output file per item/voice pair
      say -v "${B[$x]}" -o "${A[$i]}_${B[$x]}.m4a" "Alexa, ask spartycafe to log ${A[$i]}"
   done
done
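
For a run on the order of 6k files, it may be easier to keep the phrases and voices in text files and make the script resumable. The following is a rough sketch, not a tested production script: `phrases.txt`, `voices.txt`, `slugify` and `render_all` are hypothetical names, and it only relies on the `-v` and `-o` options of `say` used above.

```shell
#!/usr/bin/env bash
# Sketch: render every phrase in every voice, reading both lists from files
# (one entry per line). File and function names here are illustrative only.

# Turn arbitrary text into a filename-friendly slug, e.g. "Bad News" -> "bad-news".
slugify() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-' | sed -e 's/^-//' -e 's/-$//'
}

render_all() {
  local phrases_file=$1 voices_file=$2 out_dir=$3
  local phrase voice out
  mkdir -p "$out_dir"
  while IFS= read -r voice || [ -n "$voice" ]; do
    [ -n "$voice" ] || continue
    while IFS= read -r phrase || [ -n "$phrase" ]; do
      [ -n "$phrase" ] || continue
      out="$out_dir/$(slugify "$phrase")_$(slugify "$voice").m4a"
      # Skip files that already exist so an interrupted run can be resumed.
      if [ -e "$out" ]; then continue; fi
      # stdin is redirected so `say` cannot consume the phrase list.
      say -v "$voice" -o "$out" "Alexa, ask spartycafe to log $phrase" </dev/null
    done < "$phrases_file"
  done < "$voices_file"
}

# Intended use on a Mac (not run here):
#   render_all phrases.txt voices.txt out
```

Each output file is named after the phrase and the voice, so a failed or odd-sounding sample can be traced back to its inputs.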
joshbenner851
0

You can use `navigator.mediaDevices.getUserMedia()` with the constraints object `{audio: true}`, together with `MediaRecorder`. At the `getUserMedia()` permissions prompt, select Monitor of Built-in Audio Analog Stereo to record a `MediaStream` of the audio that is being sent to the speakers or headphones.

Alternatively, you can install or create voices on the local filesystem and use `window.speechSynthesis.speak()` with a `SpeechSynthesisUtterance` object, combined with the above approach, to record the audio output locally.

Or use the same approach to record the audio output as a visitor to a website.

See also

guest271314
  • I think that only you (and a few others) will have `Monitor of Built-in Audio Analog Stereo` as a device. This sounds like a virtual device, maybe from your OS. I don't have it myself, and in any case it is not suitable for a public-facing web page. Also, even for personal use, this method will take as long as using external software like Audacity, as suggested in the question. – Kaiido Jul 20 '17 at 06:07
  • @Kaiido Yes, though presently what are the alternatives for recording native text-to-speech audio output, using JavaScript without a library, or adjusting browser source code? – guest271314 Jul 20 '17 at 06:10
  • @Kaiido fwiw have recently been attempting to compose or locate a solution for the current inquiry [How to implement option to return Blob, ArrayBuffer, or AudioBuffer from window.speechSynthesis.speak() call](https://softwareengineering.stackexchange.com/q/352073/), without, as of yet, creating a viable cross-browser solution. We can install or create our own voice objects locally or create our own API for the same. Returning a file of the audio rather than output to speakers does not appear to be currently implemented at browsers. Do you propose alternative solutions? – guest271314 Jul 20 '17 at 06:20
  • No, I don't think there is currently any solution other than a virtual device to route audio to a recorder, but for OP's case, I didn't find any voice in the linked website that is not from my system, so I guess he is not forced to use a web browser. – Kaiido Jul 20 '17 at 06:25
  • @Kaiido Have not viewed code at linked website. There are voice alternatives that can be installed which should be available at `window.speechSynthesis.getVoices()`. The gist of present Answer is while we do not have the ability to record the output of `window.speechSynthesis.speak()` now, there is no reason why we should not have that ability, to avoid cumbersome workarounds. In the meantime, OP can do the work of locating the voices that they want to use locally, for example http://espeak.sourceforge.net/mbrola.html, https://askubuntu.com/questions/554747/how-to-install-more-voices-to-espeak – guest271314 Jul 20 '17 at 06:35
  • The answer that just came out is exactly my point though. OP wants to get 6k files recorded; with your workaround, he would still have to play one file at a time + write the scripts + install a virtual device... Other software is better suited than a web browser for this task. And I agree that the speechSynthesis API could probably benefit from such an option, but I'm not sure how this API is actually doing in terms of development. So yes, the answer should maybe include an *"It's not possible through web API yet"*, but it might solve the problem. (PS: even your answer lacks this note) – Kaiido Jul 20 '17 at 06:40