My text to speech voice results never sound as good as on the IBM Demo page (2)

Question

When I submit a text to speech conversion using curl, I get an OK-sounding audio file, but it is a bit robotic and nasal. The demo page below sounds terrific, and I can never get results of that quality. I do not specify the voice to use, so the service uses some default.

https://www.ibm.com/demos/live/tts-demo/self-service/home

What is the above page doing differently than me?

My curl command is this:

$ curl -u "apikey:api-removed" --header "Content-Type: application/json" --header "Accept: audio/ogg" -d "@Greeting_Script.txt" --output greeting.ogg --dump-header "logfile.txt" "url-removed"

Redgar Tech replied "If you had seen on the demo page, you were using a neural enhanced DNN version of the voices. Here, you are using their regular voice with no perfection and training."

However this link

https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-voices

says

"If you omit the optional voice parameter from a synthesis request, the service uses en-US_MichaelV3Voice by default"

I omitted the optional voice parameter from my synthesis request (see above) and yet I did NOT get results that used the neural enhanced voice of en-US_MichaelV3Voice.

So I tried adding the voice parameter for en-US_MichaelV3Voice and now the result is the clear neural enhanced version, same as the demo page provides.
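For anyone wanting to reproduce this, here is a rough sketch of the same curl request with the voice named explicitly. It assumes the service URL that was removed above is the instance base URL and that the voice is passed as a `voice` query parameter on `/v1/synthesize`. The function names and the `TTS_URL` variable are made up for illustration, and the API key stays elided as in the original command.

```shell
# Sketch only: TTS_URL is a placeholder for the service instance URL
# (elided in the question); the apikey value is likewise elided.

# Build the synthesize endpoint with an explicit voice query parameter.
synthesize_url() {
  # $1 = service base URL, $2 = voice name
  printf '%s/v1/synthesize?voice=%s\n' "${1%/}" "$2"
}

# Same request as in the question, but with the voice named explicitly.
synthesize_with_voice() {
  # $1 = service base URL, $2 = voice name
  curl -u "apikey:api-removed" \
    --header "Content-Type: application/json" \
    --header "Accept: audio/ogg" \
    -d "@Greeting_Script.txt" \
    --output greeting.ogg \
    --dump-header "logfile.txt" \
    "$(synthesize_url "$1" "$2")"
}

# Usage (not run here):
#   synthesize_with_voice "$TTS_URL" en-US_MichaelV3Voice
```

If your URL already ends in `/v1/synthesize`, skip the helper and append `?voice=en-US_MichaelV3Voice` to it directly.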

That means the documentation is incorrect when it states that omitting the optional voice parameter defaults to en-US_MichaelV3Voice. I suspect the actual default is en-US_MichaelVoice, which is not the neural enhanced version.

score 1 · Answer 1 · answered Apr 22 '21 at 22:20

I have confirmed that if I omit the optional voice parameter from a synthesis request, the service uses en-US_MichaelVoice by default. The evidence is in the log file:

session-name: EIHRWWSDMRCEZXKA-en-US_MichaelVoice

This means the information at this link

https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-voices

that states "If you omit the optional voice parameter from a synthesis request, the service uses en-US_MichaelV3Voice by default." is incorrect.

When I did add the voice parameter for en-US_MichaelV3Voice, the log file contained this line:

session-name: FIPYVOXYBMNRSQZQ-en-US_MichaelV3Voice
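To check which voice a request actually used, something like the following can pull it out of the file written by `--dump-header`. This is a sketch that assumes the `session-name` header always has the form seen in the two log lines above (an ID, a hyphen, then the voice name); the function name is made up for illustration.

```shell
# Print the voice name from the session-name header in a curl header dump.
# Assumes the header value looks like "<ID>-<voice>", as in the log lines above.
voice_from_headers() {
  # $1 = path to the file written by curl --dump-header
  grep -i '^session-name:' "$1" \
    | tr -d '\r' \
    | sed 's/^[^:]*:[[:space:]]*[^-]*-//'
}

# Usage (after running the curl command):
#   voice_from_headers logfile.txt
```

The `tr -d '\r'` step strips the carriage returns that HTTP header lines end with, so the printed voice name can be compared directly.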
