Different ssml values generate the same audio in Google Text to Speech

Question

We are unable to generate different audio output from different SSML values when using WaveNet voices. For example, these three inputs all produce the same audio:

<prosody rate="slow" pitch="-2st">Can you hear me now?</prosody>
<prosody rate="medium" pitch="1st">Can you hear me now?</prosody>
<prosody rate="high" pitch="5st">Can you hear me now?</prosody>

Using the emphasis tag produces the same results.

We are using the Python API from Google Cloud Text-to-Speech to request audio generation.

I would like to hear different voice intensities in each sample.

Please note that we also tried escaping the double quotes ("), but it made no difference in the generated audio.

https://issuetracker.google.com/issues/131618213

It is easier to help you if you include all the relevant code, not 'just' what you think may be failing. — Rub, Dec 26 '21 at 21:19
Hi Rub, thanks for the interest. We were using the TTS UI to test it, with the prosody snippets we provided back in 2019. Our Python code produced the same results as the TTS UI. — Jose GR, Dec 28 '21 at 10:38
Unfortunately, management decided that they didn't like how the voice sounded, so we solved the problem by recording a human. The issue may have been resolved since then; as we were not allowed to research this any further, we archived the project. — Jose GR, Dec 28 '21 at 10:47

score 0 · Answer 1 · answered Aug 24 '20 at 19:10

I don't know what this looks like with the Python SDK, but I'm currently using their Node.js SDK for TTS.

It seems that these prosody properties (rate, volume, pitch) should be configured directly in the request object sent to the Google TTS API, rather than set in the SSML text.
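As a rough Python illustration of that idea, the sketch below builds one request body per variant and puts pitch and rate in `audioConfig` instead of in SSML. The field names (`audioConfig`, `pitch`, `speakingRate`, `input`, `voice`) follow the JSON request shown elsewhere in this thread; the helper name `build_synthesis_request` and the numeric values are assumptions made up for the example, not an official mapping of `slow`/`medium`/`-2st` to numbers.

```python
import json


def build_synthesis_request(text, pitch=0.0, speaking_rate=1.0,
                            voice_name="en-US-Wavenet-G",
                            language_code="en-US"):
    """Hypothetical helper: build a text:synthesize request body with
    prosody controlled through audioConfig rather than SSML tags."""
    return {
        "audioConfig": {
            "audioEncoding": "LINEAR16",
            "pitch": pitch,                 # semitones relative to the default
            "speakingRate": speaking_rate,  # 1.0 is the normal speed
        },
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
    }


# Illustrative (pitch, speaking rate) pairs, loosely mirroring the three
# samples in the question. The numbers are assumptions, not documented values.
variants = [(-2.0, 0.8), (1.0, 1.0), (5.0, 1.25)]
bodies = [build_synthesis_request("Can you hear me now?", p, r)
          for p, r in variants]

for body in bodies:
    print(json.dumps(body["audioConfig"]))
```

Each dictionary can then be serialized and sent to the synthesize endpoint, or its values passed to the matching fields of whichever SDK is in use.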

score 0 · Answer 2 · answered Dec 26 '21 at 21:24

Using the TTS UI you can test different configurations easily.

By exporting to JSON, you can also see what the API call needs to look like.

In this case:

Request URL
https://texttospeech.googleapis.com/v1beta1/text:synthesize
Request body
{
  "audioConfig": {
    "audioEncoding": "LINEAR16",
    "pitch": 0,
    "speakingRate": 1
  },
  "input": {
    "ssml": "<speak><prosody rate='70%'> The slings and arrows of outrageous fortune. Or to take arms against a sea of troubles And by opposing end them.</prosody> </speak>"
  },
  "voice": {
    "languageCode": "en-US",
    "name": "en-US-Wavenet-G"
  }
}

Without seeing the full code that you use on the API call it is difficult to see what may be failing for you.
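For comparison, here is a minimal Python sketch that assembles a request body of the same shape as the one above. It wraps the text in a `<speak>` root element, as the exported request does, and escapes the text so characters such as `&` do not break the markup. The helpers `prosody_ssml` and `build_ssml_request` are hypothetical names for this example; nothing is sent over the network, and authentication is left out.

```python
import json
from xml.sax.saxutils import escape, quoteattr


def prosody_ssml(text, rate=None, pitch=None):
    """Hypothetical helper: wrap text in <speak><prosody ...> markup."""
    attrs = ""
    if rate is not None:
        attrs += f" rate={quoteattr(rate)}"
    if pitch is not None:
        attrs += f" pitch={quoteattr(pitch)}"
    # escape() protects &, < and > inside the spoken text
    return f"<speak><prosody{attrs}>{escape(text)}</prosody></speak>"


def build_ssml_request(ssml, voice_name="en-US-Wavenet-G",
                       language_code="en-US"):
    """Hypothetical helper: request body mirroring the exported JSON."""
    return {
        "audioConfig": {
            "audioEncoding": "LINEAR16",
            "pitch": 0,
            "speakingRate": 1,
        },
        "input": {"ssml": ssml},
        "voice": {"languageCode": language_code, "name": voice_name},
    }


ssml = prosody_ssml("Can you hear me now?", rate="slow", pitch="-2st")
body = build_ssml_request(ssml)
# json.dumps takes care of quoting the SSML string inside the JSON payload
print(json.dumps(body, indent=2))
```

One thing worth checking when comparing against the question's samples: in the SSML specification the named `rate` values are `x-slow`, `slow`, `medium`, `fast` and `x-fast`, whereas `high` is a `pitch` label, so `rate="high"` may simply be ignored.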
