
I want to synthesize text to speech using the GCP Text-to-Speech API. Almost every example I can find writes the audio to a new file; I would like the function to be fed text and have the result read aloud over the computer's speakers instead. So far I have only been adapting the GCP sample code that says "hello world", and I have not been able to find a way to play the audio right after it is converted. It seems Watson and Azure offer this, but GCP does not?

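from google.cloud import texttospeech

# `credentials` and `string` (the text to speak) are defined elsewhere.
# The `types` and `enums` modules are used as in google-cloud-texttospeech releases before 2.0.
client = texttospeech.TextToSpeechClient(credentials=credentials)

# Text to be synthesized
synthesis_input = texttospeech.types.SynthesisInput(text=string)

# Voice selection
voice = texttospeech.types.VoiceSelectionParams(
    language_code='en-US',
    ssml_gender=texttospeech.enums.SsmlVoiceGender.NEUTRAL)

# Request MP3-encoded output
audio_config = texttospeech.types.AudioConfig(
    audio_encoding=texttospeech.enums.AudioEncoding.MP3)

response = client.synthesize_speech(synthesis_input, voice, audio_config)

# response.audio_content holds the encoded audio as bytes
with open('output.mp3', 'wb') as out:
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')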

Any help would be greatly appreciated. I am guessing I am missing some documentation or a simple configuration.

Jose V
dmc94

1 Answer


The GCP Text-to-Speech API returns a response that contains the audio data. What you do with that data once it comes back is up to you. In the example above, the data is written to a file. If you prefer, you could pipe that data to an audio player and play it immediately without involving a file. The available encodings include LINEAR16 (uncompressed PCM with a WAV header), MP3, and OGG_OPUS; see https://cloud.google.com/text-to-speech/docs/reference/rest/v1beta1/text/synthesize#AudioEncoding.

As for an API to play the audio data, see the question "Play audio with Python".
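As a rough illustration of the "pipe it to a player" idea, here is a minimal sketch, not an official GCP feature. It assumes an external command-line player that can read audio from standard input; `ffplay` (from FFmpeg) with the flags shown is my assumption, and the helper name `play_audio_bytes` is hypothetical. Any player that accepts audio on stdin could be substituted.

```python
import subprocess


def play_audio_bytes(audio_content,
                     player_cmd=("ffplay", "-nodisp", "-autoexit",
                                 "-loglevel", "quiet", "-")):
    """Send in-memory audio bytes to an external player's stdin.

    No file is written. `player_cmd` is an assumption: by default it
    expects ffplay to be installed and on PATH, with the trailing "-"
    telling it to read from standard input.
    """
    # check=True raises CalledProcessError if the player exits with an error.
    result = subprocess.run(list(player_cmd), input=audio_content, check=True)
    return result.returncode
```

With the code from the question, the call would replace the `with open(...)` block: `play_audio_bytes(response.audio_content)`. The call blocks until the player finishes, so for long texts you may want to run it in a separate thread.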

Kolban