Text to speech Google Cloud Python

Question

I would like to calculate the duration of each sentence when I convert text to speech with Google Cloud in Python. For example, if I have three sentences converted to audio, I would like to know when the first sentence starts in the audio, when the second one starts, and so on.

Example:

text= 'Hello, World. I can speak any language. I would like to help you.'

Hello, World: starts 00:00 ends 00:03

I can speak any language: starts 00:04 ends 00:09

I would like to help you: starts 00:10 ends 00:13

Is there something for that in Python? Here is the main code:

"""Synthesizes speech from the input string of text or ssml.

Note: ssml must be well-formed according to:
    https://www.w3.org/TR/speech-synthesis/
"""
from google.cloud import texttospeech

# Instantiates a client
client = texttospeech.TextToSpeechClient()

# Set the text input to be synthesized
synthesis_input = texttospeech.types.SynthesisInput(text="Hello, World. I can speak any language. I would like to help you.")

# Build the voice request, select the language code ("en-US") and the ssml
# voice gender ("neutral")
voice = texttospeech.types.VoiceSelectionParams(
    language_code="en-US", ssml_gender=texttospeech.enums.SsmlVoiceGender.NEUTRAL
)

# Select the type of audio file you want returned
audio_config = texttospeech.types.AudioConfig(
    audio_encoding=texttospeech.enums.AudioEncoding.MP3
)

# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
    input_=synthesis_input, voice=voice, audio_config=audio_config
)

# The response's audio_content is binary.
with open("./output.mp3", "wb") as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')

You ask when each sentence starts, but you don't say whether you're playing the files. Do you just want the duration of each one, so you can add the durations up to get start and end times? Why not break the text into separate sentences and synthesize each one? - dbmitch, Mar 23 '22 at 21:56
@dbmitch And if I do what you mentioned, would I be able to see the duration for those separate videos? Anything that can give me a duration or start/end time is much appreciated :) - Alex, Mar 26 '22 at 06:37
Are you playing them with sox or aplay? Or just recording them to a file? - dbmitch, Mar 26 '22 at 21:27
https://stackoverflow.com/questions/6037826/finding-the-length-of-an-mp3-file - dbmitch, Mar 28 '22 at 17:39
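
The approach suggested in the comments above (synthesize each sentence separately, measure each clip, then add the durations up) could be sketched roughly as follows. This is an untested-against-the-API sketch, not a confirmed answer: the helper names split_sentences, wav_duration_seconds, sentence_timeline and mmss are hypothetical, the splitter is deliberately naive, and the timeline assumes the clips are played back to back with no gaps. The duration helper assumes each clip is requested as LINEAR16 and arrives with a WAV header, so that the standard-library wave module can read it; for MP3 output a third-party MP3 reader would be needed instead, as in the linked question.

```python
import io
import re
import wave


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def wav_duration_seconds(wav_bytes):
    """Duration of an in-memory WAV file, computed from its header.

    Assumption: the bytes are a complete WAV file (header + PCM frames).
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def sentence_timeline(sentences, durations):
    """Accumulate per-sentence durations into (sentence, start, end) tuples."""
    timeline = []
    start = 0.0
    for sentence, duration in zip(sentences, durations):
        end = start + duration
        timeline.append((sentence, start, end))
        start = end  # the next sentence begins where this one ends
    return timeline


def mmss(seconds):
    """Format a number of seconds as MM:SS, rounded to the nearest second."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    text = "Hello, World. I can speak any language. I would like to help you."
    sentences = split_sentences(text)
    # Placeholder durations for illustration only; in practice each value
    # would come from wav_duration_seconds(response.audio_content) for the
    # clip synthesized from that sentence.
    durations = [3.0, 5.0, 3.0]
    for sentence, start, end in sentence_timeline(sentences, durations):
        print(f"{sentence} starts {mmss(start)} ends {mmss(end)}")
```

Because each sentence is synthesized on its own, the pauses between sentences may differ from what a single request for the whole text would produce, so the times are only as accurate as that assumption allows.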
