I want to convert text to speech from a document that contains multiple languages. With the following code, I have trouble getting each language pronounced clearly. How can I generate clear audio from such mixed-language text?

from gtts import gTTS
mytext = 'Welcome to gtts! আজ একটি ভাল দিন। tumi kemon acho? ٱلْحَمْدُ لِلَّٰهِ'
language = 'ar' # arabic
myobj = gTTS(text=mytext, tld='co.in', lang=language, slow=False)
myobj.save("audio.mp3")
Md. Rezwanul Haque

1 Answer

Using text-to-speech alone is not enough, since a single gtts call works with only one language.
To solve this problem, we need to detect the language of each part of the sentence,
then run each part through text-to-speech and append the result to the final spoken sentence.
Ideally, you would use a neural network (there are plenty) to do this classification for you.
As a proof of concept, I used googletrans to detect the language of each part of the text and gtts to make an mp3 file from it.

It's not bulletproof, especially with Arabic text. For the Arabic part, googletrans returns a different language code ("sd"), which gtts does not recognize. For that reason, we use a code table (lang_code_table) to pick a language code that works with gtts.
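The code-table idea can be made a little more general. The sketch below is only an illustration: `normalize_lang` and `gtts_supported_codes` are hypothetical helper names that belong to neither library, and the "en" fallback is an arbitrary assumption. It applies the overrides first, then checks the code against the set of codes the TTS engine accepts, and finally falls back to a default.

```python
def normalize_lang(code, supported, overrides=None, fallback="en"):
    """Map a detected language code to one the TTS engine accepts.

    `overrides` plays the role of lang_code_table, e.g. {"sd": "ar"}.
    `fallback` is an assumed default, not something gtts prescribes.
    """
    overrides = overrides or {}
    code = overrides.get(code, code)
    if code in supported:
        return code
    # Try the base language of a regional code such as "pt-XX" -> "pt".
    base = code.split("-")[0]
    if base in supported:
        return base
    return fallback


def gtts_supported_codes():
    """Return the language codes gtts knows about (hypothetical wrapper)."""
    # Imported lazily so normalize_lang stays usable without gtts installed.
    from gtts.lang import tts_langs
    return set(tts_langs())
```

With this in place, `lang_code_table.get(language, language)` could be replaced by `normalize_lang(language, gtts_supported_codes(), lang_code_table)`, so an unexpected code degrades to the fallback voice rather than raising an error.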

Here is a working example:

from googletrans import Translator
from gtts import gTTS

input_text = "Welcome to gtts! আজ একটি ভাল দিন। tumi kemon acho? ٱلْحَمْدُ لِلَّٰه"
words = input_text.split(" ")
translator = Translator()
language, sentence = None, ""

lang_code_table = {"sd": "ar"}

with open('output.mp3', 'wb') as ff:
    for word in words:
        if not word.strip():  # split(" ") yields empty strings for repeated spaces
            continue
        # Detect language of current word
        word_language = translator.detect(word).lang

        if word_language == language:
            # Same language, append word to the sentence
            sentence += " " + word
        else:
            if language is None:
                # No language set yet, initialize and continue
                language, sentence = word_language, word
                continue

            if word.endswith(("?", ".", "!")):
                # If word endswith one of the punctuation marks, it should be part of previous sentence
                sentence += " " + word
                continue

            # We have whole previous sentence, translate it into speech and append to mp3 file
            gTTS(text=sentence, lang=lang_code_table.get(language, language), slow=False).write_to_fp(ff)

            # Continue with other language
            language, sentence = word_language, word

    if language and sentence:
        # Append last detected sentence
        gTTS(text=sentence, lang=lang_code_table.get(language, language), slow=False).write_to_fp(ff)

It's obviously not fast and won't scale to longer texts.
It also needs a better tokenizer and proper error handling.
Again, it's just a proof of concept.
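As one possible direction for a better tokenizer, here is a minimal sketch that splits the text into runs by Unicode script, using only the standard library, instead of calling the detector once per word. This is a swapped-in technique, not what the code above does, and `char_script` and `split_by_script` are hypothetical names. It only separates scripts: romanized Bengali such as "tumi kemon acho?" stays in the same Latin run type as English, and Arabic script is shared by several languages, so a detector is still needed within a run.

```python
import unicodedata


def char_script(ch):
    """Return a rough script label for letters, or None for neutral characters.

    Only letters decide the script. Spaces, digits, punctuation and combining
    marks are neutral and stay attached to the run they appear in.
    """
    if not unicodedata.category(ch).startswith("L"):
        return None
    name = unicodedata.name(ch, "")
    # The first word of the Unicode name is the script, e.g. "BENGALI LETTER AA".
    return name.split(" ")[0] if name else None


def split_by_script(text):
    """Split text into (script, run) pairs wherever the script of letters changes."""
    runs = []
    current_script, buffer = None, ""
    for ch in text:
        script = char_script(ch)
        if script is None or current_script is None or script == current_script:
            buffer += ch
            if script is not None:
                current_script = script
        else:
            # A letter from a different script starts a new run.
            runs.append((current_script, buffer.strip()))
            current_script, buffer = script, ch
    if buffer.strip():
        runs.append((current_script, buffer.strip()))
    return runs
```

Each run could then be passed to the detector once, or mapped directly through an assumed table such as `{"BENGALI": "bn", "ARABIC": "ar"}`, which would cut the number of network calls considerably.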

Domarm
  • Sir, I am concerned with the Arabic language. Is there any way to handle mixed-language text that must include Arabic? – Md. Rezwanul Haque Jan 28 '22 at 06:34
  • So the easiest workaround for Arabic is to use a language code table. Google Translate was returning the "sd" code, which does not work with gtts. In the updated code, Arabic is working. I removed the part that was increasing the frequency of words in the sentence; it was causing issues with Arabic again. But for your final implementation, you should experiment a bit and figure out the proper transition from one language to another. – Domarm Jan 28 '22 at 07:36
  • I've added one small optimization, which fixes "acho?" in your example. If a word ends with a punctuation mark, it should be part of the previous sentence, even if it was detected as a different language. – Domarm Jan 29 '22 at 07:07
  • @Domarm I want to use your code for reading a large book, but both googletrans and gtts have a habit of getting disconnected frequently. I tried time.sleep(10) but still had no luck with gtts. This is the only reason I didn't follow this naive approach for multilingual single-line input. Any idea how we can keep this code running for a long time and read a full book? I don't care if it takes 2-3 days to read the whole book, as long as it doesn't throw an error like gTTSError: 429 (Too Many Requests) from TTS API. – Mobassir Hossen Feb 03 '22 at 06:52
  • This is the behavior of every Google API. I looked around a bit, but I could not find an exact timeout or limit per day per user (I might be wrong). What you can try is adding a random sleep instead of a constant one. But if the limit is a daily one, that won't help. You can modify the script to save what it has into the mp3 until an error occurs, then save the position of the last successfully converted sentence and continue the next day from that position, and so on until the whole text is converted. One could also use a proxy to change identity and get around the limits, but this is a fair-use community, so I don't want to suggest any "tricks". – Domarm Feb 03 '22 at 13:24
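The "random sleep plus resume from the last position" idea from the last comment could be sketched roughly as follows. Everything here is an assumption for illustration: the function names, the JSON checkpoint format, and the retry counts are hypothetical, and the actual speech call is passed in as `synthesize(text, lang)` returning mp3 bytes. With gtts, that callable could write `gTTS(...).write_to_fp(...)` into an `io.BytesIO` and return its contents.

```python
import json
import os
import random
import time


def load_checkpoint(path):
    """Return the index of the next chunk to process (0 if no checkpoint exists)."""
    if not os.path.exists(path):
        return 0
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)["next_index"]


def save_checkpoint(path, next_index):
    """Record how many chunks have been written to the audio file so far."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"next_index": next_index}, fh)


def synthesize_all(chunks, synthesize, audio_path, checkpoint_path,
                   max_retries=5, base_delay=2.0, sleep=time.sleep):
    """Convert (lang, text) chunks to audio, resuming from the checkpoint.

    Returns the index of the next unprocessed chunk; it equals len(chunks)
    once everything has been converted.
    """
    start = load_checkpoint(checkpoint_path)
    # Append when resuming so audio from earlier runs is kept.
    mode = "ab" if start > 0 else "wb"
    with open(audio_path, mode) as out:
        for index in range(start, len(chunks)):
            lang, text = chunks[index]
            for attempt in range(max_retries):
                try:
                    data = synthesize(text, lang)
                    break
                except Exception:  # in practice, narrow this to the TTS error
                    if attempt == max_retries - 1:
                        return index  # give up for now; rerun later to resume
                    # Exponential backoff with random jitter, not a fixed sleep.
                    sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
            out.write(data)
            out.flush()
            save_checkpoint(checkpoint_path, index + 1)
    return len(chunks)
```

Rerunning the same call on the next day picks up at the saved index, so a daily limit only pauses the job and does not lose the audio already written.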