
I have a dataset:

        English Text

0   Biggest

1   Compare 3 digit numbers

2   Compare fractions

3   Counting numbers up to 10

4   Division

5   Even or odd

6   Identify 2-dimensional shapes

7   Mixed Operations

I want to translate these texts into Malaysian, Spanish, and Russian, so I am using 'googletrans'. I tried two methods, and each one raised an error:

import googletrans
from googletrans import Translator
import pandas as pd
translator = Translator()
df['Malaysian Text'] = df['English Text'].apply(translator.translate(lang_src='en', lang_tgt='ms')).apply(getattr, args=('text'))

Error: translate() missing 1 required positional argument: 'text'

df['Malaysian Text'] = translator.translate(df['English Text'], lang_src='en', lang_tgt='ms')

Error: 'NoneType' object has no attribute 'group'

Henry Ecker

1 Answer

There is a known issue with googletrans that causes the second error. Switch to a working version; see: googletrans stopped working with error 'NoneType' object has no attribute 'group'.

Upgrading with

pip install googletrans==4.0.0-rc1

should resolve the googletrans specific issues.


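Then, from a pandas perspective, apply needs a function it can call with each column value. The first attempt instead calls translator.translate immediately, without any text, which is why it reports the missing 'text' argument. We need to pass the actual column values to translate: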

def translate(text, lang_src, lang_tgt):
    return translator.translate(text, src=lang_src, dest=lang_tgt).text


df['Malaysian Text'] = df['English Text'].apply(
    translate, lang_src='en', lang_tgt='ms'
)

*Note 1: this performs a separate lookup for every row, which is slow. Processing the text row by row may not be efficient enough in many use cases.

*Note 2: translate takes the kwargs src and dest, not lang_src and lang_tgt. We can, however, define our own function that uses those names.

df:

                    English Text                Malaysian Text
0                        Biggest             Yang paling besar
1        Compare 3 digit numbers     Bandingkan nombor 3 digit
2              Compare fractions             Bandingkan Fraksi
3      Counting numbers up to 10    Mengira nombor sehingga 10
4                       Division                      Bahagian
5                    Even or odd             Genap atau ganjil
6  Identify 2-dimensional shapes  Kenal pasti bentuk 2-dimensi
7               Mixed Operations             Operasi bercampur
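
Regarding Note 1, one way to cut down on repeated lookups is to cache results so that each distinct text is translated only once. The following is a minimal sketch, not part of googletrans: FakeTranslator is a hypothetical offline stand-in that mimics the .translate(text, src=..., dest=...).text shape used above, and translate_cached is an illustrative name.

```python
from functools import lru_cache
from types import SimpleNamespace


class FakeTranslator:
    """Hypothetical offline stand-in for googletrans.Translator."""

    def __init__(self):
        self.calls = 0  # counts how many lookups were actually made

    def translate(self, text, src, dest):
        self.calls += 1
        # Return an object with a .text attribute, like the real result
        return SimpleNamespace(text=f'[{dest}] {text}')


translator = FakeTranslator()


@lru_cache(maxsize=None)
def translate_cached(text, lang_src, lang_tgt):
    # Repeated (text, lang_src, lang_tgt) combinations hit the cache
    return translator.translate(text, src=lang_src, dest=lang_tgt).text


texts = ['Division', 'Even or odd', 'Division', 'Division']
translated = [translate_cached(t, 'en', 'ms') for t in texts]
print(translated)
print(translator.calls)  # 2 lookups for 4 rows
```

With the real Translator in place of the stand-in, the same function could be passed to apply exactly like translate above. Caching only helps when the column contains duplicate values.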

If we'd like to do several languages, we can do this in a loop:

def translate(text, lang_src, lang_tgt):
    return translator.translate(text, src=lang_src, dest=lang_tgt).text


for label, tgt in [('Malaysian', 'ms'),
                   ('Russian', 'ru'),
                   ('Spanish', 'es')]:
    df[f'{label} Text'] = df['English Text'].apply(
        translate, lang_src='en', lang_tgt=tgt
    )

df

                    English Text                Malaysian Text               Russian Text                         Spanish Text
0                        Biggest             Yang paling besar              Самый большой                           Más grande
1        Compare 3 digit numbers     Bandingkan nombor 3 digit           Сравните 3 цифры        Comparar números de 3 dígitos
2              Compare fractions             Bandingkan Fraksi           Сравнить фракции                  Comparar fracciones
3      Counting numbers up to 10    Mengira nombor sehingga 10      Подсчет номеров до 10            Contando números hasta 10
4                       Division                      Bahagian                 Разделение                             División
5                    Even or odd             Genap atau ganjil        Четным или нечетным                          Par o impar
6  Identify 2-dimensional shapes  Kenal pasti bentuk 2-dimensi  Определить 2-мерные формы  Identificar formas 2-dimensionales.
7               Mixed Operations             Operasi bercampur         Смешанные операции                   Operaciones mixtas

Generally, however, it can be much quicker to join the column into a single newline-separated text blob, translate it once, then split it back apart:

# Collapse Column into string with new lines
text = '\n'.join(df['English Text'].fillna(''))
for label, tgt in [('Malaysian', 'ms'),
                   ('Russian', 'ru'),
                   ('Spanish', 'es')]:
    df[f'{label} Text'] = translator.translate(
        text, src='en', dest=tgt
    ).text.split('\n')  # separate translation by new lines

*produces the same output as above.
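
The single-blob approach relies on the translation coming back with exactly one line per input row, and a very long column may exceed whatever size a single request accepts. The sketch below is one hedged way to guard against both. chunk_lines and translate_lines are hypothetical helper names, translate_blob is any function that maps a string to its translation, and the default max_chars of 5000 is an assumption, not a documented limit.

```python
def chunk_lines(lines, max_chars):
    """Group lines so each newline-joined chunk stays within max_chars.

    A single line longer than max_chars still gets a chunk of its own.
    """
    chunks, current, size = [], [], 0
    for line in lines:
        # +1 accounts for the newline joining this line to the previous one
        extra = len(line) + (1 if current else 0)
        if current and size + extra > max_chars:
            chunks.append(current)
            current, size = [], 0
            extra = len(line)
        current.append(line)
        size += extra
    if current:
        chunks.append(current)
    return chunks


def translate_lines(lines, translate_blob, max_chars=5000):
    """Translate lines chunk by chunk and verify the line count survives."""
    out = []
    for chunk in chunk_lines(lines, max_chars):
        parts = translate_blob('\n'.join(chunk)).split('\n')
        if len(parts) != len(chunk):
            raise ValueError(
                f'expected {len(chunk)} lines back, got {len(parts)}'
            )
        out.extend(parts)
    return out


# Offline demonstration with str.upper standing in for a translator
print(translate_lines(['a', 'b', 'c'], str.upper, max_chars=3))
```

With googletrans, translate_blob could be something like lambda s: translator.translate(s, src='en', dest='ms').text, and lines could be df['English Text'].fillna('').tolist(). If the ValueError is raised, falling back to the row-by-row apply for that chunk is one option.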


Setup and imports used:

import pandas as pd
from googletrans import Translator

translator = Translator()

df = pd.DataFrame({
    'English Text': ['Biggest', 'Compare 3 digit numbers', 'Compare fractions',
                     'Counting numbers up to 10', 'Division', 'Even or odd',
                     'Identify 2-dimensional shapes', 'Mixed Operations']
})