There is a known issue with googletrans, so switch to a working version (see: googletrans stopped working with error 'NoneType' object has no attribute 'group').
Upgrading with

```
pip install googletrans==4.0.0-rc1
```

should resolve the googletrans-specific issues.
Then, from a pandas perspective, we need to pass the actual column values to `translate`:
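```python
def translate(text, lang_src, lang_tgt):
    # Translator.translate expects src/dest; .text holds the translated string
    return translator.translate(text, src=lang_src, dest=lang_tgt).text


# Series.apply forwards the extra keyword arguments to translate()
df['Malaysian Text'] = df['English Text'].apply(
    translate, lang_src='en', lang_tgt='ms'
)
```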
*Note 1: this makes a separate translation request for every row, which is slow. Processing the text row by row may not be the most efficient approach in many use cases.
*Note 2: `Translator.translate` takes the keyword arguments `src` and `dest`, not `lang_src` and `lang_tgt`. We can, however, define our own function that uses these names.
`df`:

```
   English Text                   Malaysian Text
0  Biggest                        Yang paling besar
1  Compare 3 digit numbers        Bandingkan nombor 3 digit
2  Compare fractions              Bandingkan Fraksi
3  Counting numbers up to 10      Mengira nombor sehingga 10
4  Division                       Bahagian
5  Even or odd                    Genap atau ganjil
6  Identify 2-dimensional shapes  Kenal pasti bentuk 2-dimensi
7  Mixed Operations               Operasi bercampur
```
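Regarding Note 1: if the column contains repeated values, one way to reduce the number of requests is to translate each distinct value only once and map the results back onto the rows. The sketch below assumes nothing about googletrans itself; `translate_unique` is a hypothetical helper that accepts any `str -> str` callable, so it can be tried offline with a stand-in before plugging in something like `lambda t: translator.translate(t, src='en', dest='ms').text`.

```python
from typing import Callable

import pandas as pd


def translate_unique(series: pd.Series, translate: Callable[[str], str]) -> pd.Series:
    """Hypothetical helper: call `translate` once per distinct non-null value.

    Rows with missing values (NaN/None) are never passed to `translate`;
    Series.map turns values that are not in the lookup dict into NaN.
    """
    # One call per distinct value instead of one call per row
    lookup = {value: translate(value) for value in series.dropna().unique()}
    # Broadcast the translations back onto every row
    return series.map(lookup)


# Offline usage with a stand-in "translator" that records each call
calls = []


def fake_translate(text: str) -> str:
    calls.append(text)
    return text.upper()


s = pd.Series(['Division', 'Even or odd', 'Division'])
result = translate_unique(s, fake_translate)
print(result.tolist())  # ['DIVISION', 'EVEN OR ODD', 'DIVISION']
print(len(calls))       # 2
```

This only pays off when values repeat; for a column of all-distinct strings it makes the same number of calls as `apply`.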
To translate into several languages, we can do this in a loop:
```python
def translate(text, lang_src, lang_tgt):
    return translator.translate(text, src=lang_src, dest=lang_tgt).text


# One new column per (label, language code) pair
for label, tgt in [('Malaysian', 'ms'),
                   ('Russian', 'ru'),
                   ('Spanish', 'es')]:
    df[f'{label} Text'] = df['English Text'].apply(
        translate, lang_src='en', lang_tgt=tgt
    )
```
`df`:

```
   English Text                   Malaysian Text                Russian Text               Spanish Text
0  Biggest                        Yang paling besar             Самый большой              Más grande
1  Compare 3 digit numbers        Bandingkan nombor 3 digit     Сравните 3 цифры           Comparar números de 3 dígitos
2  Compare fractions              Bandingkan Fraksi             Сравнить фракции           Comparar fracciones
3  Counting numbers up to 10      Mengira nombor sehingga 10    Подсчет номеров до 10      Contando números hasta 10
4  Division                       Bahagian                      Разделение                 División
5  Even or odd                    Genap atau ganjil             Четным или нечетным        Par o impar
6  Identify 2-dimensional shapes  Kenal pasti bentuk 2-dimensi  Определить 2-мерные формы  Identificar formas 2-dimensionales.
7  Mixed Operations               Operasi bercampur             Смешанные операции         Operaciones mixtas
```
Generally, however, it can be much quicker to join the column into a single block of text, translate it once per language, and then split the result back into rows:
```python
# Collapse the column into one string, one row per line
text = '\n'.join(df['English Text'].fillna(''))

for label, tgt in [('Malaysian', 'ms'),
                   ('Russian', 'ru'),
                   ('Spanish', 'es')]:
    # One request per language, then split the result back into rows
    # (this relies on the translation keeping one line per row)
    df[f'{label} Text'] = translator.translate(
        text, src='en', dest=tgt
    ).text.split('\n')
```
*This produces the same output as above.
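Because the single-request version depends on the translated text having exactly one line per input row (and nothing guarantees that), a slightly more defensive variant can check the line count before assigning. This is a sketch under stated assumptions: `translate_column_batched` is a hypothetical helper, and `translate_blob` is any `str -> str` callable, e.g. `lambda t: translator.translate(t, src='en', dest='ru').text` with the setup below.

```python
from typing import Callable, List

import pandas as pd


def translate_column_batched(series: pd.Series,
                             translate_blob: Callable[[str], str]) -> List[str]:
    """Hypothetical helper: translate a column with one call via newline joining."""
    values = series.fillna('').astype(str)
    if values.empty:
        return []
    # Embedded newlines would break the one-line-per-row correspondence
    if values.str.contains('\n', regex=False).any():
        raise ValueError('column values must not contain newlines')
    translated = translate_blob('\n'.join(values)).split('\n')
    # Refuse to assign if lines were merged or split during translation
    if len(translated) != len(values):
        raise ValueError(f'expected {len(values)} lines, got {len(translated)}')
    return translated


# Offline usage with str.upper standing in for the translator
demo = pd.DataFrame({'English Text': ['Biggest', 'Division', None]})
demo['Upper Text'] = translate_column_batched(demo['English Text'], str.upper)
print(demo['Upper Text'].tolist())  # ['BIGGEST', 'DIVISION', '']
```

If the check fails, falling back to the row-by-row `apply` version is one option.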
Setup and imports used:
```python
import pandas as pd
from googletrans import Translator

translator = Translator()
df = pd.DataFrame({
    'English Text': ['Biggest', 'Compare 3 digit numbers', 'Compare fractions',
                     'Counting numbers up to 10', 'Division', 'Even or odd',
                     'Identify 2-dimensional shapes', 'Mixed Operations']
})
```