I am trying to translate the text in a pandas DataFrame column that is ca. 200000 rows long. It looks like this:
df =
| review  | rating |
| love it | 5      |
| hate it | 1      |
| its ok  | 3      |
| great   | 4      |
I am attempting to translate this column into a different language using googletrans. I have seen some solutions that use df.apply
to apply the translation function to each row, but that is painfully slow in my case (roughly 16 hours to translate the whole column).
However, googletrans does support batch translation, where it takes a list of strings as an argument instead of just a single string.
I have been looking for a solution that takes advantage of this, and my code looks like this:
from googletrans import Translator

translator = Translator()
list1 = df.review.tolist()
translated = []
# Translate in batches of 50 strings per call
for i in range(0, len(list1), 50):
    batch = translator.translate(list1[i:i+50], src='en', dest='id')
    translated.extend([x.text for x in batch])
df['translated_review'] = translated  # add back to df
But it is still just as slow. Could anyone shed some light on how to optimise this further?
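One variation I have considered is translating each distinct review only once and mapping the results back, since short reviews such as "great" probably repeat many times in 200000 rows (that is an assumption about my data, and I have not measured it). Below is a minimal sketch of that idea. The helper name `translate_unique` and the `translate_batch` callable are hypothetical; `translate_batch` stands in for the `translator.translate` call above so that the batching logic can be checked without hitting the network.

```python
from typing import Callable, List


def translate_unique(
    texts: List[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 50,
) -> List[str]:
    """Translate each distinct string once, then map results back in order.

    translate_batch is a hypothetical stand-in: any function that takes a
    list of strings and returns the translations in the same order.
    """
    # dict.fromkeys drops duplicates while keeping first-seen order
    unique = list(dict.fromkeys(texts))
    mapping = {}
    for i in range(0, len(unique), batch_size):
        chunk = unique[i:i + batch_size]
        mapping.update(zip(chunk, translate_batch(chunk)))
    # Rebuild the full-length result, one entry per original row
    return [mapping[t] for t in texts]


# Toy check with a fake "translator" that just upper-cases its input
reviews = ["love it", "great", "love it", "its ok", "great"]
result = translate_unique(reviews, lambda chunk: [s.upper() for s in chunk])
```

With googletrans, the callable would presumably wrap the same call as in my loop, i.e. something that returns `[x.text for x in translator.translate(chunk, src='en', dest='id')]`. This only reduces the number of strings sent; it does not make each request faster, so it helps only if the column really does contain many duplicates.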