I have a column "City_trad_chinese" in a pandas DataFrame "df" that contains values in Traditional Chinese. I need to create another column, "City_English", containing the English translations of those values.

How can I do this with Python? I tried the following:

#importing required libraries
import pandas as pd 

from os import path

from googletrans import Translator

#setting path to data
path2data = 'C:/Users/data'

# data import
df = pd.read_excel(path.join(path2data, 'data.xlsx'), converters={'City_trad_chinese':str})


translator = Translator()

df['City_English'] = df['City_trad_chinese'].map(lambda x: translator.translate(x, src="zh-TW", dest="en").text)

but it is giving me an error:

raise JSONDecodeError("Expecting value", s, err.value) from None

JSONDecodeError: Expecting value
Archit gupta
  • The error appears to be due to a limit on the number of characters you can translate at one time using the Google Translate API. If you go over this limit (15k), Google just responds with an empty JSON. [This question](https://stackoverflow.com/questions/48021371/jsondecodeerror-using-google-translate-api-with-python3) claims that if it still doesn't work, reducing the text to 5k-character chunks resolves the issue. – iacob Jun 11 '18 at 13:01
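
As a rough illustration of the chunking idea in the comment above, here is a minimal sketch. The helper name `chunk_text` is hypothetical, and the 5,000-character default is taken from the linked question rather than from any verified API limit. It only splits the text; each piece would still need to be sent to the translator separately.

```python
def chunk_text(text, max_chars=5000):
    """Split text into consecutive pieces of at most max_chars characters.

    Hypothetical helper: the 5000 default mirrors the figure quoted in the
    comment, not a documented limit.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


long_text = "a" * 12000
chunks = chunk_text(long_text)
print([len(c) for c in chunks])
```

Note that this naive split can cut a sentence in half, which may hurt translation quality; splitting on sentence or line boundaries would be a reasonable refinement. For short values such as city names, a single cell is unlikely to come anywhere near such a limit.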

1 Answer

You can use the googletrans library:

import pandas as pd
from googletrans import Translator

d = {"City_trad_chinese":["香港特别行政区",
                          "澳门特别行政区",
                          "北京市",
                          "上海市"]}
df = pd.DataFrame(data=d)

translator = Translator()

df["City_English"] = df["City_trad_chinese"].map(lambda x: translator.translate(x, src="zh-TW", dest="en").text)

print(df["City_English"])

0    Hong Kong Special Administrative Region
1        Macao Special Administrative Region
2                               Beijing City
3                              Shanghai City

Note: The Google Translate API has a 15k character limit. You can work around this by translating each row individually:

df["City_English"] = ""

for index, row in df.iterrows():
    translator = Translator()
    eng_text = translator.translate(row["City_trad_chinese"], src="zh-TW", dest="en").text
    # Write through df.at: assigning to `row` only changes a copy, not df itself
    df.at[index, "City_English"] = eng_text
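
If the column contains many repeated city names, one way to cut down the number of requests is to translate each distinct value only once and then map the results back onto the column. The sketch below is only an illustration: `build_translation_map` and `fake_translate` are hypothetical names, and the stand-in translator is there so the example runs offline. With googletrans you would presumably pass something like `lambda s: translator.translate(s, src="zh-TW", dest="en").text` instead.

```python
def build_translation_map(values, translate_fn):
    """Translate each distinct string once and return {original: translated}.

    Non-string entries (e.g. None or NaN) are skipped.
    """
    mapping = {}
    for value in values:
        if isinstance(value, str) and value not in mapping:
            mapping[value] = translate_fn(value)
    return mapping


# Stand-in for the real translator (assumption: used only to keep this offline).
calls = []

def fake_translate(text):
    calls.append(text)
    return {"臺北市": "Taipei City", "高雄市": "Kaohsiung City"}[text]


cities = ["臺北市", "高雄市", "臺北市", None]
mapping = build_translation_map(cities, fake_translate)
print(mapping)
print(len(calls))  # one call per distinct string, not per row
```

With a DataFrame, the result could then be applied with `df["City_English"] = df["City_trad_chinese"].map(mapping)`; values missing from the mapping come out as NaN.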
iacob
  • Is there any way I can do this translation for the whole column with a single command? – Archit gupta Jun 11 '18 at 11:20
  • @Architgupta - use `df['eng'] = df['chinese'].map(lambda x: translator.translate(x, src="zh-TW", dest="en").text)` – jezrael Jun 11 '18 at 11:31
  • it is throwing the error: raise JSONDecodeError("Expecting value", s, err.value) from None – Archit gupta Jun 11 '18 at 11:58
  • @Architgupta [this page](https://stackoverflow.com/a/35885331/9067615) might help solve that issue `"The error arises because the "data" is of type bytes so you have to decode it into a string before using json.loads to turn it into a json object."` – iacob Jun 11 '18 at 12:04
  • I imported the data from an Excel file and also converted that particular column to string while importing. – Archit gupta Jun 11 '18 at 12:29
  • `df = pd.read_excel(path.join(path2data, 'data.xlsx'), converters={'City_trad_chinese':str})` – Archit gupta Jun 11 '18 at 12:30
  • @Architgupta can you add your code to your question? Will need to see it in context, up until the part that uses the json library. – iacob Jun 11 '18 at 12:35
  • What can I do if a value is of type None? – Long Le Sep 02 '18 at 11:04
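
Regarding the last comment, one possible approach is to skip missing cells before they reach the translator. This is a minimal sketch, not a tested fix for the error above: `safe_translate` is a hypothetical helper, and `str.upper` stands in for the real translation call so the example runs without network access.

```python
import math


def safe_translate(value, translate_fn, default=None):
    """Return `default` for None, NaN, or blank cells; otherwise translate."""
    if value is None:
        return default
    if isinstance(value, float) and math.isnan(value):
        return default
    text = str(value).strip()
    if not text:
        return default
    return translate_fn(text)


# str.upper is a placeholder for the real translator call (assumption).
print(safe_translate(None, str.upper))
print(safe_translate(float("nan"), str.upper))
print(safe_translate("  taipei ", str.upper))
```

On a DataFrame this could be used as `df["City_trad_chinese"].map(lambda x: safe_translate(x, my_translate))`, where `my_translate` is whatever function wraps the translator.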