How to Prepare a Dataset with Multilingual Text

Question

I am preparing a dataset for text classification in Jupyter Notebook.

However, one of the columns has sentences that contain words in both Indonesian and English. Example: 'ETUDE READY NO. 4 DAN 5\n\nTulis di keterangan'

Can anybody advise how I should pre-process this text column?

For simple text classification you probably don't need to do anything about it. Fyi this is called [code switching](https://en.wikipedia.org/wiki/Code-switching). I wouldn't try to translate it if I were you, unless you have a specific reason to. — Erwan, Jan 10 '22 at 20:54
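A minimal sketch of the "don't do anything language-specific" route suggested in the comment above: only language-agnostic cleanup (Unicode normalization, lowercasing, collapsing the embedded newlines) is applied, so Indonesian and English words are treated the same way. The function names `normalize_text` and `tokenize` are illustrative and not from the original post; whether lowercasing helps depends on your classifier.

```python
import re
import unicodedata
from typing import List


def normalize_text(text: str) -> str:
    """Language-agnostic cleanup for a mixed Indonesian/English string."""
    # Fold compatibility characters (e.g. full-width letters) into canonical forms.
    text = unicodedata.normalize("NFKC", text)
    # Lowercase so 'DAN' and 'dan' map to the same feature.
    text = text.lower()
    # Collapse newlines, tabs, and repeated spaces into single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def tokenize(text: str) -> List[str]:
    """Split the normalized text into word tokens, dropping punctuation."""
    return re.findall(r"\w+", normalize_text(text))


sample = 'ETUDE READY NO. 4 DAN 5\n\nTulis di keterangan'
cleaned = normalize_text(sample)
tokens = tokenize(sample)
print(cleaned)
print(tokens)
```

The cleaned strings or tokens can then be fed to whatever vectorizer the classifier uses; no translation or language detection step is involved.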

score 0 · Answer 1 · answered Jan 10 '22 at 14:25

There are any number of ways to translate data frames. As a developer, I recommend doing some basic internet research first. Here are some links:

Link1: https://pretagteam.com/question/translate-dataframe-python-to-english-and-save-the-result-into-a-cvs-file

Link2: How to translate other languages to English in pandas dataframe

Link3: Python pandas: Create a new column with values in English by converting values stored in a different column in Chinese traditional
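If translation does turn out to be necessary, the general pattern in the links above is to apply a translation function to every value in the column. The sketch below shows that pattern with plain Python lists and a cache, so each distinct string is translated only once. Everything here is an assumption for illustration: `translate_column` is a hypothetical helper, and `demo_translate` is a toy two-word stand-in, not a real translation service. In practice you would pass in whatever translation function your chosen library provides.

```python
from typing import Callable, Dict, List


def translate_column(values: List[str],
                     translate_fn: Callable[[str], str]) -> List[str]:
    """Apply translate_fn to each value, translating each distinct string once."""
    cache: Dict[str, str] = {}
    translated = []
    for value in values:
        if value not in cache:
            cache[value] = translate_fn(value)
        translated.append(cache[value])
    return translated


# Toy stand-in translator (hypothetical): replaces two Indonesian words only.
_DEMO_GLOSSARY = {"dan": "and", "tulis": "write"}


def demo_translate(text: str) -> str:
    words = text.split()
    return " ".join(_DEMO_GLOSSARY.get(w.lower(), w) for w in words)


column = ["ETUDE READY NO. 4 DAN 5", "Tulis di keterangan", "Tulis di keterangan"]
result = translate_column(column, demo_translate)
print(result)
```

With a pandas DataFrame, the same helper could be called on `df["text"].tolist()` and the result assigned to a new column.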
