
import googletrans
from googletrans import Translator

translator = Translator()

# Translate each tweet in the dataframe to English
for i in range(10300):
    sentence = df["tweet"][i]
    translations = translator.translate(sentence.encode('unicode-escape').decode('ASCII'), dest='en')

The error I am getting while executing this is 'NoneType' object has no attribute 'group'.

Sonam
  • Something in your code or in the imported module(s) is attempting to dereference None. Can you also clarify what 'translator' is? I tried this with the standard translate module, constructing the translator as Translator(to_lang='en'), but its translate method doesn't accept dest. –  Jul 26 '21 at 12:59
  • import googletrans
    from googletrans import Translator
    translator = Translator()
    for i in range(10300):
        sentence = df["tweet"][i]
        translations = translator.translate(sentence.encode('unicode-escape').decode('ASCII'), dest='en')
    – Sonam Jul 26 '21 at 15:10
  • This is the whole piece of code I am using. – Sonam Jul 26 '21 at 15:12
  • This appears to be a known issue with googletrans; see https://stackoverflow.com/questions/52446811/why-googletrans-translator-suddenly-stopped-working/53577732#53577732 (a commonly suggested workaround is sketched after these comments). –  Jul 26 '21 at 15:30
  • Actually, I am getting the error 'NoneType' object has no attribute 'group'. – Sonam Jul 26 '21 at 15:48
  • That's not the point; googletrans is obviously broken. I tried it with a much simpler use case and it failed similarly. –  Jul 26 '21 at 15:59
  • https://github.com/ssut/py-googletrans/issues/234 –  Jul 26 '21 at 16:11
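
For reference, here is a minimal sketch of the workaround commonly suggested in the threads linked above: pin googletrans to a pre-release that fixed the token-acquisition bug (4.0.0rc1, or 3.1.0a0), and guard each call so a single failure does not abort the loop. The version pin and the error handling are assumptions taken from those threads, not something confirmed in this question.

# Assumption: pip install googletrans==4.0.0rc1 (pre-release with the fix)
from googletrans import Translator

translator = Translator()
translated = []
for i in range(10300):
    sentence = df["tweet"][i]
    try:
        result = translator.translate(sentence, dest='en')
        translated.append(result.text)
    except AttributeError:
        # On broken versions, the failed token acquisition surfaces here as
        # 'NoneType' object has no attribute 'group'.
        translated.append(None)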

1 Answer


Here is a solution based on keras-transformer: Ref: PyPI

The env.yml file for creating a conda environment is here: env.yml on GitHub

The manually labeled dataset for Hinglish-to-English translation is available here: Dataset on GitHub

And, the Jupyter Notebook with code is here: Jupyter Notebook on GitHub

Here is a blog post with a performance report for the same code, tested on my laptop: Hinglish to English Machine Translation Using Transformers
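
The links above carry the full details; as rough orientation, a keras-transformer translation pipeline follows the pattern below. This is a minimal sketch based on the library's README: the toy sentence pair, vocabulary, and model dimensions are illustrative assumptions, not values from the linked notebook.

import numpy as np
from keras_transformer import get_model, decode

# Toy parallel pair; a real run would build these from the labeled dataset linked above.
source_tokens = [['<START>', 'aap', 'kaise', 'ho', '<END>']]
target_tokens = [['<START>', 'how', 'are', 'you', '<END>']]

# Build one shared token index for both languages (an assumption for brevity).
token_dict = {'<PAD>': 0, '<START>': 1, '<END>': 2}
for sent in source_tokens + target_tokens:
    for tok in sent:
        token_dict.setdefault(tok, len(token_dict))

# Teacher forcing: decoder input is the target shifted right, output shifted left.
encode_input = [[token_dict[t] for t in sent] for sent in source_tokens]
decode_input = [[token_dict[t] for t in sent[:-1]] for sent in target_tokens]
decode_output = [[[token_dict[t]] for t in sent[1:]] for sent in target_tokens]

model = get_model(
    token_num=len(token_dict),
    embed_dim=32,
    encoder_num=2,
    decoder_num=2,
    head_num=4,
    hidden_dim=128,
    dropout_rate=0.05,
    use_same_embed=True,  # shared embeddings for encoder and decoder
)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.fit(x=[np.array(encode_input), np.array(decode_input)], y=np.array(decode_output), epochs=10)

# Greedy decoding of a source sentence into target token ids.
predicted = decode(
    model,
    encode_input,
    start_token=token_dict['<START>'],
    end_token=token_dict['<END>'],
    pad_token=token_dict['<PAD>'],
)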

Ashish Jain
  • I looked at the dataset. I see that the top 55 sentences have the Hinglish form of the words, and the rest are missing. By any chance do you have the Hinglish form for the rest of the sentences as well? – Senthilkumar M Oct 15 '22 at 07:56
  • As of today, there are now 195 labeled data points. I shall let you know here when I have more. – Ashish Jain Oct 17 '22 at 08:51