I have a JSON file named region_descriptions.json available in this link. The file does not load properly in Notepad++ on my Windows machine (since it is a huge file), and it only partially loads in Google Chrome. The file is a dataset for my dense-captioning task, and I need to write a Python script to translate every "phrase" in it to Hindi.

I navigated in PowerShell to the directory containing my JSON file and then set the environment variable using: >>$env:GOOGLE_APPLICATION_CREDENTIALS="C:\Users\Preeti\Downloads\Compressed\region_descriptions.json"

After this I opened Jupyter Notebook in the same directory and ran this code:

import ijson
from google.cloud import translate

translate_client = translate.Client()

parser = ijson.items(open("region_descriptions.json"), "item.regions.item")

maxTranslations = 100;
for region in parser:
    translation = translate_client.translate(region["phrase"], target_language="hi")

    print(region["phrase"])
    print(translation['translatedText'])

    maxTranslations-=1
    if maxTranslations==0:
        break

but Jupyter Notebook gives me this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-1-5fa13c6f3710> in <module>
      2 from google.cloud import translate
      3 
----> 4 translate_client = translate.Client()
      5 
      6 parser = ijson.items(open("region_descriptions.json"), "item.regions.item")

c:\users\preeti\appdata\local\programs\python\python37\lib\site-packages\google\cloud\translate_v2\client.py in __init__(self, target_language, credentials, _http, client_info)
     75     ):
     76         self.target_language = target_language
---> 77         super(Client, self).__init__(credentials=credentials, _http=_http)
     78         self._connection = Connection(self, client_info=client_info)
     79 

c:\users\preeti\appdata\local\programs\python\python37\lib\site-packages\google\cloud\client.py in __init__(self, credentials, _http)
    128             raise ValueError(_GOOGLE_AUTH_CREDENTIALS_HELP)
    129         if credentials is None and _http is None:
--> 130             credentials, _ = google.auth.default()
    131         self._credentials = google.auth.credentials.with_scopes_if_required(
    132             credentials, self.SCOPE

c:\users\preeti\appdata\local\programs\python\python37\lib\site-packages\google\auth\_default.py in default(scopes, request)
    303 
    304     for checker in checkers:
--> 305         credentials, project_id = checker()
    306         if credentials is not None:
    307             credentials = with_scopes_if_required(credentials, scopes)

c:\users\preeti\appdata\local\programs\python\python37\lib\site-packages\google\auth\_default.py in _get_explicit_environ_credentials()
    163     if explicit_file is not None:
    164         credentials, project_id = _load_credentials_from_file(
--> 165             os.environ[environment_vars.CREDENTIALS])
    166 
    167         return credentials, project_id

c:\users\preeti\appdata\local\programs\python\python37\lib\site-packages\google\auth\_default.py in _load_credentials_from_file(filename)
    100     # The type key should indicate that the file is either a service account
    101     # credentials file or an authorized user credentials file.
--> 102     credential_type = info.get('type')
    103 
    104     if credential_type == _AUTHORIZED_USER_TYPE:

AttributeError: 'list' object has no attribute 'get'

Can someone please help me write a Python script to translate all the phrases in the JSON file to Hindi, or help me overcome this error? I strongly recommend downloading the JSON file from the link given to better understand which "phrase" I am referring to.

1 Answer

Since your file is large, you should use ijson, which parses the JSON incrementally instead of loading the whole file into memory.

The following code worked for me:

import ijson
from google.cloud import translate_v2 as translate  # the v2 client, which provides Client()

translate_client = translate.Client()

max_translations = 100  # limit used while testing

# Open in binary mode and stream one region at a time instead of loading the whole file.
with open("region_descriptions.json", "rb") as f:
    for region in ijson.items(f, "item.regions.item"):
        translation = translate_client.translate(region["phrase"], target_language="hi")

        print(region["phrase"])
        print(translation["translatedText"])

        max_translations -= 1
        if max_translations == 0:
            break

If the above does not work for you, consider the following points:

  1. Do not forget to set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of your credentials (service account key) file.
  2. Remove the counter and the break from the for loop once everything is working fine.
  3. If you are not able to understand how ijson works, you will find this tutorial helpful.
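
For point 1, a quick sanity check can be sketched as below. The traceback in the question ends in `info.get('type')` failing with `'list' object has no attribute 'get'`, which is what happens when the file named by the variable parses to a JSON list instead of a JSON object. The helper name `diagnose_credentials`, its return messages, and the 1 MB size cap are assumptions made for illustration; they are not part of any Google library.

```python
import json
import os

MAX_KEY_FILE_BYTES = 1_000_000  # assumption: real key files are only a few KB


def diagnose_credentials(path=None):
    """Return a one-line diagnosis of the file GOOGLE_APPLICATION_CREDENTIALS names."""
    path = path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return "no such file"
    if os.path.getsize(path) > MAX_KEY_FILE_BYTES:
        # Avoids loading a huge dataset file by mistake.
        return "too large to be a key file"
    try:
        with open(path, encoding="utf-8") as f:
            info = json.load(f)
    except ValueError:
        return "not valid JSON"
    if not isinstance(info, dict):
        # This is the case behind "'list' object has no attribute 'get'".
        return "top level is not a JSON object"
    if info.get("type") not in ("service_account", "authorized_user"):
        return "unexpected credential type"
    return "looks like a credentials file"


print(diagnose_credentials())
```

Running this in the same notebook or shell that runs the translation script shows what that process actually sees, which may differ from the shell where the variable was set.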
Shubham Agrawal
  • Thank you so much for your well-written answer. I tried running this code but I keep getting a memory error (which probably suggests that the code is right). I googled this error and found that my laptop probably isn't good enough to run this code, given its hardware and software specifications. Can someone with a better laptop please help me by sharing a link to the translated file so that I can download and use it? I really need this **urgently** for my work. – Praveen Iyer Jun 07 '19 at 10:30
  • I strongly believe that your laptop should be good enough. The file is only 600 MB. If your laptop has at least 4 GB of RAM, I think it should work fine. Try running the code from the Python shell instead of Jupyter. Also try starting Jupyter with: `jupyter notebook --NotebookApp.iopub_data_rate_limit=1.0e10` Can you post the error? – Shubham Agrawal Jun 07 '19 at 17:46
  • Also, don't expect anyone on StackOverflow to do your work. This platform isn't meant to work this way. Show us your error, and we will surely help :). – Shubham Agrawal Jun 07 '19 at 17:48
  • Yes, my laptop has more than 4 GB of RAM, and I hope for the best. I'm sorry if I asked the wrong kind of question. I ran the code in the Python shell and got the same memory error. When I ran the code in Jupyter Notebook with the argument you mentioned above, I got this error: MemoryError Traceback (most recent call last) MemoryError: – Praveen Iyer Jun 07 '19 at 19:07
  • I have updated my answer. This should definitely work for you. – Shubham Agrawal Jun 08 '19 at 19:29
  • Hey, thanks for following up. I tried the new ijson part but was initially getting a Google application credentials error. As you said, I set up an environment variable for that and got past that error. But I am again getting the **same memory error** when I run this code in Jupyter Notebook as well as a normal Python script. It doesn't seem as if the ijson changes have made a difference. – Praveen Iyer Jun 10 '19 at 07:48
  • @PraveenIyer, please post 1. the exact code you ran, 2. the commands you used to run it, and 3. the exact error trace (update them in your question). Also, try removing the print statements. The memory error might be caused by the print statements as well. – Shubham Agrawal Jun 10 '19 at 20:27
  • I have edited the question with all the new details. – Praveen Iyer Jun 11 '19 at 12:25
  • Thanks for adding the details. I notice that you are using a 32-bit version of Python, for which [similar issues](https://www.quora.com/How-can-I-solve-a-memory-error-in-Python) have been reported. Uninstall it and try installing the 64-bit version. The traceback indicates that the memory error occurred while executing the second line, which is very strange, because at that point we haven't even loaded the file. Try running a few simple Python programs and see if they work. Also, try to translate a single dummy test string. It would be much easier if we could sit together and solve this issue. – Shubham Agrawal Jun 11 '19 at 17:44
  • I uninstalled my 32-bit Python, installed the 64-bit version, and tried out some basic programs, which worked fine. But when I ran the program in my question, it gave a different error, which I have added to my question (note that I did set up the environment variable in PowerShell as mentioned in my question). – Praveen Iyer Jun 12 '19 at 08:27
  • The error indicates that there is some issue with the environment variable. I am not sure whether setting the environment variable in PowerShell makes it accessible in Jupyter Notebook. The best way to test this is `import os` `print(os.environ['GOOGLE_APPLICATION_CREDENTIALS'])`. If it does not print the required path, you can set up the environment variable in the notebook itself ([follow this link](https://stackoverflow.com/a/44251637/5756943)). – Shubham Agrawal Jun 12 '19 at 18:15
  • I don't think there is any problem with setting the variable in PowerShell, since it worked for me before. Besides, I also tried the import os and print statements in Jupyter Notebook and got the desired path without any issue. I also tried setting the variable in the notebook itself, but that did not work either. – Praveen Iyer Jun 13 '19 at 06:40
  • Just noticed: you are setting GOOGLE_APPLICATION_CREDENTIALS to your region_descriptions.json file, which is wrong. [Follow this link](https://cloud.google.com/translate/docs/quickstart-client-libraries), which will guide you through getting an appropriate credentials file. Set GOOGLE_APPLICATION_CREDENTIALS to the path of that file. – Shubham Agrawal Jun 13 '19 at 08:21
  • Good that you found that out. Now I see some scope for progress. From what I have read, the path should point to a service account key JSON file, which I tried to google but did not understand. Do I need to download this service account key JSON file, or create it myself using some Python program? – Praveen Iyer Jun 13 '19 at 10:28
  • First set up a billing account (go to the [billing console](https://console.cloud.google.com/billing/), set up a new billing account, fill in your credit card details). Then go to the [API and credentials page](https://console.cloud.google.com/projectselector2/apis/credentials). Select a project or create a new one. Click on "Create credentials". Click on "Service account key". On the next page, create a new service account with the role "Project->Owner". Click on JSON and Create. This will download the credentials file. – Shubham Agrawal Jun 13 '19 at 11:04
  • In that case I think I will drop this method, since I cannot pay for this. Thank you so much for your time and efforts so far. – Praveen Iyer Jun 13 '19 at 14:01
  • Well! It's not actually paid. For the first few months, the API is free. And even after that, your credit card will not be charged. They just need billing information to verify you are a legit person. – Shubham Agrawal Jun 13 '19 at 16:10
  • Oh, I see. I do not own a credit card, even for the billing information. Besides, I figured out another way to get this translation task done: I will extract all the phrases and write them to a .txt file, open the .txt file in Chrome and translate it to Hindi, and then replace the phrases in the JSON file. Sounds feasible, right? – Praveen Iyer Jun 14 '19 at 06:40
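
The export-and-replace plan in the last comment could look roughly like the sketch below. It rests on several assumptions: the file is a top-level list of objects, each with a "regions" list whose entries carry a "phrase" string (consistent with the ijson prefix `item.regions.item` used in the answer), and the translation step returns exactly one line per phrase in the same order. The function names `iter_images`, `export_phrases`, and `import_phrases` are hypothetical.

```python
import json


def iter_images(json_path):
    """Stream top-level entries one at a time instead of loading the whole file."""
    import ijson  # third-party; imported here so the helpers below work without it

    with open(json_path, "rb") as f:
        yield from ijson.items(f, "item")


def export_phrases(images, txt_path):
    """Write every region phrase on its own line; return the number written."""
    count = 0
    with open(txt_path, "w", encoding="utf-8") as out:
        for image in images:
            for region in image["regions"]:
                # Flatten embedded line breaks so one line always equals one phrase.
                out.write(" ".join(region["phrase"].split()) + "\n")
                count += 1
    return count


def import_phrases(images, txt_path, out_path):
    """Replace each phrase with the matching line of the translated text file."""
    with open(txt_path, encoding="utf-8") as translated, \
         open(out_path, "w", encoding="utf-8") as out:
        out.write("[")
        for i, image in enumerate(images):
            for region in image["regions"]:
                line = translated.readline()
                if not line:
                    raise ValueError("translated file has fewer lines than phrases")
                region["phrase"] = line.rstrip("\n")
            # ijson may yield decimal.Decimal for non-integer numbers;
            # default=float lets json.dumps serialize them.
            out.write(("," if i else "")
                      + json.dumps(image, ensure_ascii=False, default=float))
        out.write("]")
```

Typical use would be `export_phrases(iter_images("region_descriptions.json"), "phrases.txt")`, then translating the text file, then `import_phrases(iter_images("region_descriptions.json"), "phrases_hi.txt", "region_descriptions_hi.json")`, where the output file names are placeholders. Because the two passes rely on line order alone, it is worth comparing the line counts of the two text files before importing.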