
I have an app running on Google Cloud Run. After I added the following lines, the app crashed:

nltk.download('stopwords')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')

I suspect that I need to install these files manually on the GCP server side, but I don't know how to do so.

The Cloud Run log is below. I don't know how to download resources for Cloud Run; I normally use App Engine, which has a cluster running behind it, and I have no idea how to debug this.

2019-11-01 18:08:00.524 PDT Resource punkt not found.
2019-11-01 18:08:00.524 PDT Please use the NLTK Downloader to obtain the resource:
2019-11-01 18:08:00.524 PDT
2019-11-01 18:08:00.524 PDT >>> import nltk
2019-11-01 18:08:00.524 PDT >>> nltk.download('punkt')
2019-11-01 18:08:00.524 PDT
2019-11-01 18:08:00.524 PDT For more information see: https://www.nltk.org/data.html
2019-11-01 18:08:00.524 PDT
2019-11-01 18:08:00.524 PDT Attempted to load tokenizers/punkt/PY3/english.pickle
2019-11-01 18:08:00.524 PDT
2019-11-01 18:08:00.524 PDT Searched in:
2019-11-01 18:08:00.524 PDT - '/home/nltk_data'
2019-11-01 18:08:00.524 PDT - '/usr/local/nltk_data'
2019-11-01 18:08:00.524 PDT - '/usr/local/share/nltk_data'
2019-11-01 18:08:00.524 PDT - '/usr/local/lib/nltk_data'
2019-11-01 18:08:00.524 PDT - '/usr/share/nltk_data'
2019-11-01 18:08:00.524 PDT - '/usr/local/share/nltk_data'
2019-11-01 18:08:00.524 PDT - '/usr/lib/nltk_data'
2019-11-01 18:08:00.524 PDT - '/usr/local/lib/nltk_data'
2019-11-01 18:08:00.524 PDT - ''
2019-11-01 18:08:00.524 PDT **********************************************************************
2019-11-01 18:08:00.524 PDT
2019-11-01 18:08:00.527 PDT POST 500 498 B 2.2 s Chrome 78 https://model-zsairbvdca-uc.a.run.app/upload/documents
2019-11-01 18:08:01.329 PDT GET 404 437 B 4 ms Chrome 78 https://model-zsairbvdca-uc.a.run.app/favicon.ico

Please help.

Bill Chen
  • I suppose you're adding these lines into your main.py file. Can you provide a reproducible code snippet, the app.yaml file and requirements.txt, in order to try to reproduce it? Also can you attach the error it's giving to you? Have you tried to run it locally? – sotis Nov 01 '19 at 11:28
  • There are few details in your question. How big is the file `words`? Are you running out of memory? Review the Stackdriver logs for error messages. – John Hanley Nov 01 '19 at 13:43
  • @sotis would you mind having a look at my modified question? thanks – Bill Chen Nov 02 '19 at 04:36
  • Thanks for updating the answer! [Here](https://stackoverflow.com/questions/47036793/install-nltk-on-docker) there is a similar case which should resolve your issue. You can try this one as well [nltk.download](https://stackoverflow.com/questions/43182131/docker-download-all-from-nltk-in-dockerfile/43182517). In case you're still having problems, can you add your dockerfile? – sotis Nov 04 '19 at 11:38

2 Answers


This is what I actually did, and it worked. Since I am using Docker on GCP, I needed to update the Dockerfile so that GCP knows how to build the image:

RUN python -m nltk.downloader all -d /usr/local/nltk_data
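Downloading `all` pulls in every NLTK package, which makes the image large. As a rough alternative, a small build-time script could fetch only the packages the app needs, including `punkt`, which the log complains about but the original `nltk.download` calls never requested. The sketch below is an assumption, not part of the original setup: the helper name `download_packages`, the package list, and the idea of calling it from a `RUN` step are all illustrative.

```python
# Hypothetical build-time helper; the names and package list are assumptions.
REQUIRED_PACKAGES = [
    "punkt",  # missing in the Cloud Run log
    "stopwords",
    "averaged_perceptron_tagger",
    "maxent_ne_chunker",
    "words",
]
DATA_DIR = "/usr/local/nltk_data"  # one of the directories listed under "Searched in"


def download_packages(packages, data_dir, downloader=None):
    """Download each package into data_dir and return the names that failed."""
    if downloader is None:
        # Imported lazily so the helper can be exercised without NLTK installed.
        import nltk
        downloader = nltk.download
    failed = []
    for name in packages:
        # nltk.download returns True on success and False on failure.
        if not downloader(name, download_dir=data_dir, quiet=True):
            failed.append(name)
    return failed


def main():
    failed = download_packages(REQUIRED_PACKAGES, DATA_DIR)
    if failed:
        # A non-zero exit makes the Docker build step fail instead of
        # silently producing an image without the data.
        raise SystemExit("Failed to download: " + ", ".join(failed))
```

Calling `main()` from a script executed in a Dockerfile `RUN` step would bake the data into the image at build time, so nothing has to be downloaded when the container starts. Package names may differ in other NLTK versions.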

The log I got from GCP is like this:

2019-11-08 18:39:32.900 PST Attempted to load tokenizers/punkt/PY3/english.pickle
2019-11-08 18:39:32.900 PST
2019-11-08 18:39:32.900 PST Searched in:
2019-11-08 18:39:32.900 PST - '/home/nltk_data'
2019-11-08 18:39:32.900 PST - '/usr/local/nltk_data'
2019-11-08 18:39:32.900 PST - '/usr/local/share/nltk_data'
2019-11-08 18:39:32.900 PST - '/usr/local/lib/nltk_data'
2019-11-08 18:39:32.900 PST - '/usr/share/nltk_data'
2019-11-08 18:39:32.900 PST - '/usr/local/share/nltk_data'
2019-11-08 18:39:32.900 PST - '/usr/lib/nltk_data'
2019-11-08 18:39:32.900 PST - '/usr/local/lib/nltk_data'
2019-11-08 18:39:32.900 PST - ''

The real problem here was not where to put the NLTK data. It was that GCP does not refresh the Docker image frequently enough, so I had to manually push the new Docker image to the GCP image repository and then deploy it from Cloud Run.
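One way to notice a stale image sooner might be a check at application startup that fails fast when the data is not in the container. The following is a minimal sketch under stated assumptions: `missing_resources` is a hypothetical helper, and the resource paths are the usual locations for the packages named in the question, which may differ in other NLTK versions.

```python
# Hypothetical startup check; the helper name and resource list are assumptions.
REQUIRED_RESOURCES = [
    "tokenizers/punkt",
    "corpora/stopwords",
    "taggers/averaged_perceptron_tagger",
    "chunkers/maxent_ne_chunker",
    "corpora/words",
]


def missing_resources(resources, finder=None):
    """Return the resources that cannot be located on the NLTK search path."""
    if finder is None:
        # Imported lazily so the helper can be exercised without NLTK installed.
        import nltk.data
        finder = nltk.data.find
    missing = []
    for resource in resources:
        try:
            finder(resource)  # nltk.data.find raises LookupError if not found
        except LookupError:
            missing.append(resource)
    return missing


def check_or_exit():
    missing = missing_resources(REQUIRED_RESOURCES)
    if missing:
        raise SystemExit("Missing NLTK resources: " + ", ".join(missing))
```

If `check_or_exit()` runs before the web server starts, a deployment that still uses the old image fails at startup with a clear message rather than returning a 500 on the first request.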

Bill Chen

When called without package arguments, the nltk.downloader command will try to open the NLTK downloader GUI, which won't work on App Engine.

Instead, you'll need to follow the "manual installation" instructions: https://www.nltk.org/data.html#manual-installation
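As a rough illustration of those manual steps: each package zip is unpacked into the matching subfolder (for example `tokenizers` or `corpora`) of a top-level `nltk_data` directory, and the `NLTK_DATA` environment variable points at that directory. The helper below is only a sketch; `install_package_zip` and the example paths are hypothetical, and it assumes the zip has already been downloaded by some other means.

```python
import zipfile
from pathlib import Path


def install_package_zip(zip_path, data_dir, category):
    """Unpack a downloaded NLTK package zip into data_dir/category.

    For example, a punkt zip goes into <data_dir>/tokenizers, so that
    NLTK can find tokenizers/punkt/... under that data directory.
    """
    target = Path(data_dir) / category
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

A call such as `install_package_zip("punkt.zip", "/usr/local/nltk_data", "tokenizers")` (paths are illustrative) would place the files where the search list in the log expects them. If a different directory is used, setting `NLTK_DATA` to it should make NLTK search there as well.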

Dustin Ingram