
I am having a problem accessing the NLTK data. I tried nltk.download(), but the GUI window came up with an "HTTP Error 403: Forbidden" error. I also tried installing from the command line:

python -m nltk.downloader all

and got this error:

C:\Python36\lib\runpy.py:125: RuntimeWarning: 'nltk.downloader' found in sys.modules after import of package 'nltk', but prior to execution of 'nltk.downloader'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
[nltk_data] Error loading all: HTTP Error 403: Forbidden.

I have also gone through "How do I download NLTK data?" and "Failed loading english.pickle with nltk.data.load", but neither solved the problem.

Mohamed Ali JAMAOUI
R.A.Munna

3 Answers


The problem comes from the NLTK download server. If you look at the GUI's configuration, it points to this index URL:

https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml

If you open this link in a browser, you get the following message:

Error 403 Forbidden.

Forbidden.

Guru Mediation:

Details: cache-lcy1125-LCY 1501134862 2002107460

Varnish cache server

I was going to file an issue on GitHub, but someone else had already done so here: https://github.com/nltk/nltk/issues/1791

A workaround was suggested here: https://github.com/nltk/nltk/issues/1787.

Based on the discussion on GitHub:

It seems that GitHub is down or is blocking access to the raw content of the repo.

The suggested workaround is to manually download as follows:

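# Directory where NLTK should find its data (adjust the username)
PATH_TO_NLTK_DATA=/home/username/nltk_data/
# Fetch the whole data repository as a single archive
wget https://github.com/nltk/nltk_data/archive/gh-pages.zip
unzip gh-pages.zip
# Move the extracted folder to the NLTK data location
mv nltk_data-gh-pages/ "$PATH_TO_NLTK_DATA"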
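
For anyone without wget and unzip (for example on Windows, as in the question), the extraction step can be sketched in Python. This is only a sketch: the function name `extract_packages` is hypothetical, and the default `prefix` assumes the packages sit under a `packages/` folder inside the archive's `nltk_data-gh-pages/` top-level directory, so check the layout of the zip you actually downloaded.

```python
import zipfile
from pathlib import Path


def extract_packages(archive_path, target_dir,
                     prefix="nltk_data-gh-pages/packages/"):
    """Copy every file under `prefix` in the archive into `target_dir`.

    `prefix` is an assumption about the archive layout; adjust it if the
    downloaded zip is organised differently.
    """
    target = Path(target_dir)
    extracted = []
    with zipfile.ZipFile(archive_path) as archive:
        for member in archive.namelist():
            # Skip directory entries and anything outside the packages tree.
            if member.endswith("/") or not member.startswith(prefix):
                continue
            # Drop the prefix so files land directly under target_dir.
            destination = target / member[len(prefix):]
            destination.parent.mkdir(parents=True, exist_ok=True)
            destination.write_bytes(archive.read(member))
            extracted.append(destination)
    return extracted
```

After something like `extract_packages("gh-pages.zip", "C:/nltk_data")`, the chosen directory can be made visible to NLTK with `nltk.data.path.append("C:/nltk_data")` if it is not one of the locations NLTK already searches. Individual packages in the archive may themselves be zip files and might need unpacking as well.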

People also suggested using an alternative index, as follows:

python -m nltk.downloader -u https://pastebin.com/raw/D3TBY4Mj punkt
Mohamed Ali JAMAOUI
  • I downloaded the data manually from https://github.com/nltk/nltk_data/archive/gh-pages.zip, extracted it, and put the packages I needed into a directory. It worked fine. Thanks. – R.A.Munna Jul 28 '17 at 06:06

Go to /nltk/downloader.py and change the default URL from:

DEFAULT_URL = 'http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml'

to

DEFAULT_URL = 'http://nltk.github.com/nltk_data/'
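
Editing the installed file is not the only way to change the index. As a rough sketch, the same override can be passed at runtime through the `server_index_url` argument of `nltk.downloader.Downloader`. The helper name `download_with_index` is hypothetical, and whether any particular index URL still responds is not something this sketch can guarantee.

```python
def download_with_index(package_id, index_url, download_dir=None):
    """Fetch one NLTK package using a non-default index URL."""
    # Imported inside the function so the sketch can be read and loaded
    # even on a machine where NLTK is not installed yet.
    from nltk.downloader import Downloader

    # server_index_url replaces the module-level DEFAULT_URL for this
    # Downloader instance only; the installed downloader.py is untouched.
    downloader = Downloader(server_index_url=index_url,
                            download_dir=download_dir)
    return downloader.download(package_id)
```

A call might look like `download_with_index("punkt", "http://nltk.github.com/nltk_data/")`, using the URL from this answer. This keeps the change local to your script, so it survives an NLTK upgrade that would overwrite a hand-edited downloader.py.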

Bowen

For me the best solution is:

PATH_TO_NLTK_DATA=/home/username/nltk_data/
wget https://github.com/nltk/nltk_data/archive/gh-pages.zip
unzip gh-pages.zip
mv nltk_data-gh-pages/ $PATH_TO_NLTK_DATA


The alternative solution did not work for me:

python -m nltk.downloader -u https://pastebin.com/raw/D3TBY4Mj punkt
luminousmen