8

I have installed the nltk package. I am now trying to download the supporting data packages using nltk.download(), but I get this error:

[Errno 11001] getaddrinfo

My machine / software details are:

OS: Windows 8.1
Python: 3.3.4
NLTK Package: 3.0

Below are the commands run in python:

Python 3.3.4 (v3.3.4:7ff62415e426, Feb 10 2014, 18:13:51) [MSC v.1600 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.

>>> import nltk
>>> nltk.download()
showing info http://nltk.github.com/nltk_data/
True
>>> nltk.download("all")
[nltk_data] Error loading all: <urlopen error [Errno 11001]
[nltk_data]     getaddrinfo failed>
False


It looks like the downloader is going to http://nltk.github.com/nltk_data/, whereas it should ideally get the data from http://www.nltk.org/nltk_data/.

On another machine, typing http://nltk.github.com/nltk_data/ into the browser redirects to http://www.nltk.org/nltk_data/. I do not understand why the redirection is not happening on my laptop.

I feel that this might be the issue.

Kindly help.

I have added the command prompt screenshot. Need help..


Regards, Bonson

Bonson
  • Hello @elyase I do not have http_proxy as a variable. Also this is a home computer so I do not have a firewall. Is there anything specific I should check in the DNS? – Bonson Jan 03 '15 at 11:42

9 Answers

10

Try the code below. It downloaded the package as expected.

import nltk
import ssl

try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

nltk.download()

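It looks like the link was failing before, and the ssl workaround above fixed it. Be aware that this turns off HTTPS certificate verification for urllib requests in the Python process.

Note: this was run on a Mac.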

Swarit Agarwal
4

I got this error because of a network restriction. Here is how I solved it:

I browsed to http://www.nltk.org/nltk_data/ and downloaded the required corpora from the corresponding links.

I then placed the downloaded files under C:/ on Windows (or another relevant directory such as C:/ProgramData/Anaconda3), using the same folder structure as in https://github.com/nltk/nltk_data/tree/gh-pages/packages
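The manual placement step can be sketched in Python roughly as follows. This is only an illustration: `install_nltk_zip` is a hypothetical helper, the paths in the usage comment are made up, and it assumes each downloaded zip contains a top-level folder named after the package (for example `stopwords/`) and that the target directory is one NLTK searches (see `nltk.data.path`).

```python
import zipfile
from pathlib import Path


def install_nltk_zip(zip_path, category, data_dir):
    """Unpack one manually downloaded NLTK package zip into data_dir/category/.

    category mirrors the folder names in the nltk_data repository,
    e.g. "corpora" or "tokenizers".
    """
    target = Path(data_dir) / category
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Extracts e.g. stopwords/... so the result is data_dir/corpora/stopwords/...
        archive.extractall(target)
    return target


# Example call with hypothetical paths:
# install_nltk_zip("C:/Users/me/Downloads/stopwords.zip", "corpora", "C:/nltk_data")
```

If the data ends up in a directory NLTK does not search by default, appending that directory to `nltk.data.path` before loading the corpus is one option.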

Avijit Das
3

Got the solution. The issue in my case was that when the NLTK downloader started, its server index was set to http://nltk.github.com/nltk_data/

This needs to be changed to http://nltk.org/nltk_data/

You can change this in the NLTK Downloader window via File -> Change Server Index.

Regards, Bonson

Bonson
  • 3
    Hi, i overcame this problem with nltk downloader by changing the server, but how do i do it in code? I am getting [nltk_data] Error loading all: Error while running the code – user3207655 Nov 18 '21 at 19:15
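For scripts, one possible way to set the server index without the GUI is to build a `Downloader` with an explicit `server_index_url`. The sketch below is an assumption-laden illustration, not an official recipe: `download_with_index` is a hypothetical helper, the URL is the one quoted in the answer above, and newer NLTK releases may ship a different default index (see `Downloader.DEFAULT_URL`).

```python
# URL taken from the answer above; newer NLTK releases may use a different default.
SERVER_INDEX_URL = "http://nltk.org/nltk_data/"


def download_with_index(package_id, index_url=SERVER_INDEX_URL):
    """Download one NLTK package using an explicit server index URL."""
    # Imported lazily so the helper can be defined even where nltk is not installed.
    from nltk.downloader import Downloader

    downloader = Downloader(server_index_url=index_url)
    # Like nltk.download(), this returns True on success and False on failure.
    return downloader.download(package_id)


# Example call (needs network access):
# download_with_index("stopwords")
```

If the error persists with the new index, the problem is more likely DNS or a proxy than the index URL itself.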
2

Setting the http and https proxies in environment variables resolved the issue for me:

set http_proxy=http://IPN:PWD@ipaddress:port
set https_proxy=https://IPN:PWD@ipaddress:port

Ask your network or admin team for the proxy IP address.
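If changing system-wide variables is not convenient, the same settings can be applied for a single Python process before calling nltk.download(). This is a minimal sketch: `use_proxy` is a hypothetical helper, the proxy URLs in the usage comment are placeholders, and it assumes the downloader goes through urllib (the `urlopen error` in the question suggests it does), which reads these variables.

```python
import os
import urllib.request


def use_proxy(http_proxy_url, https_proxy_url):
    """Set proxy variables for this process only and return what urllib detects."""
    os.environ["http_proxy"] = http_proxy_url
    os.environ["https_proxy"] = https_proxy_url
    # getproxies() reports the proxies urllib would use for each scheme.
    return urllib.request.getproxies()


# Example call with placeholder credentials and host:
# use_proxy("http://user:password@proxy.example.com:8080",
#           "https://user:password@proxy.example.com:8080")
# import nltk; nltk.download("stopwords")
```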

1

It is also possible to download the packages from the Python prompt or from within notebooks with the following configuration. The proxy URL can be http or https, depending on your proxy settings.

import nltk
nltk.set_proxy('http://username:password@proxy.example.com:port')
Arun
1

I was facing the same problem. Initially I was using broadband (Jio Fiber), which may have blocked the download (for security reasons). When I switched to mobile internet (through a SIM card), the download worked and the issue was resolved.

Try the code below to download stopwords, or change it as needed:

import nltk

nltk.download('stopwords')

from nltk.corpus import stopwords

stopwords.words('english')
S.B
0

The error might be caused by the proxy configured on the system. See the following link, where I have posted an answer:

Error in downloading NLTK data: [Errno 11004] getaddrinfo failed

Ranjeet
0

I was facing this issue in my Jupyter notebook as well. The code snippet below, from another Stack Overflow answer, helped. Posting it in case it helps someone else:

import socket
socket.getaddrinfo('localhost', 8080)

Ref : "getaddrinfo failed", what does that mean?
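Since `getaddrinfo failed` means the hostname lookup itself failed, a quick check of whether the download hosts resolve at all can narrow things down. The sketch below is only a diagnostic under assumptions: `can_resolve` and `check_hosts` are hypothetical helpers, and the hostnames in the usage comment are the ones mentioned in the question.

```python
import socket


def can_resolve(hostname, port=80):
    """Return True if the name lookup succeeds, False if getaddrinfo fails."""
    try:
        socket.getaddrinfo(hostname, port)
    except socket.gaierror:
        return False
    return True


def check_hosts(hostnames):
    """Map each hostname to whether it currently resolves on this machine."""
    return {name: can_resolve(name) for name in hostnames}


# Example call with the hosts from the question:
# check_hosts(["nltk.github.com", "www.nltk.org"])
```

A `False` for these hosts points at DNS, a proxy, or a network restriction rather than at NLTK itself.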

Sneha Valabailu
0
1. I was facing the [nltk_data] Error loading punkt: <urlopen error [Errno 11001] issue in Jupyter Notebook.
2. To solve it, I just changed my network from the office network to mobile.
3. The issue came from a restriction on downloading any module from the office network.
4. Use the code below in Jupyter Notebook:

        import pandas as pd
        import numpy as np
        import matplotlib.pyplot as plt
        import seaborn as sns
        import nltk
        nltk.download('punkt')

        [nltk_data] Downloading package punkt to
        [nltk_data]     C:\Users\avinaskh\AppData\Roaming\nltk_data...
        [nltk_data]   Unzipping tokenizers\punkt.zip.
Avinash Khadsan