I am working through the NLTK tutorial.
The problem is that it requires downloading various corpora. After every solution I tried failed to get nltk.download() to download the NLTK corpora, I resorted to the steps stated here.
I started downloading the corpora required for each example from this page and putting them in the directory D:\nltk_data\corpora. I was able to try out various examples. But then one example gave this error:
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
So I downloaded punkt from the same page and copied it into the same directory as above. But it did not work. I also tried from nltk.corpus import punkt, as with the other corpora, but that did not help either. It says Unresolved import: punkt
One difference between punkt and the other corpora is that it contains pickle files instead of text files. How should I fix this?
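A minimal sketch for checking where a manually copied resource ended up. It assumes NLTK looks for punkt under a tokenizers folder (not corpora) inside each nltk_data root, which is consistent with punkt being a pickled tokenizer model and not a text corpus. The helper names locate_resource and diagnose and the EXPECTED_CATEGORY table are hypothetical, for illustration only; the code uses only the standard library and does not call NLTK:

```python
from pathlib import Path

# Assumption: NLTK resolves resources by category folder under each
# nltk_data root, e.g. corpora/gutenberg but tokenizers/punkt.
EXPECTED_CATEGORY = {"gutenberg": "corpora", "punkt": "tokenizers"}


def locate_resource(root, name):
    """Return the category folders under `root` that contain a folder `name`."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir() and (child / name).is_dir()
    )


def diagnose(root, name):
    """Say whether `name` sits in the category folder it is assumed to need."""
    expected = EXPECTED_CATEGORY[name]
    found = locate_resource(root, name)
    if expected in found:
        return "ok"
    if found:
        return "misplaced: found in {}, expected {}".format(
            ", ".join(found), expected
        )
    return "missing"


if __name__ == "__main__":
    # D:\nltk_data is the root directory used in this question.
    print(diagnose(r"D:\nltk_data", "punkt"))
```

If this reports punkt as misplaced under corpora, moving the extracted punkt folder to D:\nltk_data\tokenizers\punkt would be the thing to try, under the assumption stated above.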
Code:
import nltk
from nltk.corpus import gutenberg
for fileid in gutenberg.fileids():
    num_chars = len(gutenberg.raw(fileid))
    num_words = len(gutenberg.words(fileid))
    num_sents = len(gutenberg.sents(fileid))
    num_vocab = len(set(w.lower() for w in gutenberg.words(fileid)))
    print(round(num_chars/num_words), round(num_words/num_sents), round(num_words/num_vocab), fileid)
Error:
Traceback (most recent call last):
  File "D:\Mahesh\workspaces\pyworkspace\nltkdemo\chp2\chp2.py", line 8, in <module>
    num_sents = len(gutenberg.sents(fileid))
  File "D:\Softwares\python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\nltk\corpus\reader\util.py", line 233, in __len__
    for tok in self.iterate_from(self._toknum[-1]): pass
  File "D:\Softwares\python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\nltk\corpus\reader\util.py", line 296, in iterate_from
    tokens = self.read_block(self._stream)
  File "D:\Softwares\python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\nltk\corpus\reader\plaintext.py", line 129, in _read_sent_block
    for sent in self._sent_tokenizer.tokenize(para)])
  File "D:\Softwares\python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\nltk\data.py", line 984, in __getattr__
    self.__load()
  File "D:\Softwares\python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\nltk\data.py", line 976, in __load
    resource = load(self._path)
  File "D:\Softwares\python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\nltk\data.py", line 836, in load
    opened_resource = _open(resource_url)
  File "D:\Softwares\python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\nltk\data.py", line 954, in _open
    return find(path_, path + ['']).open()
  File "D:\Softwares\python\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\nltk\data.py", line 675, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
Searched in:
- 'C:\\Users\\593932/nltk_data'
- 'C:\\nltk_data'
- 'D:\\nltk_data'
- 'E:\\nltk_data'
- 'D:\\Softwares\\python\\WinPython-64bit-3.4.4.4Qt5\\python-3.4.4.amd64\\nltk_data'
- 'D:\\Softwares\\python\\WinPython-64bit-3.4.4.4Qt5\\python-3.4.4.amd64\\share\\nltk_data'
- 'D:\\Softwares\\python\\WinPython-64bit-3.4.4.4Qt5\\python-3.4.4.amd64\\lib\\nltk_data'
- 'C:\\Users\\Mahesha999\\AppData\\Roaming\\nltk_data'
- ''
**********************************************************************
The error seems to occur at line 8: num_sents = len(gutenberg.sents(fileid))