1

I was trying this code:

import os
import nltk
print(os.listdir(nltk.data.find("corpora")))

but the following error showed up:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-2-9f8c46ee9865> in <module>()
----> 1 print(os.listdir(nltk.data.find("corpora")))

~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\data.py in find(resource_name, paths)
    676 
    677     # Identify the package (i.e. the .zip file) to download.
--> 678     resource_zipname = resource_name.split('/')[1]
    679     if resource_zipname.endswith('.zip'):
    680         resource_zipname = resource_zipname.rpartition('.')[0]

IndexError: list index out of range
Barmar
Pawan
  • My guess is that `resource_name.split('/')` returns a list of a single element, and you're trying to access element 2 (index 1). The `resource_name` probably doesn't have the `'/'` character. – Ted Klein Bergman May 11 '19 at 09:23
  • The library seems to expect something with a `/` as the argument. – Klaus D. May 11 '19 at 09:27
  • The argument to `nltk.data.find()` is supposed to be a pathname to the file containing the corpus, not just a filename. – Barmar May 11 '19 at 09:28
  • See the examples [here](http://www.nltk.org/howto/data.html) – Barmar May 11 '19 at 09:28
  • I have watched a tutorial and the instructor was getting the list of files in the corpora with the exact same line of code – Pawan May 11 '19 at 09:36
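To illustrate what the comments describe, here is a minimal sketch in plain Python (no NLTK required). It assumes `resource_name` is simply the string passed to `find()`, as the traceback suggests, and reproduces only the `split('/')[1]` expression from line 678; it is not the library's actual code path.

```python
# The name passed in the question contains no '/' character.
resource_name = "corpora"

parts = resource_name.split('/')
print(parts)  # ['corpora']  (a single element, so index 1 does not exist)

try:
    resource_zipname = parts[1]
except IndexError as exc:
    error_message = str(exc)
    print("IndexError:", error_message)

# A name with a '/' splits into at least two parts, so index 1 is valid.
print("corpora/brown".split('/')[1])  # brown
```

Judging by the comment on line 677 of the traceback, this expression runs while NLTK works out which package to suggest downloading, which is consistent with the error showing up only when the data is not installed.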

3 Answers

4

TL;DR

You have to first download the corpora.

>>> import os
>>> import nltk
>>> nltk.download('popular')
>>> print(os.listdir(nltk.data.find("corpora")))

But printing what's inside the corpora directory doesn't help much; the hints in this answer may be more helpful: https://stackoverflow.com/a/30822962/610569
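If you want to check whether anything is installed before calling `nltk.data.find("corpora")`, one possible approach is to look for a `corpora` subdirectory under each entry of `nltk.data.path` yourself. The helper names `corpora_dirs` and `list_installed_corpora` below are hypothetical (not part of NLTK); this is a rough sketch, not the library's own lookup logic.

```python
import os


def corpora_dirs(search_paths):
    """Return every '<path>/corpora' directory that actually exists."""
    found = []
    for base in search_paths:
        candidate = os.path.join(base, "corpora")
        if os.path.isdir(candidate):
            found.append(candidate)
    return found


def list_installed_corpora():
    """Map each existing corpora directory on NLTK's search path to its entries."""
    import nltk  # imported lazily so corpora_dirs() has no NLTK dependency
    return {d: sorted(os.listdir(d)) for d in corpora_dirs(nltk.data.path)}
```

Calling `list_installed_corpora()` returns an empty dict when nothing has been downloaded yet, instead of raising an error.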

alvas
1

You need to add `from __future__ import print_function` to your code (it is only required on Python 2 and has no effect on Python 3), so use the following:

from __future__ import print_function
import os
import nltk
import nltk.corpus
nltk.download('popular')
print(os.listdir(nltk.data.find("corpora")))
Shirin Yavari
1

You have to download the data first.

See: https://www.nltk.org/data.html

Import the nltk library and download the dataset you need:

import nltk
nltk.download()

To test that the data has been installed:

from nltk.corpus import brown
print(", ".join(brown.words()))

The, Fulton, County, Grand, Jury, said, ...

This assumes you have downloaded the Brown Corpus. See the list of available corpora here: https://www.nltk.org/nltk_data/
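Note that joining every word prints the whole corpus, which is a lot of output. A small sketch for previewing only the first few words; `preview_words` is a hypothetical helper, and it accepts any iterable of strings, so it should also work with `brown.words()` once the corpus is downloaded.

```python
from itertools import islice


def preview_words(words, limit=10):
    """Join only the first `limit` items of an iterable of strings."""
    return ", ".join(islice(words, limit))


# Stand-in list for illustration; with NLTK you would pass brown.words().
sample = ["The", "Fulton", "County", "Grand", "Jury", "said"]
print(preview_words(sample, limit=3))  # The, Fulton, County
```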