IndexError found but can't find the problem

Question

I tried this code:

import os
import nltk
print(os.listdir(nltk.data.find("corpora")))

but the following error showed up:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-2-9f8c46ee9865> in <module>()
----> 1 print(os.listdir(nltk.data.find("corpora")))

~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\data.py in find(resource_name, paths)
    676 
    677     # Identify the package (i.e. the .zip file) to download.
--> 678     resource_zipname = resource_name.split('/')[1]
    679     if resource_zipname.endswith('.zip'):
    680         resource_zipname = resource_zipname.rpartition('.')[0]

IndexError: list index out of range

My guess is that `resource_name.split('/')` returns a list of a single element, and you're trying to access element 2 (index 1). The `resource_name` probably doesn't have the `'/'` character. — Ted Klein Bergman, May 11 '19 at 09:23
The library seems to expect something with a `/` as the argument. — Klaus D., May 11 '19 at 09:27
The argument to `nltk.data.find()` is supposed to be a pathname to the file containing the corpus, not just a filename. — Barmar, May 11 '19 at 09:28
See the examples [here](http://www.nltk.org/howto/data.html) — Barmar, May 11 '19 at 09:28
I have watched a tutorial and the instructor was getting the list of files in the corpora with the exact same line of code — Pawan, May 11 '19 at 09:36
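The mechanism described in the comments above can be reproduced without NLTK. The sketch below only mirrors the single line flagged in the traceback; `zipname_from_resource` is a hypothetical helper name, not part of NLTK. The source comment on line 677 of the traceback suggests this line runs while NLTK is working out which package to download, which would be consistent with the data not being installed yet.

```python
def zipname_from_resource(resource_name):
    # Same expression as line 678 in the traceback:
    # take the second '/'-separated component of the resource name.
    return resource_name.split('/')[1]


# A name without '/' splits into a single-element list.
print("corpora".split('/'))

# A name with '/' has a second component, so index 1 exists.
print(zipname_from_resource("corpora/brown"))

# With no '/', index 1 does not exist and IndexError is raised.
try:
    zipname_from_resource("corpora")
except IndexError as exc:
    print("IndexError:", exc)
```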

alvas · Answer 1 · 2019-05-12T16:05:18.103

4

TL;DR

You have to first download the corpora.

>>> import os
>>> import nltk
>>> nltk.download('popular')
>>> print(os.listdir(nltk.data.find("corpora")))

Printing the contents of the corpora directory doesn't help much, though. The hints in this answer may be more useful: https://stackoverflow.com/a/30822962/610569
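If the goal is only to see which corpora are installed, one alternative is to walk NLTK's search paths directly instead of calling `nltk.data.find("corpora")`. This is a minimal sketch, assuming each search path may or may not contain a `corpora` subdirectory; `list_installed_corpora` is a hypothetical helper written for this example, and it takes the list of paths as an argument so it can be tried without NLTK installed.

```python
import os


def list_installed_corpora(search_paths):
    """Map each existing '<path>/corpora' directory to its sorted entries."""
    found = {}
    for base in search_paths:
        corpora_dir = os.path.join(base, "corpora")
        # Skip search paths that do not exist or have no 'corpora' folder.
        if os.path.isdir(corpora_dir):
            found[corpora_dir] = sorted(os.listdir(corpora_dir))
    return found


# Usage with NLTK (requires nltk to be installed):
#   import nltk
#   for directory, entries in list_installed_corpora(nltk.data.path).items():
#       print(directory, entries)
```

An empty result means no `corpora` directory exists on any search path, so nothing has been downloaded yet.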

edited May 12 '19 at 16:05

answered May 12 '19 at 15:59


Shirin Yavari · Answer 2 · 2019-09-11T17:43:25.360

1

You need to add `from __future__ import print_function` to your code, so use the following:

from __future__ import print_function
import os
import nltk
import nltk.corpus
nltk.download('popular')
print(os.listdir(nltk.data.find("corpora")))

edited Sep 11 '19 at 17:43

answered Sep 11 '19 at 17:24


Sina Parchami · Answer 3 · 2022-03-09T09:46:13.077

1

You have to download the data first.

See: https://www.nltk.org/data.html

Import the nltk library and download the dataset you need:

import nltk
nltk.download()

To test that the data has been installed:

from nltk.corpus import brown
print(", ".join(brown.words()))

The, Fulton, County, Grand, Jury, said, ...

This assumes you have downloaded the Brown Corpus. See the list of available corpora here: https://www.nltk.org/nltk_data/
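To avoid relying on that assumption, a script can check for a corpus and download it only when it is missing. The sketch below is one possible way to do that, not an official NLTK recipe. `ensure_resource` is a hypothetical helper; it assumes `nltk.data.find` raises `LookupError` for a missing resource and that `nltk.download` accepts a package name. The `find` and `download` parameters exist only so the logic can be exercised without NLTK or network access.

```python
def ensure_resource(resource_path, package, find=None, download=None):
    """Return the located resource, downloading `package` first if needed."""
    if find is None or download is None:
        # Imported lazily so the helper can be tested with stand-in callables.
        import nltk
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        return find(resource_path)
    except LookupError:
        # Resource not installed: fetch the package, then look again.
        download(package)
        return find(resource_path)


# Usage with NLTK (requires nltk and network access on first run):
#   ensure_resource("corpora/brown", "brown")
#   from nltk.corpus import brown
#   print(brown.words()[:6])
```

Note that the resource path includes a `/` (`corpora/brown`), which matches the form the library appears to expect in the traceback above.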

edited Mar 09 '22 at 09:46

answered Feb 04 '20 at 15:28


1

use nltk.download('all') – KawaiKx Jun 16 '21 at 05:53
