WordListCorpusReader is not iterable

Question

I am new to Python and NLTK. I have a file called reviews.csv, which consists of comments extracted from Amazon. I have tokenized the contents of this CSV file and written them to a file called csvfile.csv. Here's the code:

from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.stem import PorterStemmer
import csv  # comma-separated values
from nltk.corpus import stopwords
ps = PorterStemmer()
stop_words = set(stopwords.words("english"))
with open ('reviews.csv') as csvfile:
    readCSV = csv.reader(csvfile,delimiter='.')    
    for lines in readCSV:
        word1 = word_tokenize(str(lines))
        print(word1)
    with open('csvfile.csv','a') as file:
        for word in word1:
            file.write(word)
            file.write('\n')
    with open ('csvfile.csv') as csvfile:
        readCSV1 = csv.reader(csvfile)
    for w in readCSV1:
        if w not in stopwords:
            print(w)

I am trying to perform stemming on csvfile.csv. But I get this error:

Traceback (most recent call last):
  File "/home/aarushi/test.py", line 25, in <module>
    if w not in stopwords:
TypeError: argument of type 'WordListCorpusReader' is not iterable

Also, is there really a need to write a file, only to read it back and print it? — OneCricketeer, Oct 28 '17 at 05:43
1. I wrote stop_words instead of stopwords. Now I have another error. TypeError: unhashable type: 'list' 2. I wanted the word_tokenized file. That's why I did that. — Aarushi Aiyyar, Oct 28 '17 at 07:10
1. Each stackoverflow question should be about one problem. When you move on to the next problem, ask a new question. 2. How could anyone guess where your new error came from? You haven't posted the code. (But don't edit the question or post it in a comment: Ask a new question if you are still stuck.) My guess is you're trying to create a set from the wrong kind of data... — alexis, Oct 28 '17 at 12:41
You should also make a [mcve]... Your code errors on the first line that isn't importing or declaring a class from a third party library. In other words, you seem not to be testing your code each time you add functionality. You ask about the stemming, but the error is on the stop words. See here for what you're trying to do at the start https://stackoverflow.com/a/19133088/2308683 — OneCricketeer, Oct 28 '17 at 18:48

score 36 · Accepted Answer · answered Oct 30 '17 at 02:45

When you did

from nltk.corpus import stopwords

the name stopwords refers to a CorpusReader object in NLTK, not to a collection of words. That is why the membership test `w not in stopwords` fails with "is not iterable".

The actual stop words you're looking for (i.e. a collection of stop-word strings) are created when you do:

stop_words = set(stopwords.words("english"))
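To illustrate the difference without needing the NLTK data, here is a minimal sketch. FakeCorpusReader is a hypothetical stand-in I made up for this example (it is not an NLTK class), and its three-word list is an assumption. It only mimics the relevant property: the reader object does not support `in`, while the set built from its words() does.

```python
class FakeCorpusReader:
    """Hypothetical stand-in for the reader object; not an NLTK class."""

    def words(self, fileid):
        # Assumption: a tiny hard-coded word list instead of NLTK's data.
        return ["the", "is", "a"]


stopwords = FakeCorpusReader()
stop_words = set(stopwords.words("english"))

# Membership test against the set works.
print("the" in stop_words)

# Membership test against the reader object itself raises TypeError,
# because the class defines none of __contains__, __iter__ or __getitem__.
try:
    "the" in stopwords
except TypeError as err:
    print("TypeError:", err)
```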

So when checking whether a word in your list of tokens is a stopword, you should do:

from nltk.corpus import stopwords
stop_words = set(stopwords.words("english"))
tokenized_sent = ["this", "is", "a", "sentence"]  # placeholder: use your own list of tokens
for w in tokenized_sent:
    if w not in stop_words:
        pass # Do something.

To avoid confusion, I usually name the actual collection of stop words stoplist:

from nltk.corpus import stopwords
stoplist = set(stopwords.words("english"))
tokenized_sent = ["this", "is", "a", "sentence"]  # placeholder: use your own list of tokens
for w in tokenized_sent:
    if w not in stoplist:
        pass # Do something.
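A likely cause of the follow-up error reported in the comments (TypeError: unhashable type: 'list') is that csv.reader yields each row as a list of strings, so `w` in the question's loop is a list, and a list cannot be looked up in a set. The code that raised it was not posted, so this is a guess. The sketch below uses a hand-written stop-word set and an in-memory file as stand-ins (both are assumptions for illustration) and tests each string cell rather than the whole row.

```python
import csv
import io

# Assumption: a small hand-written stop-word set standing in for
# set(stopwords.words("english")).
stop_words = {"the", "is", "a"}

# Assumption: in-memory text standing in for csvfile.csv (one token per line).
fake_file = io.StringIO("the\nbattery\nis\ngreat\n")

kept = []
for row in csv.reader(fake_file):
    # row is a list such as ['the']; `row in stop_words` would raise
    # TypeError: unhashable type: 'list'. Test each string cell instead.
    for token in row:
        if token not in stop_words:
            kept.append(token)

print(kept)
```

If the file really holds one token per line, reading it with a plain `for line in f` and `line.strip()` would avoid the csv module entirely.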

Thank you! I modified stopwords to stop_words but I get an error saying: if w not in stop_words: TypeError: unhashable type: 'list' — Aarushi Aiyyar, Nov 02 '17 at 14:28
@alvas, I get this error: NameError: name 'tokenized_sent' is not defined — Mohammad Heydari, Jun 13 '19 at 16:20
