Remove stopwords with nltk.corpus from list with lists

Question

I have a list containing lists with all the separated words of a review, which looks like this:

texts = [['fine','for','a','night'],['it','was','good']]

I want to remove all stopwords using the nltk.corpus package and put the remaining words back into the list. The end result should be a list consisting of lists of words without stopwords. This is what I tried:

import nltk
nltk.download() # to download stopwords corpus
from nltk.corpus import stopwords
stopwords=stopwords.words('english')
words_reviews=[]

for review in texts:
    wr=[]
    for word in review:
        if word not in stopwords:
            wr.append(word)
    words_reviews.append(wr)

This code actually worked, but now I get the error: AttributeError: 'list' object has no attribute 'words', referring to stopwords. I made sure that I installed all packages. What could be the problem?

Possible duplicate: http://stackoverflow.com/questions/19130512/stopword-removal-with-nltk — alvas, Apr 01 '17 at 17:46

score 4 · Accepted Answer · answered Mar 31 '17 at 13:55

The problem is that you redefine stopwords in your code:

from nltk.corpus import stopwords
stopwords=stopwords.words('english')

After the first line, stopwords is a corpus reader with a words() method. After the second line, it is a list. Proceed accordingly.
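To see why the rebinding bites, here is a minimal sketch that reproduces the error without NLTK. FakeCorpusReader is a hypothetical stand-in that only mimics the words() method; it is not part of NLTK. Presumably the import was run once and the assignment line was then executed a second time (for example by re-running a notebook cell), which is what the second call below imitates.

```python
class FakeCorpusReader:
    """Hypothetical stand-in for nltk.corpus.stopwords; it only mimics words()."""

    def words(self, language):
        return ['for', 'a', 'it', 'was']


stopwords = FakeCorpusReader()          # plays the role of the import
stopwords = stopwords.words('english')  # first run: the name now refers to a list

try:
    # Second run of the same line: stopwords is a list by now, so .words fails.
    stopwords = stopwords.words('english')
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

Giving the list its own name, so the imported stopwords object is never overwritten, avoids this.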

Also, looking things up in a list is really slow (each membership test scans the list), so you'll get much better performance if you use a set:

stopwords = set(stopwords.words('english'))
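Putting both points together, a minimal sketch might look like this. The names remove_stopwords and english_stopwords are mine, not from the question, and the short hard-coded list is only a stand-in so the snippet runs without the NLTK corpus; the comment shows where the real list would come from.

```python
def remove_stopwords(texts, stop_words):
    """Return a new list of lists with every word found in stop_words dropped."""
    stop_set = set(stop_words)  # set membership tests are O(1) on average
    return [[word for word in review if word not in stop_set] for review in texts]


# Stand-in stopword list (an assumption, so the sketch runs without NLTK data).
# With NLTK installed and the corpus downloaded, one would instead write:
#   from nltk.corpus import stopwords
#   english_stopwords = stopwords.words('english')
# Note the separate name: the imported stopwords object is never overwritten.
english_stopwords = ['for', 'a', 'it', 'was']

texts = [['fine', 'for', 'a', 'night'], ['it', 'was', 'good']]
words_reviews = remove_stopwords(texts, english_stopwords)
print(words_reviews)  # [['fine', 'night'], ['good']]
```

Because the list is stored under a different name, re-running the code does not trigger the AttributeError.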

score 0 · Answer 2 · answered May 27 '20 at 17:35

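Instead of calling stopwords.words() for every word:

[word for word in text_tokens if not word in stopwords.words()]

build the stopword list once beforehand (called all_stopwords here) and test against that:

[word for word in text_tokens if word not in all_stopwords]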

Niranjan Mangotri

score 0 · Answer 3 · answered Jun 20 '22 at 09:37

I removed the set() call and it worked; maybe you could try the same.

WARUTS
