0

I have a list of lists, where each inner list contains the separated words of one review. It looks like this:

texts = [['fine','for','a','night'],['it','was','good']]

I want to remove all stopwords using the nltk.corpus package and put the remaining words back into the list. The end result should be a list of lists of words without stopwords. This is what I tried:

import nltk
nltk.download() # to download stopwords corpus
from nltk.corpus import stopwords
stopwords=stopwords.words('english')
words_reviews=[]

for review in texts:
    wr=[]
    for word in review:
        if word not in stopwords:
            wr.append(word)
        words_reviews.append(wr)

This code actually worked, but now I get the error: AttributeError: 'list' object has no attribute 'words', referring to stopwords. I made sure that I installed all packages. What could be the problem?

Lisadk

3 Answers

4

The problem is that you redefine stopwords in your code:

from nltk.corpus import stopwords
stopwords=stopwords.words('english')

After the first line, stopwords is a corpus reader with a words() method. After the second line, it is a list. Proceed accordingly.
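A minimal sketch of the name shadowing described above. It assumes the assignment line was run a second time in the same session without re-running the import, which is one plausible reason the code "worked" at first. FakeCorpusReader is a hypothetical stand-in so the example runs without NLTK; it is not part of the library.

```python
class FakeCorpusReader:
    """Hypothetical stand-in for the object nltk.corpus exposes as `stopwords`."""

    def words(self, language):
        # A tiny hard-coded sample, not the real NLTK stopword list.
        return ['for', 'a', 'it', 'was']


stopwords = FakeCorpusReader()           # plays the role of the import line
stopwords = stopwords.words('english')   # first run: the name now holds a list

try:
    # Second run in the same session: `stopwords` is already a list here.
    stopwords = stopwords.words('english')
except AttributeError as exc:
    error_message = str(exc)
else:
    error_message = None

print(error_message)  # 'list' object has no attribute 'words'
```

Storing the list under a different name than the imported object avoids the clash.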

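Also, checking membership in a list scans it element by element, which is slow when done for every word. A set does the same check in roughly constant time, so you'll get much better performance if you use this: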

stopwords = set(stopwords.words('english'))
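Putting both points together, the filtering could look roughly like the sketch below. The names remove_stopwords and stop_set are illustrative assumptions, and the small hand-picked set only stands in for the real NLTK list so that the example runs without the corpus installed.

```python
def remove_stopwords(texts, stop_set):
    """Return a new list of reviews with every word found in stop_set removed."""
    return [[word for word in review if word not in stop_set] for review in texts]


# Hand-picked sample so the sketch runs without NLTK data. With NLTK and the
# stopwords corpus installed, the set could instead be built as:
#     from nltk.corpus import stopwords
#     stop_set = set(stopwords.words('english'))
# which keeps the imported name `stopwords` intact.
stop_set = {'for', 'a', 'it', 'was'}

texts = [['fine', 'for', 'a', 'night'], ['it', 'was', 'good']]
words_reviews = remove_stopwords(texts, stop_set)
print(words_reviews)  # [['fine', 'night'], ['good']]
```

Each filtered review is added exactly once here. In the question's loop the append sits inside the inner loop, so it runs once per word.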
alexis
0

Instead of

[word for word in text_tokens if not word in stopwords.words()]

use

[word for word in text_tokens if not word in all_stopwords]

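After stopwords = stopwords.words('english'), the name stopwords refers to a plain list rather than the corpus reader, so none of the reader's attributes, such as words(), are available any more. Here all_stopwords is assumed to hold that list under a separate name.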

0

I removed the set() call and it worked; maybe you could try the same.

WARUTS