how to resolve the error: AttributeError: 'generator' object has no attribute 'endswith'

Question

When I run this code to preprocess a text, I get the error below. Someone else had a similar problem, but that post did not have enough details.

I am putting everything in context here, hoping it helps reviewers help us better.

Here is the function:

def preprocessing(text):
    #text=text.decode("utf8")
    #tokenize into words
    tokens=[word for sent in nltk.sent_tokenize(text) for word in 
    nltk.word_tokenize(sent)]
    #remove stopwords
    stop=stopwords.words('english')
    tokens=[token for token in tokens if token not in stop]
    #remove words less than three letters
    tokens=[word for word in tokens if len(word)>=3]
    #lower capitalization
    tokens=[word.lower() for word in tokens]
    #lemmatization
    lmtzr=WordNetLemmatizer()
    tokens=[lmtzr.lemmatize(word for word in tokens)]
    preprocessed_text=' '.join(tokens)
    return preprocessed_text

Calling the function here:

#open the text data from disk location
sms=open('C:/Users/Ray/Documents/BSU/Machine_learning/Natural_language_Processing_Pyhton_And_NLTK_Chap6/smsspamcollection/SMSSpamCollection')
sms_data=[]
sms_labels=[]
csv_reader=csv.reader(sms,delimiter='\t')
for line in csv_reader:
    #adding the sms_id
    sms_labels.append(line[0])
    #adding the cleaned text by calling the preprocessing method
    sms_data.append(preprocessing(line[1]))
sms.close()

Result:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-38-b42d443adaa6> in <module>()
      8     sms_labels.append(line[0])
      9     #adding the cleaned text by calling the preprocessing method
---> 10     sms_data.append(preprocessing(line[1]))
     11 sms.close()

<ipython-input-37-69ef4cd83745> in preprocessing(text)
     12     #lemmatization
     13     lmtzr=WordNetLemmatizer()
---> 14     tokens=[lmtzr.lemmatize(word for word in tokens)]
     15     preprocessed_text=' '.join(tokens)
     16     return preprocessed_text

~\Anaconda3\lib\site-packages\nltk\stem\wordnet.py in lemmatize(self, word, pos)
     38 
     39     def lemmatize(self, word, pos=NOUN):
---> 40         lemmas = wordnet._morphy(word, pos)
     41         return min(lemmas, key=len) if lemmas else word
     42 

~\Anaconda3\lib\site-packages\nltk\corpus\reader\wordnet.py in _morphy(self, form, pos, check_exceptions)
   1798 
   1799         # 1. Apply rules once to the input to get y1, y2, y3, etc.
-> 1800         forms = apply_rules([form])
   1801 
   1802         # 2. Return all that are in the database (and check the original too)

~\Anaconda3\lib\site-packages\nltk\corpus\reader\wordnet.py in apply_rules(forms)
   1777         def apply_rules(forms):
   1778             return [form[:-len(old)] + new
-> 1779                     for form in forms
   1780                     for old, new in substitutions
   1781                     if form.endswith(old)]

~\Anaconda3\lib\site-packages\nltk\corpus\reader\wordnet.py in <listcomp>(.0)
   1779                     for form in forms
   1780                     for old, new in substitutions
-> 1781                     if form.endswith(old)]
   1782 
   1783         def filter_forms(forms):

AttributeError: 'generator' object has no attribute 'endswith'

I believe the error is coming from the source code for nltk.corpus.reader.wordnet

The whole source code can be seen in the nltk documentation page. It's too long to post here; but below is the raw link:

Thanks for your help.

You're passing a generator here: `tokens=[lmtzr.lemmatize(word for word in tokens)]` - does this method actually accept a generator? did you mean `tokens=[lmtzr.lemmatize(word) for word in tokens]` — match, Jan 19 '18 at 12:23
Heh, please see this https://stackoverflow.com/questions/47769818/why-is-my-nltk-function-slow-when-processing-the-dataframe/47788736#47788736 and understand why your preprocessing is sub-optimal because it's iterating through the tokens multiple times... Could I ask where did you get this code sample from? This code sample is haunting people and you're not the first one to ask a question based on this. — alvas, Jan 19 '18 at 12:49
Yes the code is from chapter6 (text classification) of the book "Natural Language Processing Python And NLTK" by Hardeniya Et Al. — alpha5401, Jan 20 '18 at 02:43
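
The point raised in the comments about iterating through the tokens several times can be sketched as a single pass. This is only an illustration, not the linked answer's code: `preprocess_tokens` and `toy_lemmatize` are hypothetical names, and `toy_lemmatize` is a crude stand-in for `WordNetLemmatizer().lemmatize` so the sketch runs without NLTK data. The filters are applied in the same order as in the question (stopword and length checks first, then lowercasing, then lemmatization).

```python
def preprocess_tokens(tokens, lemmatize, stop_words):
    """Filter, lowercase and lemmatize in one pass over the tokens."""
    stop_set = set(stop_words)  # set membership tests are cheaper than list scans
    return " ".join(
        lemmatize(token.lower())  # one str per call, as lemmatize() expects
        for token in tokens
        if token not in stop_set and len(token) >= 3
    )


def toy_lemmatize(word):
    """Hypothetical stand-in for a lemmatizer: strips a trailing 's'."""
    return word[:-1] if word.endswith("s") else word


result = preprocess_tokens(
    ["the", "Cats", "chase", "a", "mouse"],
    toy_lemmatize,
    ["the", "a"],
)
print(result)
```

With NLTK installed and its data downloaded, `WordNetLemmatizer().lemmatize` and `stopwords.words('english')` could presumably be passed in place of the stand-ins.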

bruno desthuilliers · Accepted Answer · 2018-01-19T12:34:03.023

The error message and traceback point you to the source of the problem:

in preprocessing(text)
     12     #lemmatization
     13     lmtzr=WordNetLemmatizer()
---> 14     tokens=[lmtzr.lemmatize(word for word in tokens)]
     15     preprocessed_text=' '.join(tokens)
     16     return preprocessed_text

~\Anaconda3\lib\site-packages\nltk\stem\wordnet.py in lemmatize(self, word, pos)
     38 
     39     def lemmatize(self, word, pos=NOUN):

Obviously, from the function's signature (word, not words) and the error ("has no attribute 'endswith'" - endswith() is actually a str method), lemmatize() expects a single word, but here:

tokens=[lmtzr.lemmatize(word for word in tokens)]

you are passing a generator expression.

What you want is:

tokens = [lmtzr.lemmatize(word) for word in tokens]
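
To see the difference between the two forms without installing NLTK data, here is a minimal sketch. `toy_lemmatize` is a hypothetical stand-in, not the real `WordNetLemmatizer.lemmatize`; the only thing it shares with the real method is that it calls a `str` method (`endswith`) on its argument.

```python
def toy_lemmatize(word):
    """Hypothetical stand-in for lemmatize(): strips a trailing 's'."""
    # Like the real method, this assumes `word` is a str.
    return word[:-1] if word.endswith("s") else word


tokens = ["cats", "dog", "trees"]

# Buggy form: the whole generator expression is passed as the single argument.
try:
    [toy_lemmatize(word for word in tokens)]
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Fixed form: one call per word, so each argument is a str.
lemmas = [toy_lemmatize(word) for word in tokens]

print(error_message)
print(lemmas)
```

The only change between the two forms is where the closing parenthesis of the call sits.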

NB: you mention:

I believe the error is coming from the source code for nltk.corpus.reader.wordnet

The error is indeed raised in this package, but it "is coming from" (in the sense of "caused by") your code passing the wrong argument ;)

Hope this will help you debug this kind of problem by yourself next time.

edited Jan 19 '18 at 12:34

answered Jan 19 '18 at 12:26


`str.endswith()` accepts a tuple, would that work here like `lmtzr.lemmatize(tuple(tokens))]` ? – Chris_Rands Jan 19 '18 at 12:43
@Chris_Rands obviously not. Reread the call stack and look how `str.endswith()` is called (hint: it's a method called on a str instance - actually the word passed to `.lemmatize()`). If you passed a `tuple`, `endswith()` would be called on that tuple (just as it's been called on the generator in the OP's code), and raised the same AttributeError because `tuple` has no attribute `endswith()` either. TL;DR : `.lemmatize()` expects a string, give it a string, else it will fail. – bruno desthuilliers Jan 19 '18 at 14:39
Ah yes, point taken, I should have looked more carefully, agreed! – Chris_Rands Jan 19 '18 at 14:42
Thank you Bruno for the input, the problem was the argument. – alpha5401 Jan 20 '18 at 02:37
