
I wrote the code exactly as it appears in the book, but I get the following error. What should I do?

from nltk.stem import PorterStemmer, WordNetLemmatizer

sent = 'The laughs you two heard were triggered by memories of his own high j-flying exits for moving beasts'

lemmatizer = WordNetLemmatizer()
words = lemmatizer.lemmatize(sent, pos = 'pos')

File "D:/machine_learning/nltk_mapper.py", line 24, in <module>
    word = lemmatizer.lemmatize(words, pos='pos')
  File "D:\machine_learning\venv\lib\site-packages\nltk\stem\wordnet.py", line 40, in lemmatize
    lemmas = wordnet._morphy(word, pos)
  File "D:\machine_learning\venv\lib\site-packages\nltk\corpus\reader\wordnet.py", line 1818, in _morphy
    exceptions = self._exception_map[pos]
KeyError: 'pos'

The expected result is to print only the meaningful words, as follows:

  ['The', 'laugh', 'two', 'hear', 'trigger', 
   'memory', 'high', 'fly', 'exit', 'move', 'beast']

Thank you


I've solved it by referring to the following question: NLTK: lemmatizer and pos_tag

from nltk.tag import pos_tag
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

# Requires NLTK's tokenizer, tagger and WordNet data (download them once with nltk.download()).
def lemmatize_all(sentence):
    wnl = WordNetLemmatizer()
    # pos_tag returns Penn Treebank tags; map them to WordNet POS codes
    for word, tag in pos_tag(word_tokenize(sentence)):
        if tag.startswith("NN"):      # nouns
            yield wnl.lemmatize(word, pos='n')
        elif tag.startswith('VB'):    # verbs
            yield wnl.lemmatize(word, pos='v')
        elif tag.startswith('JJ'):    # adjectives
            yield wnl.lemmatize(word, pos='a')
        # Words with any other tag (determiners, pronouns, numbers, ...) are dropped.
        # else:
        #     yield word

print(' '.join(lemmatize_all('The laughs you two heard were triggered by memories of his own high j-flying exits for moving beasts')))

result --> laugh heard be trigger memory own high j-flying exit move beast
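The if/elif chain in lemmatize_all can also be written as a small tag-mapping helper. This is a sketch only: the name penn_to_wordnet is hypothetical, the RB -> 'r' (adverb) branch is an addition that the code above does not have, and the tagged words in the usage section are hand-written for illustration rather than real pos_tag output.

```python
# Hypothetical helper: map a Penn Treebank tag (as produced by nltk.pos_tag)
# to the single-letter POS codes that WordNetLemmatizer.lemmatize() accepts.
# The "RB" -> "r" (adverb) entry is an addition, not part of the original code.
PENN_PREFIX_TO_WORDNET = {"NN": "n", "VB": "v", "JJ": "a", "RB": "r"}


def penn_to_wordnet(tag):
    """Return 'n', 'v', 'a' or 'r' for a Penn Treebank tag, else None."""
    for prefix, wn_pos in PENN_PREFIX_TO_WORDNET.items():
        if tag.startswith(prefix):
            return wn_pos
    return None  # determiners, pronouns, numbers, etc. have no WordNet POS


if __name__ == "__main__":
    # Hand-written (word, tag) pairs for illustration only.
    tagged = [("The", "DT"), ("laughs", "NNS"), ("two", "CD"),
              ("heard", "VBD"), ("high", "JJ")]
    kept = [(word, penn_to_wordnet(tag)) for word, tag in tagged
            if penn_to_wordnet(tag) is not None]
    print(kept)
    # Inside lemmatize_all, the loop body could then become:
    #     wn_pos = penn_to_wordnet(tag)
    #     if wn_pos is not None:
    #         yield wnl.lemmatize(word, pos=wn_pos)
```

Keeping the mapping in one table makes it easy to add or drop word classes without touching the loop.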

thank you

susim
  • BTW, in which chapter of the book did you find that code snippet? Or is it from a book other than https://www.nltk.org/book/ch03.html? – alvas Sep 14 '18 at 00:49
  • Yes, it's from another book, 'Advanced Machine Learning with Python'. – susim Sep 14 '18 at 02:49

1 Answer


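The purpose of lemmatisation is to group together the different inflected forms of a word so they can be analysed as a single item, identified by the word's lemma (its dictionary form). For example, a lemmatiser should map gone, going and went to go. WordNetLemmatizer.lemmatize() works on one word at a time, so you have to lemmatize each word separately. Also, pos must be one of WordNet's part-of-speech codes ('n', 'v', 'a', 'r' or 's'); 'pos' is not one of them, which is why you get KeyError: 'pos'.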

from nltk.stem import WordNetLemmatizer

sent = 'The laughs you two heard were triggered by memories of his own high j-flying exits for moving beasts'
sent_tokenized = sent.split(" ")  # lemmatize() expects a single word, so split the sentence first
lemmatizer = WordNetLemmatizer()
# pos defaults to 'n' (noun); pass pos='v', 'a', 'r' or 's' for other word classes
words = [lemmatizer.lemmatize(word) for word in sent_tokenized]
print(words)
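To see why the part of speech matters, here is a toy lemmatizer in the spirit of WordNet's morphy, which looks a word up in per-POS exception lists and then tries per-POS suffix-stripping rules against a lexicon. It is not NLTK's implementation: the LEXICON, EXCEPTIONS and RULES tables and the name toy_lemmatize are hypothetical and tiny. Like the traceback in the question, an unknown pos such as 'pos' fails on the dictionary lookup with a KeyError.

```python
# Toy lemmatizer: per-POS exception lists plus per-POS suffix rules.
# All three tables are small hypothetical stand-ins for WordNet's data.
LEXICON = {
    "n": {"laugh", "memory", "exit", "beast"},
    "v": {"go", "hear", "trigger", "move", "laugh"},
}
EXCEPTIONS = {
    "n": {},
    "v": {"went": "go", "gone": "go", "heard": "hear"},
}
# (suffix, replacement) pairs tried in order for each part of speech
RULES = {
    "n": [("ies", "y"), ("s", "")],
    "v": [("ing", "e"), ("ing", ""), ("ed", ""), ("es", ""), ("s", "")],
}


def toy_lemmatize(word, pos="n"):
    """Return the lemma of word for the given POS code ('n' or 'v')."""
    exceptions = EXCEPTIONS[pos]  # an unknown pos such as 'pos' raises KeyError here
    if word in exceptions:
        return exceptions[word]
    if word in LEXICON[pos]:
        return word
    for suffix, replacement in RULES[pos]:
        if word.endswith(suffix):
            candidate = word[:-len(suffix)] + replacement
            if candidate in LEXICON[pos]:
                return candidate
    return word  # words the tables cannot resolve come back unchanged


if __name__ == "__main__":
    for form in ("gone", "going", "went"):
        print(form, "-> verb:", toy_lemmatize(form, pos="v"),
              "| noun (default):", toy_lemmatize(form))
```

With NLTK itself, the analogous call is WordNetLemmatizer().lemmatize('went', pos='v'), which should return 'go' once the WordNet data is installed, while the default noun lookup leaves 'went' unchanged.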
MNA
  • Thank you. As I mentioned, I'd like to extract only the meaningful words. Is there a way to do that? – susim Sep 14 '18 at 02:52
  • ['The', 'laugh', 'two', 'hear', 'trigger', 'memory', 'high', 'fly', 'exit', 'move', 'beast'] – susim Sep 14 '18 at 02:52
  • For that you have to remove stopwords. To remove stopwords, refer to this [link](https://stackoverflow.com/questions/29523254/python-remove-stop-words-from-pandas-dataframe). – MNA Sep 14 '18 at 06:07
  • thank you very much – susim Sep 14 '18 at 07:22
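A minimal sketch of the stopword removal suggested in the comments, applied to the whitespace-split sentence. STOPWORDS here is a tiny hand-picked set used only for illustration (all of these words also appear in NLTK's English list); in practice you would use set(nltk.corpus.stopwords.words('english')) after running nltk.download('stopwords'). The helper name remove_stopwords is hypothetical.

```python
# Hand-picked illustrative subset; a real pipeline would load
# set(nltk.corpus.stopwords.words('english')) instead.
STOPWORDS = {"the", "you", "were", "by", "of", "his", "own", "for"}


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Keep the tokens whose lowercased form is not in stopwords."""
    return [t for t in tokens if t.lower() not in stopwords]


sent = ("The laughs you two heard were triggered by memories "
        "of his own high j-flying exits for moving beasts")
content_words = remove_stopwords(sent.split())
print(content_words)
```

The remaining words can then be lemmatized, for example with lemmatize_all from the question. Because the comparison is lowercased, 'The' is removed here, unlike in the expected list in the question; comparing case-sensitively would keep it.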