
I am trying to apply stemming and lemmatization from the nltk package to a Pandas dataframe. I wrote the following function, but somewhere it is not performing the stemming and lemmatization. Please let me know what changes are required.

from nltk.tokenize import RegexpTokenizer
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.stem import LancasterStemmer

stemmer = LancasterStemmer()
lemmer = WordNetLemmatizer()

alpha_tokenizer = RegexpTokenizer(r'[A-Za-z]\w+')

def process_sentence(words):
    words = words.lower()
    tokens = alpha_tokenizer.tokenize(words)
    for index, word in enumerate(tokens):
        tokens[index] = stemmer.stem(word)
        tokens[index] = lemmer.lemmatize(word, 'v')
        tokens[index] = lemmer.lemmatize(word, 'n')

    return tokens

print([process_sentence(item) for item in ['abaci', 'happy dogs']])

# [['abacus'], ['happy', 'dog']]
