
I am trying to apply stemming and lemmatization from the nltk package to a Pandas dataframe. I wrote the following function, but somewhere it is not performing the stemming and lemmatization. Please let me know what changes are required.

from nltk.tokenize import RegexpTokenizer
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.stem import LancasterStemmer

stemmer = LancasterStemmer()
lemmer = WordNetLemmatizer()

alpha_tokenizer = RegexpTokenizer(r'[A-Za-z]\w+')

def process_sentence(words):
    words = words.lower()
    tokens = alpha_tokenizer.tokenize(words)
    for index, word in enumerate(tokens):
        tokens[index] = stemmer.stem(word)
        tokens[index] = lemmer.lemmatize(word, 'v')
        tokens[index] = lemmer.lemmatize(word, 'n')

    return tokens

print([process_sentence(item) for item in ['abaci', 'happy dogs']])

# [['abacus'], ['happy', 'dog']]
