Define a function called performStemAndLemma, which takes one parameter. The parameter, textcontent, is a string. The function definition code stub is given in the editor. Perform the following tasks:
1. Tokenize all the words given in textcontent. A word should contain only letters, digits, or underscores. Store the tokenized list of words in tokenizedwords. (Hint: Use regexp_tokenize)
2. Convert all the words to lowercase. Store the result in the variable tokenizedwords.
3. Remove all the stop words from the unique set of tokenizedwords. Store the result in the variable filteredwords. (Hint: Use the stopwords corpus)
4. Stem each word in filteredwords with PorterStemmer, and store the result in the list porterstemmedwords.
5. Stem each word in filteredwords with LancasterStemmer, and store the result in the list lancasterstemmedwords.
6. Lemmatize each word in filteredwords with WordNetLemmatizer, and store the result in the list lemmatizedwords.
Return the porterstemmedwords, lancasterstemmedwords, and lemmatizedwords variables from the function.
My code:
import nltk
from nltk.corpus import stopwords

def performStemAndLemma(textcontent):
    # Write your code here
    # Step 1: tokenize (\w* can also match empty strings, filtered out in step 2)
    tokenizedword = nltk.tokenize.regexp_tokenize(textcontent, pattern=r'\w*', gaps=False)
    # Step 2: lowercase and drop empty tokens
    tokenizedwords = [x.lower() for x in tokenizedword if x != '']
    # Step 3: remove stop words from the unique set of tokens
    unique_tokenizedwords = set(tokenizedwords)
    stop_words = set(stopwords.words('english'))
    filteredwords = []
    for x in unique_tokenizedwords:
        if x not in stop_words:
            filteredwords.append(x)
    # Steps 4, 5, 6: stem and lemmatize each filtered word
    ps = nltk.stem.PorterStemmer()
    ls = nltk.stem.LancasterStemmer()
    wnl = nltk.stem.WordNetLemmatizer()
    porterstemmedwords = []
    lancasterstemmedwords = []
    lemmatizedwords = []
    for x in filteredwords:
        porterstemmedwords.append(ps.stem(x))
        lancasterstemmedwords.append(ls.stem(x))
        lemmatizedwords.append(wnl.lemmatize(x))
    return porterstemmedwords, lancasterstemmedwords, lemmatizedwords
The program still does not work correctly: it fails 2 test cases. Please point out the mistake in the code above and suggest an alternative solution.
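Two details in the code above may be worth checking, although the hidden test cases are not known, so this is only a guess. First, the pattern \w* can match empty strings, whereas \w+ matches only non-empty runs of letters, digits, and underscores. Second, looping over a set gives no guaranteed order, so the order of the returned lists can differ from what a test expects. The sketch below uses only the standard library re module to illustrate both points, on the assumption that regexp_tokenize with gaps=False returns the same matches as re.findall. The helper name unique_filtered_tokens, the sample sentence, and the small demo_stop_words set are hypothetical stand-ins (the real code would use stopwords.words('english')), and keeping first-seen order is just one deterministic choice, not necessarily the one the tests want.

```python
import re


def unique_filtered_tokens(textcontent, stop_words):
    """Hypothetical helper: tokenize, lowercase, de-duplicate, drop stop words."""
    # \w+ matches non-empty runs of letters, digits, or underscores,
    # so no empty tokens need to be filtered out afterwards.
    tokens = re.findall(r'\w+', textcontent)
    lowered = [token.lower() for token in tokens]
    # dict.fromkeys removes duplicates while keeping first-seen order,
    # unlike set(), whose iteration order is not guaranteed.
    unique = list(dict.fromkeys(lowered))
    return [token for token in unique if token not in stop_words]


# Assumed sample data for illustration only.
sample = "The cats are running, and the Cats ran_2 fast"
demo_stop_words = {"the", "are", "and"}

# \w* also yields empty matches between and after the words.
print(re.findall(r'\w*', "a b"))
print(unique_filtered_tokens(sample, demo_stop_words))
```

If this turns out to be the cause, the list returned by such a helper could replace the set-based loop in step 3, and the stemming and lemmatizing loop in steps 4 to 6 could stay as it is.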