Lemmatizing whole sentence in python does not work

Question

I am using WordNetLemmatizer() from the NLTK package in Python to lemmatize every sentence in a movie review dataset.

Here is my code:

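import re

from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

lemmer = WordNetLemmatizer()
# Assumption: stop_words was not shown in the original post; NLTK's English list is used here
stop_words = set(stopwords.words('english'))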

def preprocess(x):

    #Lemmatization
    x = ' '.join([lemmer.lemmatize(w) for w in x.rstrip().split()])

    # Lower case
    x = x.lower()

    # Remove punctuation
    x = re.sub(r'[^\w\s]', '', x)

    # Remove stop words
    x = ' '.join([w for w in x.split() if w not in stop_words])    
    ## EDIT CODE HERE ## 

    return x

df['review_clean'] = df['review'].apply(preprocess)

review in df is the column of text reviews that I want to process.

After applying the preprocess function to df, the new column review_clean contains cleaned text, but it is still not lemmatized. E.g. I can see a lot of words ending in -ed and -ing.

Thanks in advance.

Try this: https://stackoverflow.com/a/49356358/610569 – alvas Feb 26 '19 at 05:29

score 1 · Accepted Answer · answered Feb 23 '19 at 20:38


You have to pass 'v' (verb) as the second argument to lemmatize; without it, each word is treated as a noun:

x = ' '.join([lemmer.lemmatize(w, 'v') for w in x.rstrip().split()])

Example:

In [11]: words = ["answered", "answering"]

In [12]: [lemmer.lemmatize(w) for w in words]
Out[12]: ['answered', 'answering']

In [13]: [lemmer.lemmatize(w, 'v') for w in words]
Out[13]: ['answer', 'answer']
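
Passing 'v' everywhere treats every word as a verb. A minimal sketch of a part-of-speech-aware variant follows, assuming NLTK's tokenizer, tagger and WordNet data are already downloaded. The helper names penn_to_wordnet, lemmatize_tagged and lemmatize_sentence are hypothetical and not part of NLTK; pos_tag, word_tokenize and WordNetLemmatizer are the real NLTK pieces it relies on.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag (as produced by nltk.pos_tag) to a WordNet POS letter."""
    if tag.startswith('J'):
        return 'a'  # adjective
    if tag.startswith('V'):
        return 'v'  # verb
    if tag.startswith('RB'):
        return 'r'  # adverb
    return 'n'      # everything else is treated as a noun, the lemmatizer's default


def lemmatize_tagged(tagged_tokens, lemmatize):
    """Lemmatize (word, penn_tag) pairs using a lemmatize(word, pos) callable."""
    return [lemmatize(word, penn_to_wordnet(tag)) for word, tag in tagged_tokens]


def lemmatize_sentence(sentence):
    """Tokenize, tag and lemmatize one sentence (requires NLTK and its data files)."""
    # Imported lazily so the pure helpers above work without NLTK installed
    from nltk import pos_tag, word_tokenize
    from nltk.stem import WordNetLemmatizer

    lemmer = WordNetLemmatizer()
    tagged = pos_tag(word_tokenize(sentence))
    return ' '.join(lemmatize_tagged(tagged, lemmer.lemmatize))
```

With this in place, something like df['review'].apply(lemmatize_sentence) could stand in for the list comprehension in preprocess. Tagging works on whole sentences, so it would probably need to run before punctuation and stop words are removed.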

Answered by Andy Hayden

Thank you Andy, it worked on my end. I have an additional question: we just lemmatized the verbs by passing 'v' to the function. Is it possible to lemmatize all words in one function? For example, I still see nouns in plural form (e.g. 'methods', 'days') in the text after running the lemmatization. – MMAASS Feb 23 '19 at 21:14
@MMAASS hmm, "answers" is de-pluralized, so something is a little strange there. This may be a good new question to ask - specifically about this lemmatize function. – Andy Hayden Feb 23 '19 at 21:35
I simply defined another lemmatization argument in the function and passed 'n' instead of 'v' to it. It worked out. – MMAASS Feb 23 '19 at 22:09
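
The two-pass approach from the last comment might look roughly like the sketch below. The function names are hypothetical, and the lemmatize argument is assumed to be a callable with the same (word, pos) signature as lemmer.lemmatize.

```python
def lemmatize_noun_then_verb(word, lemmatize):
    """Run a noun pass, then a verb pass, with a lemmatize(word, pos) callable."""
    return lemmatize(lemmatize(word, 'n'), 'v')


def lemmatize_text(text, lemmatize):
    """Apply both passes to every whitespace-separated word in text."""
    return ' '.join(lemmatize_noun_then_verb(w, lemmatize) for w in text.split())
```

Inside preprocess this could be called as x = lemmatize_text(x, lemmer.lemmatize). Because no part-of-speech information is used, every word goes through both passes, so a noun that also looks like a verb form (such as "meeting") may be reduced further than intended.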
