I would like to know whether there is a way to extract the root word from its derived forms.

For example:

recruitment -> recruit
recruiter -> recruit
recruited -> recruit

I got the last one using the WordNet lemmatizer, like this:

from nltk.stem.wordnet import WordNetLemmatizer
lmtzr = WordNetLemmatizer()
lmtzr.lemmatize('recruited', 'v')

I can't seem to find a solution for the other two. Is there a library for that, or should I write my own function?

CodeBird

2 Answers

I think you are talking about stemming:

http://www.nltk.org/api/nltk.stem.html

A processing interface for removing morphological affixes from words. This process is known as stemming.

from nltk.stem.lancaster import LancasterStemmer
st = LancasterStemmer()
st.stem('recruitment')
st.stem('recruiter')
st.stem('recruited')
Till
  • No, I know about stemming, but I actually need a real English word. For example, stemming `conclusion` with Snowball returns `conclus`, which is not what I want. – CodeBird Mar 16 '16 at 17:20
  • OK, I understand. I don't know whether something like this exists, but you could check this link: https://github.com/vinta/awesome-python#text-processing – Till Mar 16 '16 at 17:23
  • Thanks for trying to help. Lancaster was a good start: `conclusion` became `conclud`, which I then passed to `en.suggest('conclud')`, which gave `conclude`. But it didn't work for `revision`, which became `revid`. – CodeBird Mar 16 '16 at 17:51
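
The "stem, then snap to the nearest real word" idea from the last comment can be sketched with the standard library alone. This is only a rough illustration, not a recommended solution: `crude_stem`, `root_word`, `SUFFIXES` and `VOCABULARY` are hypothetical names made up for this example, the suffix stripper is a toy stand-in for a real stemmer such as Lancaster, and the tiny word list stands in for a full English dictionary.

```python
import difflib

# Hypothetical mini-vocabulary; a real run would load a full English word list.
VOCABULARY = ["recruit", "conclude", "revise", "decide", "record"]

# Suffixes handled by this toy stemmer (an assumption, not Lancaster's rule set).
SUFFIXES = ("ment", "sion", "tion", "er", "ed", "ing")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def root_word(word, vocabulary=VOCABULARY, cutoff=0.6):
    """Return the vocabulary word closest to the stem, or None if none is close."""
    stem = crude_stem(word)
    if stem in vocabulary:
        return stem
    # get_close_matches ranks candidates by difflib's similarity ratio.
    matches = difflib.get_close_matches(stem, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for w in ("recruitment", "recruiter", "recruited", "conclusion", "revision"):
        print(w, "->", root_word(w))
```

With a real dictionary the closest match is often ambiguous (several real words can be equally close to a truncated stem), so the result would still need checking, which is the same weakness the `revid` case shows.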

Try LancasterStemmer from nltk

import nltk
lancaster = nltk.LancasterStemmer()

# print() with parentheses works on both Python 2 and Python 3
print(lancaster.stem("recruitment"))
print(lancaster.stem("recruiter"))
print(lancaster.stem("recruited"))
Yunhe