Extract main word from its descendants with python

Question

I would like to know if there is a way to extract the main word (the base form) from the words derived from it, for example:

recruitment -> recruit
recruiter -> recruit
recruited -> recruit

I got the last one using the WordNet lemmatizer, like this:

from nltk.stem.wordnet import WordNetLemmatizer

lmtzr = WordNetLemmatizer()
# The second argument is the part of speech: 'v' = verb
lmtzr.lemmatize('recruited', 'v')  # returns 'recruit'

I can't seem to find a solution for the others. Is there a library for that, or should I write my own function?

See http://stackoverflow.com/questions/17317418/stemmers-vs-lemmatizers — alvas, Mar 16 '16 at 17:23
There are some good morphological analyzers available online like Morfessor http://www.cis.hut.fi/projects/morpho/index.shtml. — Riyaz, Mar 17 '16 at 06:30
Interesting tool, but not really what I am looking for. I think I'll develop something on my own. — CodeBird, Mar 17 '16 at 14:35
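If you do end up writing your own function, one possible approach is to strip a suffix and accept the result only if it is a real word. The sketch below is a hypothetical illustration, not an existing library: the `RULES` table and the tiny `VOCAB` set are assumptions made up for this example, and a usable version would need a full word list and many more rules.

```python
# Hypothetical (suffix, replacement) rules, tried in order.
# Illustrative only; this is not a complete model of English morphology.
RULES = [
    ("sion", "de"),  # conclusion -> conclude
    ("sion", "se"),  # revision -> revise
    ("ment", ""),    # recruitment -> recruit
    ("ion", "e"),    # creation -> create
    ("ion", ""),     # election -> elect
    ("er", ""),      # recruiter -> recruit
    ("ed", ""),      # recruited -> recruit
    ("ed", "e"),     # revised -> revise
    ("s", ""),       # recruits -> recruit
]

# Hypothetical vocabulary of accepted base words; in practice
# you would load a real English word list instead.
VOCAB = {"recruit", "conclude", "revise", "create", "elect"}


def base_word(word, vocab=VOCAB, rules=RULES):
    """Return a base form of `word` that is in `vocab`, or None."""
    word = word.lower()
    if word in vocab:
        return word
    for suffix, replacement in rules:
        # Only strip when a reasonably long stem is left over.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            candidate = word[: -len(suffix)] + replacement
            # Accept the candidate only if it is a known word.
            if candidate in vocab:
                return candidate
    return None


for w in ["recruitment", "recruiter", "recruited", "conclusion", "revision"]:
    print(w, "->", base_word(w))
```

Checking each candidate against a vocabulary is what separates this from plain stemming: a rule that produces a non-word (such as `revide` for `revision`) is simply skipped and the next rule is tried.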

Till · Answer 1 · score 2 · 2016-03-16T17:22:04.737

I think you are talking about stemming:

http://www.nltk.org/api/nltk.stem.html

A processing interface for removing morphological affixes from words. This process is known as stemming.

from nltk.stem.lancaster import LancasterStemmer

# Rule-based suffix stripping; the results are stems, not necessarily real words
st = LancasterStemmer()
st.stem('recruitment')
st.stem('recruiter')
st.stem('recruited')
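To make the difference between a stem and a real word concrete, here is a deliberately crude suffix stripper. It is a hypothetical illustration, not how Lancaster or Snowball are implemented: it cuts off the first matching suffix from an assumed `SUFFIXES` list and never checks whether what is left is an English word.

```python
# Hypothetical suffix list for illustration; real stemmers such as
# Lancaster or Snowball use much larger, carefully ordered rule sets.
SUFFIXES = ("ment", "ion", "er", "ed", "ing", "s")


def crude_stem(word, suffixes=SUFFIXES):
    """Strip the first matching suffix without checking the result is a word."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


for w in ["recruitment", "recruiter", "recruited", "conclusion"]:
    print(w, "->", crude_stem(w))
```

Even this toy version turns `conclusion` into `conclus`, which shows why a stemmer's output is a stem rather than necessarily a dictionary word.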

answered Mar 16 '16 at 17:19 by Till, edited Mar 16 '16 at 17:22

No, I know about stemming, but I actually need a real English word. For example, stemming `conclusion` with Snowball returns `conclus`, which is not what I want. – CodeBird Mar 16 '16 at 17:20
Ok, I understand. I don't know if there is something like this, but you could check this link: https://github.com/vinta/awesome-python#text-processing – Till Mar 16 '16 at 17:23
Thanks for trying to help. Lancaster was a good start: `conclusion` became `conclud`, which I then passed to `en.suggest('conclud')` to get `conclude`. It didn't work for `revision`, though, which became `revid`. – CodeBird Mar 16 '16 at 17:51
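The second step of that pipeline, mapping a stem back to a real word, can also be approximated with the standard library's `difflib.get_close_matches` instead of `en.suggest`. This is a swapped-in technique, and the small `VOCAB` list below is an assumption; with a full dictionary the closest match for a stem such as `revid` may well be a different word.

```python
import difflib

# Hypothetical vocabulary; a real setup would use a full English word list.
VOCAB = ["conclude", "conclusion", "revise", "revision", "recruit", "recruitment"]


def suggest(stem, vocab=VOCAB, cutoff=0.6):
    """Return the vocabulary word most similar to `stem`, or None."""
    # get_close_matches ranks candidates by SequenceMatcher similarity
    # and drops anything scoring below `cutoff`.
    matches = difflib.get_close_matches(stem, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Stems taken from the Lancaster results reported in the comment above.
for stem in ["conclud", "revid"]:
    print(stem, "->", suggest(stem))
```

Because the match is purely by string similarity, its quality depends entirely on the vocabulary: a small, domain-specific word list makes it far less likely that an unrelated word wins.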

score 2 · Answer 2 · answered Mar 16 '16 at 17:23


Try the LancasterStemmer from NLTK:

import nltk

lancaster = nltk.LancasterStemmer()

# Python 3: print is a function
print(lancaster.stem("recruitment"))
print(lancaster.stem("recruiter"))
print(lancaster.stem("recruited"))

answered Mar 16 '16 at 17:23 by Yunhe
