I am looking to analyse a large number (around 30,000) of small documents and determine whether they mention a certain subject, such as the term "safety". It's easy enough to do a string.find() or tokenize the raw text and compare lists, but I would like the search terms to be dynamic: if the user types in "safety", my program should identify all forms of the word, so the search would also match related words like "safe", "safely", "safer", and "safest" in the raw text. My hope is that the user can put in any term and have a reasonable expectation that it will find related terms in the source documents.
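For context, this is roughly the kind of naive matching I have now (the contains_term name is just for illustration):

```python
# Roughly what I'm doing now: tokenize each document and check for an
# exact match on the search term. Requires NLTK's "punkt" models
# (nltk.download("punkt")).
import nltk

def contains_term(raw_text, term):
    tokens = {w.lower() for w in nltk.word_tokenize(raw_text)}
    return term.lower() in tokens
```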
I have looked at stemming and lemmatizing, but stemming produces some odd results (e.g. "safety" stems to "safeti" while "safely" stems to "safe"), and lemmatizing more often than not just returns the search term unchanged. I've tried the two suggestions shown here with the same results:
How to list all the forms of a word using NLTK in python
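Here is a quick repro of what I'm seeing, using NLTK's Porter stemmer and WordNet lemmatizer:

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()  # needs nltk.download("wordnet")

print(stemmer.stem("safety"))          # "safeti"
print(stemmer.stem("safely"))          # "safe"
print(lemmatizer.lemmatize("safety"))  # "safety" -- unchanged
```

Because "safety" and "safely" reduce to different stems, stemming both the query and the document tokens still doesn't connect them.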
Any help would be appreciated. If all else fails, I'll just build a list of terms at runtime from the user's input.
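If I end up going that route, I imagine something like the following, using WordNet's derivationally related forms to expand the user's term into a set of search words (the expand_term helper is just my sketch, and it still misses inflections like "safer"/"safest", which I'd have to handle separately):

```python
# Expand a user-supplied term into a set of related search words via
# WordNet's derivationally related forms. Requires nltk.download("wordnet").
from nltk.corpus import wordnet as wn

def expand_term(term):
    forms = {term.lower()}
    for synset in wn.synsets(term):
        for lemma in synset.lemmas():
            forms.add(lemma.name().lower())
            for related in lemma.derivationally_related_forms():
                forms.add(related.name().lower())
    return forms

print(expand_term("safety"))  # includes "safe" among the results
```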