-1

I have some arbitrary text in a String variable. I want to find all inflections of a word that the user specifies.

Example: If the user is looking for the word "assist" then it should grab all "assist, assists, assisted, assisting" occurrences in the String.

Is there a Java library available to detect such inflections automatically in the specified String?

Note: I have seen a Java library called WolframAlpha that claims to do this, and here is its web interface, but I can't get the library to work and there is no guide available for using it.

TylerH
Brad

3 Answers

1

First of all, it is not a Java library; it is the Wolfram Language, the language used by Mathematica. It does have J/Link and can be called from Java, but you must have a Wolfram kernel running to execute the code.

This is called Natural Language Processing, and it's a huge, complex field. I have fiddled with a few problems in it, and all I can say is that it is harder than it looks if you want a reliable solution.

Something you might want to take a look at is Stanford NLP.

Margus
  • @Brad If you just want the plural form of a word, a decent algorithm is described here: http://www.csse.monash.edu.au/~damian/papers/HTML/Plurals.html (with a Perl implementation). – Margus Sep 03 '14 at 12:02
  • Actually, I want all possible inflections as described in my question, not only the plural form. – Brad Sep 03 '14 at 12:33
0

It is called word stemming. First you need to derive the stem (for a specific language):

assisting -> assist, by stripping -ance, -ing, -ly, -s, -ed, etcetera.
sought -> seek, using an exception list

Then do a search, maybe with a regular expression (Matcher.find). Patterns:

"\\bassist\\p{L}*"
"\\b(seek|sought)\\p{L}*"

For prefixes such as un-, dis-, and inter- the case is more complicated, but in general inflections are word endings in English. Then there is synonym searching.
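As a rough illustration of the suffix-plus-exception-list idea, here is a minimal sketch in plain Java. The class name `NaiveStemmer`, the suffix list, and the three-letter minimum are assumptions made up for this example; it is not a real stemming algorithm such as Porter's, and it will mis-stem many words.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class NaiveStemmer {
    // Irregular forms that suffix stripping cannot handle (illustrative, far from complete).
    private static final Map<String, String> EXCEPTIONS =
            Collections.singletonMap("sought", "seek");

    // Suffixes are tried in this order, so "-ing" is checked before "-s".
    private static final List<String> SUFFIXES = Arrays.asList("ance", "ing", "ed", "ly", "s");

    public static String stem(String word) {
        String w = word.toLowerCase();
        String irregular = EXCEPTIONS.get(w);
        if (irregular != null) {
            return irregular;
        }
        for (String suffix : SUFFIXES) {
            // Keep at least three letters so short words such as "is" or "red" survive.
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    public static void main(String[] args) {
        for (String w : Arrays.asList("assist", "assists", "assisted", "assisting", "sought")) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

The exception map is checked first because irregular forms such as "sought" share no suffix pattern with their stem.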

Word lists and text collections for this purpose are often called corpora. A search for "free English corpus" will yield results.

\\b = word boundary; \\p{L}* = zero or more (*) letters
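The search step can be sketched with `java.util.regex` as follows. The class and method names (`InflectionFinder`, `findInflections`) are hypothetical, and the sketch assumes the stem has already been derived; `Pattern.quote` is used so that a user-supplied stem is treated as literal text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InflectionFinder {
    // Returns every word in the text that starts with the given stem, ignoring case.
    public static List<String> findInflections(String text, String stem) {
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(stem) + "\\p{L}*",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher matcher = pattern.matcher(text);
        List<String> hits = new ArrayList<>();
        while (matcher.find()) {
            hits.add(matcher.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "She assists him; he assisted her. Assisting is fun, unassisted less so.";
        // Prints [assists, assisted, Assisting]; "unassisted" is skipped because of \b.
        System.out.println(findInflections(text, "assist"));
    }
}
```

Note that this pattern also matches derived words such as "assistant" or "assistance", which are not inflections, so the hits may still need filtering.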

Joop Eggen
  • Thanks for the term. That leads to some nice new results on Google. I do not want to reinvent the wheel. Surely there is a Java library that already does that! – Brad Sep 03 '14 at 12:51
  • Okay, Lucene is a search engine: http://stackoverflow.com/questions/5391840/stemming-english-words-with-lucene – Joop Eggen Sep 03 '14 at 13:13
-1

Check this out.

I don't know how big your requirement is, but you can always use Wiktionary and parse its data.

Check this question; it may be of help.

Community
Matt
  • Thanks Matt. I have already checked all those posts. The evo-inflector only gets the plural form of a word, not all of its inflections. I do not know why you shared Wiktionary; I need a Java library that I can use inside my program. The question you shared also talks about plurals, and I have already checked the WolframAlpha library mentioned in it, but I do not see how to use it. – Brad Sep 03 '14 at 11:52