5

I need to build a PHP dictionary that finds the root word of a given word. For example, a search for "cars" should report "cars is the plural of car", and a search for "took" should report "took is the past tense of take".

I am considering using WordNet, but it seems complicated.

Any suggestions? I'm desperate.

Regards,

Vladislav Rastrusny
jack 101
  • This is very broad. What aspect of building the dictionary is your question about? Using a 3rd party service might be a good option, as this is likely to become *very* complicated until you have a working solution. – Pekka Mar 27 '11 at 17:02
  • Yes, I am also considering a 3rd party service like Google Translate or Yahoo Translate. But that would be slow, because every lookup requires a round trip to Google, and there are request limits, such as 5000 requests per day. I am looking at PSpell and Enchant, hoping they can help me. – jack 101 Mar 27 '11 at 18:00

3 Answers

5

Well, since the suggested stemmer does not work correctly for you, you can choose one that suits you better from here:

http://snowball.tartarus.org/

Here is another interesting library: http://sourceforge.net/projects/nlp/

Here are also links to similar questions on Stack Overflow:

NLP programming tools using PHP?

Text mining with PHP

UPDATE: How do I do word Stemming or Lemmatization?

http://www.reddit.com/r/programming/comments/8e5d3/how_do_i_programatically_do_stemming_eg_eating_to/

http://www.nltk.org/

Wordnet lemmatizer: http://wordnet.princeton.edu/wordnet/download/

Vladislav Rastrusny
  • Thanks FractalizeR! I think what I need is a lemmatizer; a stemmer can't help me. Irregular cases need a dictionary to work. For example, flies -> fly, taken -> take, mice -> mouse. Only a dictionary lookup can resolve these correctly. – jack 101 Mar 29 '11 at 04:13
  • Ok, I added some more links for you. – Vladislav Rastrusny Mar 29 '11 at 06:45
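
To illustrate the distinction raised in the comment above, here is a minimal sketch of a dictionary-backed lemmatizer. It assumes an approach similar in spirit to WordNet's: check a list of irregular forms first, then strip common suffixes and accept a candidate only if it is a known word. The function name `lemmatize` and the three small arrays are hypothetical sample data, not part of any library mentioned here; a real implementation would load them from a dictionary.

```php
<?php
// Minimal sketch, not a complete lemmatizer. All data below is
// hypothetical sample data standing in for a real dictionary.
function lemmatize(string $word): string
{
    // Irregular forms that no suffix rule can handle.
    $exceptions = ['mice' => 'mouse', 'taken' => 'take', 'took' => 'take'];
    // Base forms the dictionary knows about.
    $knownWords = ['car', 'fly', 'take', 'mouse', 'walk'];
    // Suffix => replacement, tried in order (longest plural rule first).
    $suffixRules = ['ies' => 'y', 'es' => '', 's' => '', 'ed' => '', 'ing' => ''];

    $word = strtolower($word);

    // 1. Irregular forms are resolved by direct lookup.
    if (array_key_exists($word, $exceptions)) {
        return $exceptions[$word];
    }

    // 2. A word that is already a base form is returned unchanged.
    if (in_array($word, $knownWords, true)) {
        return $word;
    }

    // 3. Try each suffix rule; keep a candidate only if the dictionary knows it.
    foreach ($suffixRules as $suffix => $replacement) {
        $len = strlen($suffix);
        if (strlen($word) > $len && substr($word, -$len) === $suffix) {
            $candidate = substr($word, 0, -$len) . $replacement;
            if (in_array($candidate, $knownWords, true)) {
                return $candidate;
            }
        }
    }

    // 4. Nothing matched: return the input unchanged.
    return $word;
}
```

Because every candidate is validated against the word list, "flies" resolves to "fly" rather than to a truncated stem, and words the dictionary does not know are left untouched.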
1

Well, here is an extension that does word stemming (I believe this is roughly what you want): http://pecl.php.net/package/stem

It doesn't do any grammatical analysis of the word, however.

Here is a PHP-only version: http://www.chuggnutt.com/stemmer.php

Vladislav Rastrusny
  • Thanks FractalizeR! That's what I want! After looking at the stemmer, I think a database of all the words and their relationships (for example, took->take, taken->take, mice->mouse, cars->car, ...) would be faster for me, because it is just a database lookup. Any advice? – jack 101 Mar 28 '11 at 03:54
  • Given the word "flies", the Porter stemmer returns "fli", and given "taken", it returns "taken"... It seems to work correctly for regular cases, but not for irregular ones. – jack 101 Mar 28 '11 at 04:19
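
The lookup idea from the comments above could be sketched as follows. This is only an illustration under stated assumptions: the function name `describeWord` and the `$forms` table are hypothetical, and the table is held in a PHP array where a real system would more likely use a database table keyed on the inflected form.

```php
<?php
// Minimal sketch of a word-form lookup. $forms maps an inflected form
// to [root word, relation]; the entries are hypothetical sample data.
function describeWord(string $word, array $forms): string
{
    $key = strtolower($word);

    if (!array_key_exists($key, $forms)) {
        return sprintf('No entry found for "%s"', $key);
    }

    $root = $forms[$key][0];
    $relation = $forms[$key][1];

    return sprintf('"%s" is the %s of "%s"', $key, $relation, $root);
}

$forms = [
    'cars'  => ['car', 'plural'],
    'took'  => ['take', 'past tense'],
    'taken' => ['take', 'past participle'],
    'mice'  => ['mouse', 'plural'],
];

echo describeWord('took', $forms), PHP_EOL;
```

Storing the relation next to the root lets the lookup produce the kind of explanation the question asks for, at the cost of having to populate the table with every inflected form.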
0

You can try the free Lemmatizer API here: http://twinword.com/lemmatizer.php

Scroll down to find the Lemmatizer endpoint.

This lets you reduce "dogs" to "dog" and "abilities" to "ability".

If you pass in a POST or GET parameter called "text" with a string like "walked plants":

// These code snippets use an open-source library. http://unirest.io/php
$response = Unirest\Request::post("[ENDPOINT URL]",
  array(
    "X-Mashape-Key" => "[API KEY]",
    "Content-Type" => "application/x-www-form-urlencoded",
    "Accept" => "application/json"
  ),
  array(
    "text" => "walked plants"
  )
);

You get a response like this:

{
  "lemma": {
    "plant": 1,
    "walk": 1
  },
  "result_code": "200",
  "result_msg": "Success"
}
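
As a rough sketch of how such a response might be consumed, the snippet below decodes a JSON string of the shape shown above using PHP's built-in `json_decode`. The helper name `extractLemmas` is hypothetical, and treating the numbers as occurrence counts is an assumption based only on the sample response.

```php
<?php
// Minimal sketch: pull the "lemma" map out of a response shaped like
// the sample above. Returns an empty array on any unexpected input.
function extractLemmas(string $json): array
{
    $data = json_decode($json, true); // true => decode objects as associative arrays

    // Reject invalid JSON or a non-success result code.
    if (!is_array($data) || ($data['result_code'] ?? null) !== '200') {
        return [];
    }

    $lemmas = $data['lemma'] ?? [];

    return is_array($lemmas) ? $lemmas : [];
}

$json = '{"lemma": {"plant": 1, "walk": 1}, "result_code": "200", "result_msg": "Success"}';

foreach (extractLemmas($json) as $lemma => $count) {
    echo $lemma, ': ', $count, PHP_EOL;
}
```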
Joseph Shih