9

What is the most efficient way to get the past tense of a verb, preferably without using memory-heavy NLP frameworks?

e.g.

  • live to: lived
  • try to: tried
  • tap to: tapped
  • boil to: boiled
  • sell to: sold

I wrote something quick myself (Stack Overflow won't let me self-answer) which seems to work for regular verbs (e.g. the first 4 of that list), but not for irregular verbs: http://pastebin.com/Txh76Dnb

edit: Thanks for all the responses, it looks like it can't be done properly without a dictionary due to irregular verbs.
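
The regular spelling rules behind the first four examples can be sketched in a few lines of Java. This is a hypothetical illustration, not the pastebin code linked above: the names `RegularPastTense` and `regularPast` are made up for the example, and the consonant-doubling check is a rough heuristic that only fires for words with a single vowel letter.

```java
import java.util.Locale;

class RegularPastTense {

    private static boolean isVowel(char c) {
        return "aeiou".indexOf(c) >= 0;
    }

    private static int countVowels(String s) {
        int n = 0;
        for (char c : s.toCharArray()) {
            if (isVowel(c)) {
                n++;
            }
        }
        return n;
    }

    /** Applies regular spelling rules only; irregular verbs come out wrong. */
    static String regularPast(String verb) {
        String v = verb.toLowerCase(Locale.ROOT);
        int len = v.length();
        if (len < 2) {
            return v + "ed";
        }
        char last = v.charAt(len - 1);
        char prev = v.charAt(len - 2);

        if (last == 'e') {
            return v + "d";                          // live -> lived
        }
        if (last == 'y' && !isVowel(prev)) {
            return v.substring(0, len - 1) + "ied";  // try -> tried
        }
        // Heuristic (assumption): double the final consonant only when the word
        // ends consonant-vowel-consonant and contains a single vowel letter.
        boolean endsCvc = len >= 3
                && !isVowel(v.charAt(len - 3))
                && isVowel(prev)
                && !isVowel(last)
                && "wxy".indexOf(last) < 0;
        if (endsCvc && countVowels(v) == 1) {
            return v + last + "ed";                  // tap -> tapped
        }
        return v + "ed";                             // boil -> boiled
    }

    public static void main(String[] args) {
        for (String verb : new String[] {"live", "try", "tap", "boil", "sell"}) {
            System.out.println(verb + " -> " + regularPast(verb));
        }
    }
}
```

This gives lived, tried, tapped and boiled, but "selled" for sell. Doubling that depends on stress (admit, admitted) is also out of reach for rules that only look at spelling.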

Richard EB

3 Answers

11

While I wanted to do this algorithmically without using dictionaries, I had to resort to using one.

I found that the most efficient library was SimpleNLG.

Since their docs are out of sync with the current API, here is how to achieve this:

XMLLexicon lexicon = new XMLLexicon("path\\to\\default-lexicon.xml"); // path to SimpleNLG's default lexicon file
WordElement word = lexicon.getWord("live", LexicalCategory.VERB); // look up the base form as a verb
InflectedWordElement infl = new InflectedWordElement(word); // wrapper that carries inflection features
infl.setFeature(Feature.TENSE, Tense.PAST); // request the past tense
Realiser realiser = new Realiser(lexicon);
String past = realiser.realise(infl).getRealisation(); // the inflected string
System.out.println(past);
Richard EB
  • How can I do the reverse of this? I.e., I'd like to get "play" from the keyword "played". – talha06 Apr 10 '13 at 12:42
  • I don't have the SimpleNLG library on this PC anymore, so I haven't tested it, but I believe it should be as simple as changing the 4th line so that Tense.PAST is Tense.FUTURE and changing the 2nd line so that "live" is "played". – Richard EB Apr 16 '13 at 20:16
  • No, it's not. I tried what you said, but it didn't work: it outputs "played" for the input "played". – talha06 Apr 16 '13 at 20:43
  • 2
    In situations like this you'd use `getWordFromVariant` which doesn't rely on a word being in its base form. However, it's worth adding to this that the default Lexicon doesn't know a huge array of words and so won't often work (and indeed doesn't in this case). You'd probably want a larger lexicon: https://code.google.com/p/simplenlg/wiki/AppendixC. – Thom May 12 '14 at 15:11
  • @talha06 Converting "played" to "play" is called "lemmatization". One PHP library I have used for this is phpmorphy. Hope this helps. – Paul Preibisch Feb 18 '15 at 06:14
  • Thanks for your suggestion, @PaulPreibisch. Any ideas how to do it using Java? – talha06 Feb 18 '15 at 08:06
  • 1
    @talha06 there are various libraries that can do this, one is Stanford NLP: http://stackoverflow.com/a/9531996/897059 – Richard EB Feb 20 '15 at 02:03
2

One way to go might be to create a dictionary of just the irregular verbs (those that don't follow the usual pattern), and look the word up in that first. If the word doesn't appear there, use your algorithm. Does anyone know the relative numbers of regular vs. irregular verbs in English?
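
A minimal Java sketch of that lookup-then-fallback idea, under stated assumptions: the class name `PastTenseLookup` is hypothetical, the irregular table holds only a handful of sample entries rather than a full list, and the fallback passed in `main` is a naive placeholder standing in for whatever regular-verb algorithm you already have.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

class PastTenseLookup {

    // Tiny illustrative sample; a real table would hold the full irregular list.
    private static final Map<String, String> IRREGULAR = Map.of(
            "sell", "sold",
            "go", "went",
            "take", "took",
            "make", "made",
            "write", "wrote");

    // The regular-verb algorithm to use when the verb is not in the table.
    private final UnaryOperator<String> regularRule;

    PastTenseLookup(UnaryOperator<String> regularRule) {
        this.regularRule = regularRule;
    }

    String pastTense(String verb) {
        String v = verb.toLowerCase(Locale.ROOT);
        String irregular = IRREGULAR.get(v);
        // Check the irregular table first, then fall back to the regular rule.
        return irregular != null ? irregular : regularRule.apply(v);
    }

    public static void main(String[] args) {
        // Placeholder fallback (assumption): naive "+ed"; plug in your own algorithm here.
        PastTenseLookup lookup = new PastTenseLookup(v -> v + "ed");
        System.out.println(lookup.pastTense("sell")); // prints "sold"
        System.out.println(lookup.pastTense("boil")); // prints "boiled"
    }
}
```

Keeping the fallback as a parameter means the table and the spelling rules can be tested and replaced separately.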

stw
  • 2
    http://en.wikipedia.org/wiki/List_of_English_irregular_verbs It doesn't look a lot, until you realise that, as the table demonstrates, a lot of verbs can be compounded and the variety of verbs you can create this way is practically limitless. Special mention goes to "hang" and "lie" which can have both regular and irregular past tenses, depending on their semantics. – biziclop Mar 01 '12 at 17:31
1

Use a dictionary webservice.

DictService is a fairly popular one.

It fetches results from http://www.dict.org, which provides various databases. One I found useful is "The Collaborative International Dictionary of English v.0.48", which returns word definitions and also the word's tenses.

You will have to parse the result somehow to find the past tense.

John Eipe