8

Generally, in natural language processing, we want to get the lemma of a token.

For example, we can map 'eaten' to 'eat' using WordNet lemmatization.

Are there any tools in Python that can do the inverse, turning a lemma into a specific inflected form?

For example, mapping 'go' to 'gone', given the target form 'eaten'.

PS: Someone mentions that we have to store such mappings; see "How to un-stem a word in Python?"

Shifeng.Liu
    POS information can also be used to obtain a particular form of a lemma. – Shifeng.Liu Aug 09 '17 at 12:09
  • 1
    How would you know whether to map "go" to "gone", "goes", "went", etc.? Lemmatization allows for better performance, but it is a tradeoff in which you lose some information. If need be, you would have to keep each lemma together with a list of its possible variations (possibly with their positions, if you want to recreate a text from those "augmented lemmas") – Adonis Aug 09 '17 at 12:13
  • 1
    @Adonis I would add additional variables as constraints or targets, for example def inverse_lemma(**args), so that inverse_lemma(lemma='go', target_form='eaten', target_pos='VBN') returns 'gone'. Something like this. – Shifeng.Liu Aug 09 '17 at 23:38

1 Answer

6

Turning a base form such as a lemma into a situation-appropriate form is called realization (or "surface realization"). Example from Wikipedia:

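// Setup assumed here; it is not part of the Wikipedia excerpt.
Lexicon lexicon = Lexicon.getDefaultLexicon();
NLGFactory nlgFactory = new NLGFactory(lexicon);
Realiser realiser = new Realiser(lexicon);

NPPhraseSpec subject = nlgFactory.createNounPhrase("the", "woman");
subject.setPlural(true);
SPhraseSpec sentence = nlgFactory.createClause(subject, "smoke");
sentence.setFeature(Feature.NEGATED, true);
System.out.println(realiser.realiseSentence(sentence));
// output: "The women do not smoke."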

Libraries for this are used less often than lemmatizers, so you have fewer options and are less likely to find a well-developed one. The Wikipedia example is in Java because the most popular library that supports this, SimpleNLG, is written in Java.

A quick search found pynlg, though it doesn't seem to be actively maintained. Alternatively, you can use SimpleNLG via an HTTP JSON interface provided by the Python library nlgserv.
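If a full realization library is more than you need, the stored-mapping idea from the question and comments can be sketched in plain Python. This is only an illustration under stated assumptions: the `inverse_lemma` name echoes the hypothetical signature from the comments, the `IRREGULAR` table and the suffix rules are made up for this example, the target is given as a Penn Treebank verb tag, and none of this is the API of pynlg, nlgserv, or SimpleNLG.

```python
# Hypothetical sketch: inflect an English verb lemma to a Penn Treebank tag.
# Irregular forms must be stored explicitly; this table is a tiny sample.
IRREGULAR = {
    ("go", "VBD"): "went",
    ("go", "VBN"): "gone",
    ("eat", "VBD"): "ate",
    ("eat", "VBN"): "eaten",
}

VOWELS = "aeiou"


def _ends_in_consonant_y(lemma):
    """True for 'try' or 'carry', False for 'play' or 'y'."""
    return len(lemma) > 1 and lemma.endswith("y") and lemma[-2] not in VOWELS


def _regular(lemma, tag):
    """Apply simplified regular suffix rules (no consonant doubling)."""
    if tag in ("VB", "VBP"):
        return lemma
    if tag == "VBZ":
        if lemma.endswith(("s", "x", "z", "ch", "sh", "o")):
            return lemma + "es"
        if _ends_in_consonant_y(lemma):
            return lemma[:-1] + "ies"
        return lemma + "s"
    if tag == "VBG":
        if lemma.endswith("e") and not lemma.endswith(("ee", "oe", "ye")):
            return lemma[:-1] + "ing"
        return lemma + "ing"
    if tag in ("VBD", "VBN"):
        if lemma.endswith("e"):
            return lemma + "d"
        if _ends_in_consonant_y(lemma):
            return lemma[:-1] + "ied"
        return lemma + "ed"
    raise ValueError(f"unsupported tag: {tag}")


def inverse_lemma(lemma, target_pos):
    """Look up a stored irregular form first, then fall back to the rules."""
    stored = IRREGULAR.get((lemma, target_pos))
    return stored if stored is not None else _regular(lemma, target_pos)


print(inverse_lemma("go", "VBN"))
print(inverse_lemma("walk", "VBD"))
```

The lookup-first design reflects the point made in the comments: a lemma alone does not determine the surface form, so the target tag has to be supplied, and irregular verbs cannot be recovered by rules. The rules here are deliberately incomplete (for instance, they would produce "stoped" rather than "stopped"), so a real application would need a much larger table or a dedicated library.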

polm23