The short answer is that a standard NLP library or toolkit is unlikely to solve this problem out of the box. Most libraries, Stanford NLP included, only provide a mapping from word to lemma. Note that this is a many-to-one function, so its inverse is not well-defined in the space of words. It is, however, well-defined as a function from the space of words to the space of sets of words (i.e., it's a one-to-many mapping in word space).
Without some form of explicit mapping being maintained, it is impossible to generate all the variants from a given lemma. This is a theoretical impossibility, not a limitation of any particular tool, because lemmatization is a lossy, one-way function.
You can, however, build a mapping of lemma to set-of-words yourself without much coding (and certainly without implementing a new algorithm):
import java.util.*;                   // HashMap, Set
import com.google.common.collect.*;   // Guava's Multimap, HashMultimap

// Plain Java: each lemma maps to the set of surface forms seen so far
Map<String, Set<String>> inverseLemmaMap = new HashMap<>();

// Or, with Guava, a Multimap manages the per-key sets for you
Multimap<String, String> inverseLemmaMap = HashMultimap.create();
Then, as you annotate your corpus using Stanford NLP, you can obtain the lemma and its corresponding token, and populate the above map (or multimap). This way, after a single pass over your dataset, you will have the required inverse lemmatization.
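Populating the map amounts to one `computeIfAbsent` call per token. Here is a minimal, self-contained sketch; the hard-coded (token, lemma) pairs stand in for the annotator's output, and the class and method names are my own:

```java
import java.util.*;

// Sketch: build the inverse lemma map from (token, lemma) pairs.
// In a real pipeline the pairs come from the annotator's lemma
// annotation; here they are hard-coded for illustration.
public class InverseLemma {

    public static Map<String, Set<String>> build(String[][] tokenLemmaPairs) {
        Map<String, Set<String>> inverseLemmaMap = new HashMap<>();
        for (String[] pair : tokenLemmaPairs) {
            String token = pair[0];
            String lemma = pair[1];
            // Create the set for this lemma on first sight, then add the token
            inverseLemmaMap
                    .computeIfAbsent(lemma, k -> new HashSet<>())
                    .add(token);
        }
        return inverseLemmaMap;
    }

    public static void main(String[] args) {
        String[][] pairs = {
                {"running", "run"}, {"ran", "run"}, {"runs", "run"},
                {"was", "be"}, {"beginning", "begin"}
        };
        Map<String, Set<String>> inv = build(pairs);
        System.out.println(inv.get("run")); // the set {running, ran, runs}
    }
}
```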
Note that this will be restricted to the corpus/dataset you are using, and not all words in the English language will be included.
Another note is that people often assume there is a one-to-one correspondence between an inflected form and its part of speech. This is incorrect:
String s = "My running was beginning to hurt me. I was running all day."
The first instance of running is tagged NN (a gerund acting as a noun), while the second instance is part of the past continuous tense of the verb and is tagged VBG. This is what I meant by "lossy, one-way function" earlier in my answer.
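If that distinction matters for your application, one option is to key the inverse map by a (lemma, POS tag) pair rather than by lemma alone, so the noun and verb readings of running stay separate. A sketch under the same assumptions as before (hand-tagged triples standing in for the tagger's output; names are my own):

```java
import java.util.*;

// Sketch: key the inverse map by "lemma/POS" so that surface forms
// sharing a spelling but not a part of speech are kept apart.
// The triples below are hand-tagged for illustration; in practice
// they come from the POS tagger and lemmatizer.
public class InverseLemmaByPos {

    public static Map<String, Set<String>> build(String[][] triples) {
        Map<String, Set<String>> byLemmaAndPos = new HashMap<>();
        for (String[] t : triples) {               // {form, lemma, POS}
            byLemmaAndPos
                    .computeIfAbsent(t[1] + "/" + t[2], k -> new TreeSet<>())
                    .add(t[0]);
        }
        return byLemmaAndPos;
    }

    public static void main(String[] args) {
        String[][] annotated = {
                {"running", "running", "NN"},  // "My running was beginning..."
                {"running", "run", "VBG"},     // "I was running all day."
                {"ran", "run", "VBD"}
        };
        System.out.println(build(annotated).get("run/VBG")); // [running]
    }
}
```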