How can I get a kind of inverse lemmatization for any language?

Question

I found the spaCy library, which lets me lemmatize words (blacks -> black in English, bianchi -> bianco in Italian). My work is to analyze entities, not verbs or adjectives.

I'm looking for something that gives me all the possible word forms starting from the canonical form.

For example, going from "black" to "blacks" in English, or from "bianco" in Italian to "bianca", "bianchi", "bianche", etc. Is there any library that does this?

Does this answer your question? [How to inverse lemmatization process given a lemma and a token?](https://stackoverflow.com/questions/45590278/how-to-inverse-lemmatization-process-given-a-lemma-and-a-token) — polm23, May 19 '20 at 07:36

bivouac0 · Answer 1 · 2020-05-18T14:22:30.837

I'm not clear on exactly what you're looking for, but if a list of English lemmas is all you need, you can extract that easily enough from a GitHub library I have. Take a look at LemmInflect. It first uses a dictionary approach to lemmatization, and there is a .csv file in the repository with all the different lemmas and their inflections. The file is LemmInflect/lemminflect/resources/infl_lu.csv.gz. You'll have to extract the lemmas from it. Something like...

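import gzip

# Open in text mode ('rt') so each line is a str rather than bytes
with gzip.open('LemmInflect/lemminflect/resources/infl_lu.csv.gz', 'rt') as f:
    for line in f:
        parts = line.strip().split(',')
        lemma = parts[0]  # first column: the lemma
        pos = parts[1]    # second column: its part of speech
        print(lemma, pos)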

Alternatively, if you need a system to inflect words, this is what LemmInflect is designed to do. You can use it as a stand-alone library or as an extension to spaCy. There are examples of how to use it in the README.md or in the ReadTheDocs documentation.
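
As a rough sketch of the stand-alone route, something like the following should collect every inflected form of an English lemma. It assumes `lemminflect` is installed and that `getAllInflections` returns a dict mapping Penn Treebank tags to tuples of forms, as the README describes. The helper names `unique_forms` and `inflect_lemma` are made up for this illustration and are not part of the library.

```python
def unique_forms(inflections):
    """Flatten a {tag: (form, ...)} mapping into a sorted list of distinct forms."""
    return sorted({form for forms in inflections.values() for form in forms})


def inflect_lemma(lemma, upos="NOUN"):
    """Return all inflected forms of an English lemma (hypothetical helper)."""
    # Imported here so unique_forms() can be used without the library installed.
    from lemminflect import getAllInflections

    # upos restricts the result to one universal part of speech, e.g. "NOUN".
    return unique_forms(getAllInflections(lemma, upos=upos))
```

Calling `inflect_lemma("black")` would then give the noun forms of "black" as a plain list, which is probably closer to what the question asks for than the raw tag dictionary.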

I should note that this is for English only. I haven't seen a lot of code for inflecting words and you may have some difficulty finding this for other languages.
