6

I have a database of words (including nouns and verbs). Now I would like to generate all the different (inflected) forms of those nouns and verbs. What would be the best strategy to do this?

As Latin is a highly inflected language, there are:

a) the declension of nouns

b) the conjugation of verbs

See this translated page for an example of a verb's conjugation ("mandare"): conjugation

I don't want to type in all those forms for all the words manually.

How can I generate them automatically? What is the best approach?

  • a list of complex rules for how to inflect all the words
  • Bayesian methods
  • ...

There's a program called "William Whitaker's Words". It creates inflections for Latin words as well, so it does exactly what I want to do.

Wikipedia says that the program works like this:

Words uses a set of rules based on natural pre-, in-, and suffixation, declension, and conjugation to determine the possibility of an entry. As a consequence of this approach of analysing the structure of words, there is no guarantee that these words were ever used in Latin literature or speech, even if the program finds a possible meaning to a given word.

The program's source is also available here. But I don't really understand how it is supposed to work. Can you help me? Maybe this would be the solution to my question ...

TylerH
caw

3 Answers

5

You could do something similar to the hunspell dictionary format (see http://www.manpagez.com/man/4/hunspell/).

You define two tables. One contains the roots of the words (the part that never changes), and the other contains the modifications for a given class. For a given class, for each declension (or conjugation) form, it tells which characters to add at the end (or the beginning) of the root. It can even specify that a given number of characters be replaced. Now, to get a word in a specific form, you take the root, apply the transformation from the class it belongs to, and voilà!

For example, for mandare, the root would be mand, and the class would contain suffixes like o, as, at, amus, atis, ant for the active indicative present.
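The two-table idea could be sketched roughly as follows. This is only a minimal illustration, not hunspell's actual file format or API: the class name `AffixTables`, the class label `conj1`, and the slot labels such as `1sg` are all made up for the example, and only the present active indicative is filled in.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AffixTables {
    // One rule: drop `strip` characters from the end of the root, then append `add`.
    static final class Rule {
        final String label;
        final int strip;
        final String add;

        Rule(String label, int strip, String add) {
            this.label = label;
            this.strip = strip;
            this.add = add;
        }
    }

    // Table 2: inflection class -> rules ("conj1" is a hypothetical class name).
    static final Map<String, List<Rule>> CLASSES = Map.of(
        "conj1", List.of(
            new Rule("1sg", 0, "o"),
            new Rule("2sg", 0, "as"),
            new Rule("3sg", 0, "at"),
            new Rule("1pl", 0, "amus"),
            new Rule("2pl", 0, "atis"),
            new Rule("3pl", 0, "ant")));

    // Table 1: root -> inflection class.
    static final Map<String, String> ROOTS = Map.of("mand", "conj1");

    // Apply every rule of the root's class; a LinkedHashMap keeps the rule order.
    static Map<String, String> inflect(String root) {
        Map<String, String> forms = new LinkedHashMap<>();
        for (Rule r : CLASSES.get(ROOTS.get(root))) {
            String base = root.substring(0, root.length() - r.strip);
            forms.put(r.label, base + r.add);
        }
        return forms;
    }

    public static void main(String[] args) {
        // prints {1sg=mando, 2sg=mandas, 3sg=mandat, 1pl=mandamus, 2pl=mandatis, 3pl=mandant}
        System.out.println(inflect("mand"));
    }
}
```

Adding a new regular verb then only means adding one row to the root table; the suffix table is shared by every word in the class.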

Charles Brunet
  • Thank you very much for this answer :) The problem is that not all words are following the rules like "mandare". This is a regular verb. But there are lots of irregular verbs like "tollere, tollo, sustuli, sublatum". – caw Apr 08 '11 at 17:00
  • For such exceptions, you could define a specific class for each special word, where the transformation for some forms could be to replace the whole root with something. You could even think about a hierarchy of classes, where the subclass would just record the differences from the parent class (stating that it's identical to the parent class, except for this and for that). – Charles Brunet Apr 08 '11 at 17:12
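The class hierarchy mentioned in the last comment might look something like the sketch below. It assumes a hypothetical `Paradigm` type whose lookups fall back to a parent paradigm, so an irregular verb only stores the forms that differ. The slot names (`inf`, `pres.1sg`, `perf.1sg`, `supine`) and the helper `Rule.whole` are inventions for this example, not part of hunspell.

```java
import java.util.HashMap;
import java.util.Map;

class Paradigm {
    // Drop `strip` characters from the end of the root, then append `add`.
    static final class Rule {
        final int strip;
        final String add;

        Rule(int strip, String add) {
            this.strip = strip;
            this.add = add;
        }

        // Replace the whole root with a fixed form (for fully irregular slots).
        static Rule whole(String form) {
            return new Rule(Integer.MAX_VALUE, form);
        }

        String apply(String root) {
            int keep = Math.max(0, root.length() - strip);
            return root.substring(0, keep) + add;
        }
    }

    private final Paradigm parent;                       // null for a top-level class
    private final Map<String, Rule> rules = new HashMap<>();

    Paradigm(Paradigm parent) {
        this.parent = parent;
    }

    Paradigm with(String slot, Rule rule) {
        rules.put(slot, rule);
        return this;
    }

    // Look for the slot here first, then walk up the parent chain.
    String form(String root, String slot) {
        for (Paradigm p = this; p != null; p = p.parent) {
            Rule r = p.rules.get(slot);
            if (r != null) {
                return r.apply(root);
            }
        }
        throw new IllegalArgumentException("no rule for slot " + slot);
    }

    public static void main(String[] args) {
        // Regular third-conjugation endings (only two slots shown).
        Paradigm conj3 = new Paradigm(null)
            .with("inf", new Rule(0, "ere"))
            .with("pres.1sg", new Rule(0, "o"));
        // "tollere" inherits the regular slots and overrides the irregular ones.
        Paradigm tollere = new Paradigm(conj3)
            .with("perf.1sg", Rule.whole("sustuli"))
            .with("supine", Rule.whole("sublatum"));

        System.out.println(tollere.form("toll", "pres.1sg")); // tollo
        System.out.println(tollere.form("toll", "perf.1sg")); // sustuli
    }
}
```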
2

I'll use as example the nouns, but it applies also to verbs.

First, I would create two classes: Regular and Irregular. For the Regular nouns, I would make one class for each declension (Latin has five), and make them all implement a Declensable (or however the word is in English :) interface (FirstDeclension extends Regular implements Declensable). The interface would define two static enums, one for case (NOMINATIVE, VOCATIVE, etc.) and one for number (SINGULAR, PLURAL). All would have a string for the root and a static hashmap of suffixes. The method FirstDeclension#get(case, number) would then append the right suffix based on the hashmap.

The Irregular class would have to define a local hashmap for each word and then implement the same Declensable interface.

Does it make any sense?

Addendum: To clarify, the constructor of class Regular would be

public Regular(String stem) {
    this.stem = stem; // the stem to which the declension's suffixes are appended
}
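Put together, the design above might be sketched like this. It is only one possible reading of the description: the enum names `Case` and `Num`, the `suffix` helper, and the `Irregular#put` builder method are assumptions made for the example, and only the first declension is filled in.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

interface Declensable {
    enum Case { NOMINATIVE, GENITIVE, DATIVE, ACCUSATIVE, ABLATIVE, VOCATIVE }
    enum Num { SINGULAR, PLURAL }

    String get(Case c, Num n);
}

abstract class Regular implements Declensable {
    protected final String stem;

    protected Regular(String stem) {
        this.stem = stem;
    }

    // Each declension class looks the suffix up in its own static table.
    protected abstract String suffix(Case c, Num n);

    @Override
    public String get(Case c, Num n) {
        return stem + suffix(c, n);
    }
}

class FirstDeclension extends Regular {
    // Case -> { singular suffix, plural suffix }
    private static final Map<Case, String[]> SUFFIXES = new EnumMap<>(Case.class);
    static {
        SUFFIXES.put(Case.NOMINATIVE, new String[] {"a", "ae"});
        SUFFIXES.put(Case.GENITIVE, new String[] {"ae", "arum"});
        SUFFIXES.put(Case.DATIVE, new String[] {"ae", "is"});
        SUFFIXES.put(Case.ACCUSATIVE, new String[] {"am", "as"});
        SUFFIXES.put(Case.ABLATIVE, new String[] {"a", "is"});
        SUFFIXES.put(Case.VOCATIVE, new String[] {"a", "ae"});
    }

    FirstDeclension(String stem) {
        super(stem);
    }

    @Override
    protected String suffix(Case c, Num n) {
        return SUFFIXES.get(c)[n.ordinal()];
    }
}

class Irregular implements Declensable {
    // Per-word table of complete forms, keyed by "CASE/NUMBER".
    private final Map<String, String> forms = new HashMap<>();

    Irregular put(Case c, Num n, String form) {
        forms.put(c + "/" + n, form);
        return this;
    }

    @Override
    public String get(Case c, Num n) {
        return forms.get(c + "/" + n); // null if the form was never registered
    }
}

class DeclensionDemo {
    public static void main(String[] args) {
        Declensable rosa = new FirstDeclension("ros");
        System.out.println(rosa.get(Declensable.Case.GENITIVE, Declensable.Num.PLURAL)); // rosarum

        Declensable iuppiter = new Irregular()
            .put(Declensable.Case.NOMINATIVE, Declensable.Num.SINGULAR, "Iuppiter")
            .put(Declensable.Case.GENITIVE, Declensable.Num.SINGULAR, "Iovis");
        System.out.println(iuppiter.get(Declensable.Case.GENITIVE, Declensable.Num.SINGULAR)); // Iovis
    }
}
```

Callers only ever see Declensable, so it doesn't matter to them whether a form was built from a suffix table or looked up whole.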
Aleadam
  • Thanks for your answer. It does make sense ;) But what is missing in your answer is that words can have different stems, e.g.: "dominus" and "puer" and "ager" all belong to the o-declension. But their stems are "domin", "puer" and "agr". So you need the stems and can't do anything without them. – caw Apr 11 '11 at 03:08
  • Sorry I called them "roots" instead of "stems". That's my Spanish kicking in... "All would have a string for the root and a static hashmap of suffixes. " In "ager", the stem should be "ag", not "agr", since "ager" does not contain "agr". It would be an Irregular for the sake of this classification. – Aleadam Apr 11 '11 at 03:20
  • BTW, I took some Latin too many years ago to remember every little possibility... I wish I remembered a little more. So the hierarchy could become a little more complicated for the "weird" cases, but in the worst case, there's always the possibility of having an Irregular instance with an empty root and the complete word in the hash (although I can't recall any word that would start with a different letter in a particular declension). – Aleadam Apr 11 '11 at 03:23
  • Thanks for the additional comments and your "addendum" :) I did understand every single word you wrote, no problem. "root" instead of "stem" is perfect - it's the same I think. But I thought you wanted to save the roots for the declensions, not for the single words. – caw Apr 11 '11 at 13:50
0

Perhaps, you could follow the line of AOT in your implementation. (It's under LGPL.)

There's no Latin morphology in AOT, rather only Russian, German, English, where Russian is of course an example of an inflectional morphology as complex as Latin, so AOT should be ready as a framework for implementing it.

Still, I believe one has to have an elaborate, precise formal system for the morphology clearly defined before one starts programming. As for Russian, I guess most of the working morphological computer systems are based on the serious analysis of Russian morphology done by Andrey Zalizniak in the Grammatical Dictionary of Russian and related works.

imz -- Ivan Zakharyaschev