I am trying to generate random practice phrases in English for a Morse code trainer, and I am trying to figure out how to deal with gender agreement in English. I'd like to be able to generate phrases like "He is a son", "She is a mother", "It is a door", but avoid things like "He is a mother", "She is a door", "It is a father". "He is a mother" mixes genders, and sentences like "She is a door" and "It is a father" mix human and non-human. It seems that in the RGL, human and nonhuman are values of the Gender type.
There are times when that sort of mixing is acceptable, such as the phrase "No man is an island". And, for some reason, gender reveal parties often use phrases like "It's a boy!". But I'm just trying to generate training data, so I am focusing on common usage.
I am very new to Grammatical Framework, so I could be approaching this entirely wrong. Here is what I have so far.
In Agreement.gf:
abstract Agreement = {
flags startcat = Message ;
cat
Message ; Subject ; SubjectComplement ;
fun
Is : Subject -> SubjectComplement -> Message ;
He, She, It : Subject;
Son, Daughter, Father, Mother, Fence, Door : SubjectComplement;
}
In AgreementEng.gf:
concrete AgreementEng of Agreement = open DictEng, SyntaxEng, ParadigmsEng, VerbEng, ResEng in {
lincat
Message = Cl ;
Subject = NP;
SubjectComplement = CN;
lin
Is s sc = mkCl s sc;
He = DictEng.he_Pron;
She = DictEng.she_Pron;
It = DictEng.it_Pron;
Son = mkCN son_N;
Daughter = mkCN daughter_N;
Mother = mkCN mother_N;
Father = mkCN father_N;
Fence = mkCN fence_N;
Door = mkCN door_N;
}
If I load this into gf and run generate_random | linearize, it works, but it ignores gender and humanness.
I see that in DictEng there are some gender/nonhuman markers for the pronouns:
lin she_Pron = mkPron "she" "her" "her" "hers" singular P3 feminine ;
lin he_Pron = mkPron "he" "him" "his" "his" singular P3 masculine ;
lin it_Pron = mkPron "it" "it" "its" "its" singular P3 nonhuman;
Most nouns, though, have no such marker:
lin mother_N = mkN "mother" "mothers";
lin daughter_N = mkN "daughter" "daughters";
Some nouns do have gender marked, though:
lin actor_N = mkN masculine (mkN "actor" "actors");
lin actress_N = mkN feminine (mkN "actress" "actresses");
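To make the constraint I am after concrete, here is a minimal, untested sketch of how the abstract syntax could encode it with dependent types instead of relying on the RGL's Gender markers. Everything new here is made up for illustration: the module name AgreementDep, the category Class, and its values Masc, Fem, and Thing. I have not checked how generate_random behaves with it, and the concrete syntax would also need a lincat for Class and an extra (ignored) argument in the linearization of Is.

```gf
abstract AgreementDep = {
  flags startcat = Message ;
  cat
    Message ;
    Class ;                    -- hypothetical: masculine / feminine / non-human
    Subject Class ;            -- subjects are indexed by their class
    SubjectComplement Class ;  -- and so are complements
  fun
    Masc, Fem, Thing : Class ;
    -- Subject and complement must share the same class c,
    -- so "He is a mother" and "She is a door" are not well typed.
    Is : (c : Class) -> Subject c -> SubjectComplement c -> Message ;
    He  : Subject Masc ;
    She : Subject Fem ;
    It  : Subject Thing ;
    Son, Father      : SubjectComplement Masc ;
    Daughter, Mother : SubjectComplement Fem ;
    Fence, Door      : SubjectComplement Thing ;
}
```

I do not know whether this is idiomatic, or whether the agreement is better handled in the concrete syntax using the gender information that the RGL already carries.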
How would you approach this?
I am open to suggestions for any aspect of this code, not just the gender issue. My overall goal is to generate increasingly complex, vaguely sensible English phrases. Think Duolingo, but for Morse code. I will have a series of training levels that build on previous levels, adding new vocabulary, longer sentences, etc.
At the moment, I do not care about non-English languages; that is a problem for future me. I also do not need to support everything in DictEng. The list of potential words and phrases will be hand-curated.
Using what is shown so far, I'd start by training on individual words, "he", "she", "it", "is", "son", etc.
Then simple phrases "he is", "she is", "it is".
Then finally full sentences like "he is a son".
Then I would add the plurals, "we", "they", "are", "sons", etc. Then I'd train the new words individually. Then phrases like "we are", "they are", etc. And then sentences "we are fathers". And then I'd do a mixture of singular and plural sentences.
So, in the grammar files, I need enough granularity to generate each of these different types of training phrases: single words, partial phrases, and full sentences.
Thanks!
(Not sure it matters, but I have decades of Haskell experience and dabble in things like Idris. So I think I am fine with the Grammatical Framework language itself; my trouble is more in understanding the libraries (the RGL) and the big picture.)