Addressing Yousef's answer:
Yes, the compiled grammar is smaller when the variants are at the V level.
In your alternative grammar, you apply a V -> VP operation on the V that has the variants. In my grammar, I applied a V -> Imp operation on the V. In both of these grammars that compile quickly, the category that gets the variants is V and not VP.
You are right that there is no reason to avoid the VP category elsewhere in the grammar. The crucial issue here is whether the grammar has a variant-riddled VP as a 0-argument function.
Anatomy of the grammar blowup
Why is that? I returned to this question after reducing the fields in a VP, so now I can demonstrate this more easily.
We need to look at the PGF dump. You can see it by typing pg (short for print_grammar) in the GF shell, after you have opened the grammar.
PGF dump for well-behaving grammar
Here are the concrete functions for the original grammar (in Yousef's first question), with the difference of adding a function MkVP : Verb -> VerbPhrase, and moving all variants into Play_V.
-- Abstract funs
fun Play_V : Verb ;
fun MkVP : Verb -> VerbPhrase ;
fun MkImp : VerbPhrase -> Imperative ;
-- English concrete syntax compiled into the following
F8 := (S0,S0,S0,S0,S0,S0,S0,S0,S0,S0,{-S54 x 40-},S0,S1,S7,S1,S1,S1,S1,S2,S2,S2,S1,S1,S1,S8,S5,S3,S0,S0,S0,S0,S0,S0,S0,S0,S0,S0) [MkVP]
F9 := (S0,S0,S0,S0,S0,S0,S0,S0,S0,S0,{-S54 x 40-},S0,S1,S7,S1,S1,S1,S1,S2,S2,S2,S1,S1,S1,S8,S5,S3,S21,S23,S52,S53,S20,S19,S18,S43,S43,S43) [MkVP]
F10 := (S4,S4,S6,S6,S14,S14,S15,S15,S16,S16,S17,S17) [MkImp]
F11 := (S10,S13,S11,S12,S11,S0) [Play_V]
F12 := (S24,S27,S25,S26,S25,S0) [Play_V]
F13 := (S31,S34,S32,S33,S32,S0) [Play_V]
F14 := (S35,S38,S36,S37,S36,S0) [Play_V]
F15 := (S44,S47,S45,S46,S45,S0) [Play_V]
F16 := (S48,S50,S49,S51,S49,S0) [Play_V]
F17 := (S39,S42,S40,S41,S40,S0) [Play_V]
F18 := (S28,S29,S28,S30,S28,S22) [Play_V]
- We have 8 concrete functions for Play_V, because we had 8 variants: "broadcast", "play", "replay", "see", "view", "watch", "show" and "put on".
- We have two concrete functions for MkVP, because they are inherited from the GF RGL. This particular split into two concrete functions is due to the isRefl param in the lincat of V:
  - A function of type V -> VP either takes a V where isRefl=True, in which case it puts the sequences S21,S23,S52,S53,S20,S19,S18,S43,S43,S43 into appropriate fields. (If you follow the sequence numbers in the PGF dump, you will see they all correspond to reflexive pronouns. S21 is "myself", S43 is "themselves".)
  - Or it takes a V where isRefl=False, in which case it puts S0 into all those fields. (S0 is the empty string.)
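To make the isRefl split concrete, here is a toy sketch in Python (not GF, and not how the compiler is actually implemented). It assumes a simplified model in which a function gets one concrete function per combination of its argument's inherent parameter values. The names split_on_params and mkvp_refl_fields are made up for this illustration, and only the two reflexive pronouns named above are used as placeholders for the ten reflexive fields.

```python
from itertools import product

EMPTY = ""  # plays the role of S0, the empty string

# Placeholder for the reflexive fields; the real dump has ten of them,
# only "myself" (S21) and "themselves" (S43) are named in the text.
REFLEXIVES = ("myself", "themselves")


def split_on_params(params):
    """Return one parameter assignment per concrete function.

    Assumption: the split depends only on the inherent parameters of the
    argument, never on how many variants the argument has.
    """
    names = list(params)
    return [dict(zip(names, values)) for values in product(*params.values())]


def mkvp_refl_fields(is_refl):
    """Fill the reflexive fields of the VP for one concrete MkVP."""
    return REFLEXIVES if is_refl else (EMPTY,) * len(REFLEXIVES)


# isRefl has two values, so this toy MkVP gets two concrete functions.
for assignment in split_on_params({"isRefl": (False, True)}):
    print(assignment, mkvp_refl_fields(assignment["isRefl"]))
```

The loop prints two rows, mirroring F8 (all S0) and F9 (reflexive pronouns) in the dump. Nothing in this model looks at the 8 variants of Play_V, which is why MkVP stays at two concrete functions.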
PGF dump for the misbehaving grammar
Now, let us look at the PGF for the grammar that has a 0-argument function VP riddled with variants.
F4 := (S0,S0,S0,S0,S0,S0,S0,S0,S0,S0,{-S15 x 40-},S0,S8,S9,S8,S8,S8,S8,S11,S11,S11,S8,S8,S8,S0,S10,S9,S0,S0,S0,S0,S0,S0,S0,S0,S0,S0) [Play_VP]
…
F16388 := (S0,S0,S0,S0,S0,S0,S0,S0,S0,S0,{-S15 x 40-},S0,S12,S12,S12,S12,S12,S12,S13,S13,S13,S12,S12,S12,S7,S14,S12,S0,S0,S0,S0,S0,S0,S0,S0,S0,S0) [Play_VP]
This time, we don't have the split into "what if the argument is reflexive or not", because the Play_VP doesn't take arguments. Instead, we split into over 16000 concrete functions, due to the variants blowing up.
To see the process on a smaller scale, see my blog post: https://inariksit.github.io/gf/2018/06/13/pmcfg.html#variants
The key there is the following: we only introduce 4 variants in the linearisation of a single function. The variants don't come from the arguments, but are introduced directly into the function. Each of these variants is used multiple times in the linearisation, so that blows up into 64 new concrete functions.
Now for a function that returns a VP, its arguments are used in many more places. The lincat of V has only 6 fields, and VP has almost 100, even after my latest fix. This means that the same fields from the V argument are reused multiple times, and whenever that happens, it splits exponentially into 8 new branches of concrete functions.
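The growth can be illustrated with a toy model in Python. This is a sketch under a strong assumption: every occurrence of a variant expression in a linearisation is expanded independently, so n variants used in k places give n ** k concrete functions. The helper expand_variants, the VARIANT marker and the occurrence counts are all made up for this illustration; in particular, the three occurrences in the first example are chosen only because 4 ** 3 = 64, not taken from the blog post.

```python
from itertools import product

VARIANT = object()  # marks a field that holds the variant expression


def expand_variants(template, variants):
    """Return every concrete function obtained by choosing a variant
    independently at each VARIANT occurrence in the template."""
    slots = [variants if field is VARIANT else (field,) for field in template]
    return list(product(*slots))


# Hypothetical linearisation with three occurrences of a 4-way variant.
small = expand_variants((VARIANT, "S0", VARIANT, VARIANT), ("a", "b", "c", "d"))
print(len(small))  # 64, i.e. 4 ** 3

# The 8 verbs from Play_V, reused in a growing number of fields.
verbs = ("broadcast", "play", "replay", "see", "view", "watch", "show", "put on")
for occurrences in range(1, 6):
    template = (VARIANT,) * occurrences
    print(occurrences, len(expand_variants(template, verbs)))
```

With one occurrence there are 8 concrete functions, as for Play_V in the well-behaving grammar; with five occurrences the toy model is already past 30000. The real count depends on details of the compiler that this model ignores, so it should not be expected to reproduce the number in the dump exactly.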
Solutions
To recap:
- Keep the variants in a category that has a small lincat; V instead of VP in this case.
- No need to avoid large categories elsewhere in the grammar; if a function f : SmallCat -> BigCat takes an argument that is full of variants, it will go just fine. The function f will not blow up: it doesn't care about its potential arguments on the level of variants, only on the level of inherent parameters (just as MkVP cares whether its argument V is reflexive, but not whether it is composed of 8 variants).
Future
The overall handling of variants is going to change in GF 4.0. So once it is released, this whole answer will hopefully be obsolete, and we will have a glorious future where nobody runs into these problems anymore.