I am looking for the code of the base R formula parser or interpreter, i.e. the code that translates the formula the user types into the variables and transformations that connect the data to the model matrix. A number of packages have their own formula interpreters that supplement or replace the base R interpreter, e.g. rmutil, gamlss.nl, ttBulk.
At a minimum, the following symbols have a distinct meaning in the formula context. I am looking for the code that implements that meaning.
~, 1, 0, +, -, *, /, :, ^, ., |, I, %in%
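To make "distinct meaning" concrete, here is a minimal sketch that expands a few formulas with stats::terms() and prints the resulting term labels. It assumes terms() is the relevant entry point for the expansion; y, a and b are placeholder names that are never evaluated, and labels_of is a hypothetical helper introduced only for this illustration.

```r
# Hypothetical helper: return the expanded term labels of a formula.
# terms() works on the symbols only, so y, a, b need not exist.
labels_of <- function(f) attr(terms(f), "term.labels")

labels_of(y ~ a * b)        # "a" "b" "a:b"   (crossing)
labels_of(y ~ (a + b)^2)    # "a" "b" "a:b"   (all terms up to order 2)
labels_of(y ~ a / b)        # "a" "a:b"       (nesting)
labels_of(y ~ a * b - a:b)  # "a" "b"         (term removal)

# 0 and -1 do not create terms; they switch off the intercept flag.
attr(terms(y ~ 0 + a), "intercept")  # 0
attr(terms(y ~ a - 1), "intercept")  # 0
```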
In addition, the functions below seem to be used mainly within the formula context, but I am not sure if they operate in a distinct way in that context. Some may have meaning only in model-fitting functions beyond lm or from particular packages. In some cases I am not sure that they have a meaning outside of the formula context.
C
||
poly
offset
strata
cluster
contrasts
ns
lo
bs
s
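One way to probe whether a function in this list is handled specially, sketched under the assumption that stats::terms() is where such handling would show up: offset() is recorded in its own attribute, other names can be flagged through the specials argument, and something like poly() can simply be called outside a formula. The variable names (y, x, n, z) are placeholders, and s() does not need to be defined because terms() does not evaluate it.

```r
# Ask terms() to flag calls to s() as "special"; offset() is always flagged.
tt <- terms(y ~ x + offset(log(n)) + s(z), specials = "s")

attr(tt, "offset")       # position of offset(log(n)) among the variables
attr(tt, "specials")$s   # position of s(z) among the variables
attr(tt, "term.labels")  # "x" "s(z)": the offset is not an ordinary term

# poly() also works outside a formula, as a plain function call:
# it returns a matrix with one column per polynomial degree.
dim(poly(1:10, degree = 2))  # 10 2
```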
What I really want is an expository piece or tutorial at a level of detail that would let me figure out, e.g. which of the operations above commute, which are distributive over which other ones, which ones have inverse operations. But I gather that no such exposition exists.
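In the absence of such an exposition, questions like these can at least be tested empirically. The sketch below compares the expanded term labels of two formulas after sorting the variables within each interaction. canon and same_terms are hypothetical helpers written for this illustration, and the comparison looks only at which terms appear, not at the intercept or at how the model matrix columns end up being coded.

```r
# Hypothetical helper: canonical, order-independent set of term labels.
canon <- function(f) {
  labs  <- attr(terms(f), "term.labels")
  parts <- strsplit(labs, ":", fixed = TRUE)
  # Sort variables inside each interaction, then sort the terms themselves.
  sort(vapply(parts, function(p) paste(sort(p), collapse = ":"), character(1)))
}

# Hypothetical helper: do two formulas expand to the same set of terms?
same_terms <- function(f1, f2) identical(canon(f1), canon(f2))

same_terms(y ~ a + b, y ~ b + a)          # TRUE:  + commutes
same_terms(y ~ a * b, y ~ b * a)          # TRUE:  * commutes
same_terms(y ~ a / b, y ~ b / a)          # FALSE: / does not commute
same_terms(y ~ (a + b):c, y ~ a:c + b:c)  # TRUE:  : distributes over +
same_terms(y ~ a + b - b, y ~ a)          # TRUE:  - undoes +
```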
I'd also like to get a complete list of functions that mean something different inside a formula, if such a list can be had. There is nothing in the R Language Definition or in R Internals about these special meanings, and, e.g., methods("|") gives me only methods for hex and octal. The best discussion I have seen is still Statistical Models in S, Chap. 2, Sect. 2.3.1, but I believe this is incomplete and maybe also not current.
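A possible reason methods() is unhelpful here, sketched with placeholder names y, x and g: a formula is stored as an unevaluated call, so an operator such as | inside it is just a symbol in a parse tree and is not dispatched through S3 methods when the formula is created.

```r
f <- y ~ x | g

class(f)                # "formula"
rhs <- f[[3]]           # right-hand side of a two-sided formula
class(rhs)              # "call": an unevaluated call to `|`
as.character(rhs[[1]])  # "|"
all.vars(f)             # "y" "x" "g"
```

Whatever meaning | has is therefore supplied by the code that later walks this call, which is why different modelling functions can interpret it differently.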