R supports a special data type called "formula", which has the general form
LHS ~ RHS
although LHS is not always required. There are rules for how to specify the LHS and RHS and what they mean (see ?formula).
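As a minimal sketch of the point about the optional LHS, a formula can be created and inspected on its own, without passing it to any function. The names f2 and f1 are illustrative only.

```r
# A two-sided formula: LHS ~ RHS
f2 <- mpg ~ wt + hp
class(f2)      # "formula"
length(f2)     # 3: the `~` operator, the LHS, and the RHS
all.vars(f2)   # "mpg" "wt" "hp"

# A one-sided formula: no LHS
f1 <- ~ wt + hp
length(f1)     # 2: the `~` operator and the RHS
all.vars(f1)   # "wt" "hp"
```

Note that creating a formula does not evaluate mpg, wt, or hp. The names are only looked up later, by whichever function receives the formula.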
The interpretation of a formula depends on the function call, so you need to read the documentation for the specific call. For example, in
aggregate(mpg~cyl,mtcars,mean)
# cyl mpg
# 1 4 26.66364
# 2 6 19.74286
# 3 8 15.10000
the formula means "group mpg by cyl in mtcars and calculate the mean for each group".
On the other hand, when used in lm(...)
fit <- lm(mpg~wt+hp+disp,mtcars)
summary(fit)
# ...
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 37.105505 2.110815 17.579 < 2e-16 ***
# wt -3.800891 1.066191 -3.565 0.00133 **
# hp -0.031157 0.011436 -2.724 0.01097 *
# disp -0.000937 0.010350 -0.091 0.92851
# ---
# ...
the formula means "fit a linear model mpg = b0 + b1*wt + b2*hp + b3*disp". Note that you don't specify the b's: lm(...) estimates them, and the intercept b0 is included by default.
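One way to check which coefficients lm(...) estimates from a formula is to look at the names of the fitted coefficients. The sketch below also uses the `- 1` term described in ?formula to drop the intercept; the names fit and fit0 are illustrative only.

```r
# Intercept is implied by the formula
fit <- lm(mpg ~ wt + hp + disp, mtcars)
names(coef(fit))    # "(Intercept)" "wt" "hp" "disp"

# "- 1" removes the intercept term
fit0 <- lm(mpg ~ wt + hp + disp - 1, mtcars)
names(coef(fit0))   # "wt" "hp" "disp"
```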
In xyplot(...)
library(lattice)
xyplot(mpg~wt,mtcars)
the formula means "plot mpg vs wt in mtcars".
Finally, a formula is an ordinary R object, so you can assign it to a variable and pass that variable to a function, as in
myFormula <- mpg~hp+wt+disp
fit <- lm(myFormula,mtcars)
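Because a stored formula is just an object, it can also be modified or built programmatically before it is used. The following is a rough sketch using update(...) and as.formula(...) from base R; the names smaller, fitSmall, and fromText are hypothetical and chosen only for this example.

```r
myFormula <- mpg ~ hp + wt + disp
class(myFormula)                             # "formula"

# Derive a new formula from the stored one by dropping disp from the RHS
smaller <- update(myFormula, . ~ . - disp)
fitSmall <- lm(smaller, mtcars)
names(coef(fitSmall))                        # intercept plus hp and wt

# Build a formula from a character string
fromText <- as.formula("mpg ~ hp + wt")
class(fromText)                              # "formula"
```

In update(...), the dots stand for "the corresponding side of the old formula", so `. ~ . - disp` keeps the LHS and removes disp from the RHS.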