What is the symbol ~ for in R?

Question

I sometimes see this symbol, especially in Lattice and ggplot2. It seems to relate two variables to express a relationship between them. Is it specific to these two graphics packages, or is it defined in R itself? What does it mean?

e.g.

cars <- read.csv("cars.csv", row.names=1)
library(lattice)
xyplot(Price ~ Weight, data=cars)
histogram( ~ Weight, data=cars)

To get help on symbols in R, use `?'~'` at the command prompt. — Tyler, Apr 09 '14 at 16:57
It is used in many ways outside of graphics, e.g. `lm(Price ~ Weight)`. Definitely read the manual. — user1317221_G, Apr 09 '14 at 16:58
Oh, thanks! I didn't know I needed to put quotation marks around operators. — CyberPlayerOne, Apr 09 '14 at 17:03
Why is this getting downvoted? It's a perfectly reasonable question. A bit basic, but perfectly reasonable. — jlhoward, Apr 09 '14 at 18:06
Suggestion: do a search for `[r] tilde` in the SO search box. This will take you to many similar questions with great answers. — Andrie, Apr 09 '14 at 18:26
@jlhoward One of the reasons for downvoting is "lack of research effort". What counts as so basic that "reasonable" effort would have answered it is pretty subjective, though, so you get different judgements on any given question. Some people probably felt that this could have been easily answered with some basic research. — joran, Apr 09 '14 at 18:38
@joran I take your point, but IMO the greatest deficiency in R, by far, is the documentation. So when someone says "I can't find this in the documentation", or, "I can't understand the documentation", I am generally *very* sympathetic. — jlhoward, Apr 09 '14 at 18:49
@jlhoward When I google "what is tilde in R" the first two results are very good SO questions that explain this (one is the duplicate). I consider that pretty basic research. — joran, Apr 09 '14 at 18:52

score 2 · Accepted Answer · answered Apr 09 '14 at 18:29

R supports a special class of object called "formula", created with the ~ operator, which has the general form

LHS ~ RHS

although the LHS is not always required. There are rules for how to specify the LHS and RHS and what they mean (see ?formula).
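A quick console sketch bears on the original question: ~ appears to be a primitive function in base R itself, not something defined by lattice or ggplot2. The names y, x, z and Weight below are arbitrary placeholders; because ~ does not evaluate its operands, they do not need to exist.

```r
# `~` is a primitive function in base R, not part of lattice or ggplot2
is.primitive(`~`)   # TRUE

# Calling it does not evaluate its arguments; it returns a "formula" object
f <- y ~ x + z      # y, x and z do not need to exist
class(f)            # "formula"
length(f)           # 3: the `~` symbol, the LHS and the RHS
f[[2]]              # y      (left-hand side)
f[[3]]              # x + z  (right-hand side)
all.vars(f)         # "y" "x" "z"

# A one-sided formula has no LHS, as in histogram(~ Weight, data = cars)
g <- ~ Weight
length(g)           # 2: the `~` symbol and the RHS only
```

Since the formula only records the expression, it is up to each function that receives it to decide what the terms mean.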

The interpretation of a formula depends on the function call, so you need to read the documentation for the specific call. For example, in

aggregate(mpg~cyl,mtcars,mean)
#   cyl      mpg
# 1   4 26.66364
# 2   6 19.74286
# 3   8 15.10000

the formula means "group mpg by cyl in mtcars and calculate the mean for each group".
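To make the aggregate() interpretation concrete, here is a minimal sketch that computes the same group means with tapply(), which takes no formula at all. The object names by_formula and by_tapply are only illustrative.

```r
# Group-wise means via the formula interface, as in the answer
by_formula <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# The same computation without a formula: split mpg by cyl, apply mean
by_tapply <- tapply(mtcars$mpg, mtcars$cyl, mean)

# Both give one mean per cylinder group, in the same order (4, 6, 8)
all.equal(by_formula$mpg, as.vector(by_tapply))   # TRUE
```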

On the other hand, when used in lm(...)

fit <- lm(mpg~wt+hp+disp,mtcars)
summary(fit)
# ...
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)    
# (Intercept) 37.105505   2.110815  17.579  < 2e-16 ***
# wt          -3.800891   1.066191  -3.565  0.00133 ** 
# hp          -0.031157   0.011436  -2.724  0.01097 *  
# disp        -0.000937   0.010350  -0.091  0.92851    
# ---
# ...

means "fit a linear model mpg = b0 + b1*wt + b2*hp + b3*disp". Note that you don't specify the b's; lm() estimates them.
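One way to see why the b's never appear in the formula is to look at the design matrix that lm() builds from it. The sketch below uses model.matrix() to expand the RHS into columns and then solves the least-squares problem directly with qr.solve(); the names X and b are illustrative, and agreement with coef(fit) is checked only up to floating-point tolerance via all.equal().

```r
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# The formula becomes a design matrix: an intercept column plus one column
# per RHS term; each b is the coefficient lm() estimates for one column
X <- model.matrix(mpg ~ wt + hp + disp, data = mtcars)
colnames(X)   # "(Intercept)" "wt" "hp" "disp"

# Ordinary least squares on that matrix reproduces the fitted coefficients
b <- qr.solve(X, mtcars$mpg)
all.equal(unname(b), unname(coef(fit)))   # TRUE
```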

In xyplot(...)

library(lattice)
xyplot(mpg~wt,mtcars)

the formula means "plot mpg vs wt in mtcars".
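The same kind of formula can be passed to other plotting functions, and each one decides what the terms mean. As a sketch: base graphics' plot() has a formula method, and lattice additionally treats | as a conditioning operator, drawing one panel per level of the variable after it. The | behaviour is lattice's own interpretation, not a general rule for formulas; the name p is illustrative.

```r
library(lattice)

# Base graphics also accepts a formula: y-variable ~ x-variable
plot(mpg ~ wt, data = mtcars)

# lattice reads `|` as "conditioned on": one panel per number of cylinders
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars)
print(p)   # lattice objects are drawn when printed
```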

Finally, you can assign a formula to a variable, as in

myFormula <- mpg~hp+wt+disp
fit <- lm(myFormula,mtcars)
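Besides writing a formula literally, as with myFormula above, you can build one from strings, which may help when variable names are only known at run time. This is a hedged sketch using base R's as.formula() and reformulate(); f1, f2, fit1 and fit2 are illustrative names.

```r
# Two ways to build mpg ~ hp + wt + disp from character strings
f1 <- as.formula("mpg ~ hp + wt + disp")
f2 <- reformulate(c("hp", "wt", "disp"), response = "mpg")
deparse(f2)   # "mpg ~ hp + wt + disp"

# A constructed formula works in lm() just like a literal one
fit1 <- lm(f1, data = mtcars)
fit2 <- lm(mpg ~ hp + wt + disp, data = mtcars)
all.equal(coef(fit1), coef(fit2))   # TRUE
```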
