
I have seen the tilde symbol `~` fairly often, especially with lattice and ggplot2. It seems to be used to relate two variables to each other. Is it specific to these two graphics packages, or is it defined in R itself? What does it mean?

For example:

cars <- read.csv("cars.csv", row.names=1)
library(lattice)
xyplot(Price ~ Weight, data=cars)
histogram( ~ Weight, data=cars)

Asked by CyberPlayerOne
  • To get help on symbols in R, use `?'~'` at the command prompt. – Tyler Apr 09 '14 at 16:57
  • It is used in many ways outside of graphics ... e.g. `lm(Price ~ Weight)`; definitely read the manual. – user1317221_G Apr 09 '14 at 16:58
  • Oh, thanks! I didn't know I needed to put quotation marks around operators. – CyberPlayerOne Apr 09 '14 at 17:03
  • two tylers = confusing – user1317221_G Apr 09 '14 at 17:04
  • Why is this getting downvoted? It's a perfectly reasonable question. A bit basic, but perfectly reasonable. – jlhoward Apr 09 '14 at 18:06
  • Suggestion: do a search for `[r] tilde` in the SO search box. This will take you to many similar questions with great answers. – Andrie Apr 09 '14 at 18:26
  • @jlhoward One of the reasons for downvoting is "lack of research effort". What counts as so basic that "reasonable" effort would have answered it is pretty subjective, though, so you get different judgements on any given question. Some people probably felt that this could easily have been answered with some basic research. – joran Apr 09 '14 at 18:38
  • @joran I take your point, but IMO the greatest deficiency in R, by far, is the documentation. So when someone says "I can't find this in the documentation", or, "I can't understand the documentation", I am generally *very* sympathetic. – jlhoward Apr 09 '14 at 18:49
  • @jlhoward When I google "what is tilde in R" the first two results are very good SO questions that explain this (one is the duplicate). I consider that pretty basic research. – joran Apr 09 '14 at 18:52

1 Answer

R has a special class of object called "formula", created with the tilde operator `~`. A formula has the general form

LHS ~ RHS

although the LHS is optional: a one-sided formula such as `~ Weight` (used in the histogram() call in the question) has only a RHS. There are rules for how to write the LHS and RHS and what they mean (see ?formula).
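A quick way to see that `~` only records a relationship is to build a formula and take it apart. This is a minimal base R sketch; `Price` and `Weight` are borrowed from the question and do not need to exist, because creating a formula evaluates nothing. The object names `f1` and `f2` are illustrative.

```r
# Creating a formula evaluates nothing, so Price and Weight need not exist
f2 <- Price ~ Weight   # two-sided: LHS ~ RHS
f1 <- ~ Weight         # one-sided: RHS only, as in histogram(~ Weight, ...)

class(f2)      # "formula"
length(f2)     # 3: the `~` symbol, the LHS and the RHS
length(f1)     # 2: the `~` symbol and the RHS
f2[[2]]        # Price, the LHS kept as an unevaluated name
f2[[3]]        # Weight, the RHS
all.vars(f2)   # "Price" "Weight"
```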

The `~` operator itself computes nothing; it just records the relationship. What a formula means is decided by the function you pass it to, so you need to read the documentation for that specific function. For example, in

aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
#   cyl      mpg
# 1   4 26.66364
# 2   6 19.74286
# 3   8 15.10000

the formula means "group mpg by cyl in mtcars and calculate the mean for each group".
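To check that reading, here is a sketch comparing the formula call with an equivalent computation that uses no formula at all, via base R's tapply(). The object names are illustrative, and tapply() is just one alternative, not something the answer relies on.

```r
# Formula interface: split mpg by cyl and apply mean() to each group
by_formula <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# The same group means without a formula
by_tapply <- tapply(mtcars$mpg, mtcars$cyl, mean)

by_formula$mpg        # 26.66364 19.74286 15.10000
as.vector(by_tapply)  # same values, groups in the same order (4, 6, 8)
```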

On the other hand, when used in lm(...)

fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
summary(fit)
# ...
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)    
# (Intercept) 37.105505   2.110815  17.579  < 2e-16 ***
# wt          -3.800891   1.066191  -3.565  0.00133 ** 
# hp          -0.031157   0.011436  -2.724  0.01097 *  
# disp        -0.000937   0.010350  -0.091  0.92851    
# ---
# ...

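means "fit the linear model mpg = b0 + b1*wt + b2*hp + b3*disp (plus an error term)". Note that you don't write the coefficients b0, ..., b3: lm() estimates them, and the intercept b0 is included by default.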
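To make "you don't specify the b's" concrete, the following sketch uses model.matrix() to show the design matrix built from this formula, then solves the least-squares problem directly with qr.solve(). It assumes that a plain QR least-squares solve reproduces lm()'s estimates for this full-rank model; it is an illustration, not a description of lm()'s internals.

```r
# Expand the formula into a design matrix: one column per coefficient
X <- model.matrix(mpg ~ wt + hp + disp, data = mtcars)
colnames(X)   # "(Intercept)" "wt" "hp" "disp": the intercept column is added for you
y <- mtcars$mpg

# Least-squares estimates of b0..b3 via a QR decomposition
b <- qr.solve(X, y)

fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
all.equal(unname(b), unname(coef(fit)))   # TRUE: same coefficients
```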

In xyplot(...)

library(lattice)
xyplot(mpg ~ wt, data = mtcars)

the formula means "plot mpg (vertical axis) against wt (horizontal axis), using the data in mtcars".
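The same lattice convention covers the one-sided formula in the question's histogram() call, and lattice also reads a `|` in the formula as "make one panel per value of what follows". A small sketch with illustrative object names; lattice functions return "trellis" objects, which are drawn when printed (automatically at the console, via an explicit print() inside scripts or functions).

```r
library(lattice)  # ships with R as a recommended package

# Two-sided: y ~ x puts mpg on the vertical axis and wt on the horizontal axis
p_xy <- xyplot(mpg ~ wt, data = mtcars)

# One-sided: only a RHS, like histogram(~ Weight, data = cars) in the question
p_hist <- histogram(~ wt, data = mtcars)

# Conditioning: everything after | defines the panels (one per value of cyl)
p_cond <- xyplot(mpg ~ wt | factor(cyl), data = mtcars)

class(p_xy)     # "trellis"
# print(p_cond) draws the plot; at the console auto-printing does this for you
```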

Finally, because a formula is an ordinary R object, you can assign it to a variable and reuse it, as in

myFormula <- mpg ~ hp + wt + disp
fit <- lm(myFormula, data = mtcars)
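Since formulas are objects, they can also be built from strings or modified. This is a hedged sketch using the base stats functions reformulate(), as.formula() and update(), which the answer above does not mention; the object names are illustrative.

```r
# A formula stored in a variable can be reused
myFormula <- mpg ~ hp + wt + disp

# Build the same formula from character strings, e.g. when the column
# names are only known at run time
f_chr <- reformulate(c("hp", "wt", "disp"), response = "mpg")
f_str <- as.formula("mpg ~ hp + wt + disp")

# Modify an existing formula: "." stands for the current LHS or RHS
f_small <- update(myFormula, . ~ . - disp)   # mpg ~ hp + wt

fit_full <- lm(myFormula, data = mtcars)
fit_chr  <- lm(f_chr, data = mtcars)
all.equal(coef(fit_full), coef(fit_chr))     # TRUE: same model
```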

Answered by jlhoward