
When we fit a statistical model in R, say

lm(y ~ x, data=dat)

we use R's special formula syntax: `y ~ x`.

Is there something that converts from such a formula to the corresponding equation? In this case it could be written as:

y = B0 + B1*x

This would be very useful for two reasons. First, with more complicated formulae I don't trust my own translation. Second, in scientific papers written with R/Sweave/knitr, the model sometimes has to be reported in equation form, and for fully reproducible research we'd like to generate that equation automatically.

Alex Holcombe
  • This question appears to be off-topic because it is about statistics – Metrics Oct 13 '13 at 00:30
  • It's about how the programming language R handles formulae, so I thought it was a programming question, but do you think it's better for CrossValidated? – Alex Holcombe Oct 13 '13 at 01:00
  • possible duplicate of [ggplot2: Adding Regression Line Equation and R2 on graph](http://stackoverflow.com/questions/7549694/ggplot2-adding-regression-line-equation-and-r2-on-graph) – nograpes Oct 13 '13 at 01:07
  • Not a duplicate because that question only pertains to the specific, simplest linear formula. My question is about R formulae in general, including more complicated ones. – Alex Holcombe Oct 13 '13 at 01:18
  • There may be a way to write a function that transforms `my.model <- lm(y ~ x); model.matrix(my.model)` into what you want. But I do not know whether such a function already exists. – Mark Miller Oct 13 '13 at 01:22
  • You could `paste()` together the names and values of `coef(m)`, where `m` is your fitted model, using `sep = "*"` and `collapse = " + "`. You can grab the name of the response variable from some piece of `terms(m)`. There will still be lots of fiddly little bits, like changing any occurrence of `"+ -"` to `"- "`, and removing the textual `"(Intercept)"` from the printed result. I'd guess somebody has done it before, though I don't know who! – Josh O'Brien Oct 13 '13 at 01:46
  • You might want to check [this question](http://stackoverflow.com/questions/5774813/short-formula-call-for-many-variables-when-building-a-model). It seems very similar. – Tay Shin Nov 06 '13 at 03:10

1 Answer

Just had a quick play and got this working:

# define a function to take a linear regression
#  (anything that supports coef() and terms() should work)
expr.from.lm <- function (fit) {
  # the terms we're interested in
  con <- names(coef(fit))
  # current expression (built from the inside out)
  expr <- quote(epsilon)
  # prepend expressions, working from the last symbol backwards
  for (i in rev(seq_along(con))) {
    if (con[[i]] == '(Intercept)')
        expr <- bquote(beta[.(i-1)] + .(expr))
    else
        expr <- bquote(beta[.(i-1)] * .(as.symbol(con[[i]])) + .(expr))
  }
  # add in response
  expr <- bquote(.(terms(fit)[[2]]) == .(expr))
  # convert to expression (for easy plotting)
  as.expression(expr)
}

# generate and fit dummy data
df <- data.frame(iq=rnorm(10), sex=runif(10) < 0.5, weight=rnorm(10), height=rnorm(10))
f <- lm(iq ~ sex + weight + height, df)
# plot with our expression as the title
plot(resid(f), main=expr.from.lm(f))

There is a lot of freedom in how the variables are named and in whether you want the estimated coefficients shown as well, but this seems good for a start.
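If you do want the estimated coefficients in the equation, the `paste()` idea from the comments can be sketched roughly as below. This is only a minimal sketch, not a tested general solution: the function name `eq_from_fit` and its `digits` argument are made up for illustration, and it assumes the fit has at least one coefficient, none of them `NA`.

```r
# Hypothetical helper (the name eq_from_fit is made up for this sketch).
# Builds a plain-text equation from the estimated coefficients of a fit
# that supports coef() and formula(), e.g. an lm object.
# Assumes at least one coefficient and no NA (aliased) coefficients.
eq_from_fit <- function(fit, digits = 3) {
  cf <- coef(fit)
  # magnitude of each coefficient, to `digits` significant digits
  mag <- formatC(unname(abs(cf)), digits = digits, format = "g")
  # attach the term name to everything except the intercept
  is_int <- names(cf) == "(Intercept)"
  term <- ifelse(is_int, mag, paste0(mag, "*", names(cf)))
  # sign goes between terms; the leading term shows a sign only if negative
  sgn <- ifelse(cf < 0, " - ", " + ")
  sgn[1] <- if (cf[[1]] < 0) "-" else ""
  rhs <- paste0(sgn, term, collapse = "")
  # the left-hand side of the model formula is the response
  lhs <- deparse(formula(fit)[[2]])
  paste(lhs, "=", rhs)
}

# exact toy data, so the fitted coefficients are 2 and 3
toy <- data.frame(x = 1:4)
toy$y <- 2 + 3 * toy$x
eq_from_fit(lm(y ~ x, toy))
# "y = 2 + 3*x"
```

Term names are taken verbatim from `coef()`, so a logical or factor predictor shows up as `sexTRUE` and an interaction as `a:b`; tidying those up is left open here.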

Sam Mason