10

I often specify the formula argument to model fitting functions like lm or lme by pasting together the parts I need, as in @DWin's answer to this question: Understanding lm and environment.

In practice this looks like this:

library(nlme)
set.seed(5)
ns <- 5; ni <- 5; N <- ns*ni
d <- data.frame(y=rnorm(N),
                x1=rnorm(N),
                x2=factor(rep(1:ni, each=ns)),
                id=factor(rep(1:ns, ni)))

getm <- function(xs) {
  f <- paste("y ~", paste(xs, collapse="+"))
  lme(as.formula(f), random=~1|id, data=d, method="ML")
}
m1 <- getm("x1")
m2 <- getm(c("x1", "x2"))

However, with lme from the nlme package, comparing two models constructed in the way using anova doesn't work, because anova.lme looks at the saved formula argument to ensure that the models were fit on the same response, and the saved formula argument is simply as.formula(f). The error is:

> anova(m1, m2)
Error in inherits(object, "formula") : object 'f' not found

Here's what the anova command should do (refitting the models so that it works):

> m1 <- lme(y~x1, random=~1|id, data=d, method="ML")
> m2 <- lme(y~x1+x2, random=~1|id, data=d, method="ML")
> anova(m1, m2)
   Model df      AIC      BIC    logLik   Test  L.Ratio p-value
m1     1  4 76.83117 81.70667 -34.41558                        
m2     2  8 72.69195 82.44295 -28.34597 1 vs 2 12.13922  0.0163

Any suggestions?

Community
  • 1
  • 1
Aaron left Stack Overflow
  • 36,704
  • 7
  • 77
  • 142

2 Answers2

15

Ben's answer works, but do.call provides the more general solution he wished for.

getm <- function(xs) {
    f <- as.formula(paste("y ~", paste(xs, collapse="+")))
    do.call("lme", args = list(f, random=~1|id, data=d, method="ML"))
}

It works because (by default) the arguments in args = are evaluated before being passed to lme.

Josh O'Brien
  • 159,210
  • 26
  • 366
  • 455
  • 4
    yes, but ... (and this is not the question the OP asked) ... I'm really most curious about how `lme` (or some other independent function) should behave so that it can make use of formulas that have been constructed in other (no longer directly accessible) environments – Ben Bolker Oct 06 '11 at 12:24
  • Sounds like an interesting problem. Could you put it up as a question in its own right, with some code illustrating the situation? – Josh O'Brien Oct 06 '11 at 13:56
  • @BenBolker: I want to know the answer to your question too. Any success in formulating it well? – Aaron left Stack Overflow Oct 31 '11 at 16:55
  • 2
    Using `data=as.name("d")` instead of `data=d` will store the call as `data=d` instead of storing a representation of the data frame; this is useful for both object size and reuse using `update`. – Aaron left Stack Overflow Mar 26 '12 at 16:07
  • Here's [another question](http://stackoverflow.com/q/9161273/210673) that was solved using this method. – Aaron left Stack Overflow Apr 23 '12 at 14:35
  • And [another one](http://stackoverflow.com/a/14940094/210673). Can you tell I found this answer super helpful? Thanks again, @JoshO'Brien! – Aaron left Stack Overflow Feb 18 '13 at 15:52
  • @Aaron -- Yeah, `do.call()` is a real power-tool isn't it? What's funny to me is that this here was my first post on SO and folk(s) are still finding it helpful. For yet another interesting application of `do.call()`, from just a couple of days ago, [see here](http://stackoverflow.com/questions/14837902/how-to-write-a-function-that-calls-a-function-that-calls-data-table/14838797#14838797). – Josh O'Brien Feb 18 '13 at 16:03
  • 1
    @Aaron this is a fantastic addendum and really helpful for my situation. I noticed that there is still a significant slow-down for me when using `do.call()` which led me to [this answer](http://stackoverflow.com/a/13924631/2352940) -- using `quote=T` results in quicker execution, and keeps the formula as is. – jflournoy Jun 30 '14 at 21:49
  • @Aaron Actually, I ran into a problem with `as.name('d')` when using `do.call()` inside `llply()` such that I couldn't get `do.call()` to see `d`. Maybe I'll post a follow up question about this. – jflournoy Jul 01 '14 at 00:17
  • @jflournoy: thanks for the updates; I'll keep an eye out for it. – Aaron left Stack Overflow Jul 01 '14 at 02:25
  • This solution helped in a related question :http://stackoverflow.com/a/35864123/1199289 – Marc in the box Mar 08 '16 at 12:56
4

Here's a hack that seems to work:

getm <- function(xs) {
  f <- paste("y ~", paste(xs, collapse="+"))
  m <- lme(as.formula(f), random=~1|id, data=d, method="ML")
  m$call$fixed <- eval(m$call$fixed)
  m
}

but I don't like it at all. I would very much like to see a more principled answer to this question, because I run into this sort of problem all the time when trying to extend the bbmle package.

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • 2
    Maybe the 'real' solution is for `lme` to check how the formula was passed and make sure it's stored correctly using `eval` exactly as you've done? – joran Oct 05 '11 at 21:18
  • 1
    maybe, but I'm not sure I know how to do that The Right Way. I don't know the correct incantation (nor even the logic, which is the real problem) to evaluate the formula in the correct environment when it may have been passed down through a chain of several calling functions ... – Ben Bolker Oct 05 '11 at 21:22
  • 2
    @Ben -- Based on [my question and answer here](http://stackoverflow.com/questions/17395724/any-pitfalls-to-using-programatically-constructed-formulas/17458699?noredirect=1#comment25405098_17458699) and Aaron's helpful pointer back to here, I think I may have stumbled on "The Right Way". Replacing the body of `nlme:::formula.lme` (currently `eval(x$call$fixed)`) with `formula(terms(x))` seems like the better way to do this, and does fix the scoping issues that occasioned the OP's question. (I tested using `fixInNamespace("formula.lme", "nlme")` after which `anova(m1, m2)` 'just worked'. – Josh O'Brien Jul 05 '13 at 21:45