
I am working on a custom function that wraps a call to lm(), but the function fails and I can't work out why.

Consider this bare-bones example:

myfun <- function(form., data., subs., ...){
    lm(form., data., subs., ...)
}

Calling it results in an error:

myfun(mpg ~ cyl + hp, mtcars, TRUE)
## Error in eval(expr, envir, enclos) : object 'subs.' not found

However, calling lm() directly works just fine:

lm(mpg ~ cyl + hp, mtcars, TRUE)
## 
## Call:
## lm(formula = mpg ~ cyl + hp, data = mtcars, subset = TRUE)
## 
## Coefficients:
## (Intercept)          cyl           hp  
##    36.90833     -2.26469     -0.01912  

I tried debugging, but still can't get to the bottom of the problem. Why does the custom function fail? Clearly subs. has been supplied to the function...


Edit:

While most of the solutions suggested below help in this simple case, the function still fails if I add a simple twist. For instance, expand.model.frame() relies on the formula's environment, and it fails if I use the standard evaluation solution:

myfun <- function(form., data., subs., ...){
    fit <- lm(form., data.[ subs., ], ...)
    expand.model.frame(fit, ~ drat)
}

myfun(mpg ~ cyl + hp, mtcars, TRUE)
## Error in eval(expr, envir, enclos) : object 'data.' not found

This is obviously related to the original issue, but I can't figure out how. Is the environment of the model formula somehow corrupted?

landroni
  • A short answer is probably to not single out the `subset` argument. Just pass everything in a named fashion, via `...`. – joran May 19 '16 at 15:07
  • @joran Yes, this could be done, but I'm dealing with a number of different functions in my custom fun, and passing it all through `...` may not be a robust solution. I may end up passing some arguments to the wrong function. I still don't understand why the function in the question would fail... – landroni May 19 '16 at 15:10
  • It's complicated, and I'm struggling this early in the morning (for me) to explain it clearly. Basically it boils down to the rather large amount of special evaluation taking place in `lm`. – joran May 19 '16 at 15:14
  • @joran Who said anything about R not being quirky... So what is the kosher way around this "special evaluation"? Using `...`, or doing `match.call()` and then matching the desired arguments followed by a `do.call()` (basically the workflow used in `lm`)? – landroni May 19 '16 at 15:18
  • Another oddity: if you assign `subs. <- TRUE` globally first, then the function will work. – Señor O May 19 '16 at 15:28
  • @joran Could you please check the edited question? Some of the proposed solutions don't seem to help if there is a subsequent call to e.g. `expand.model.frame()`, which relies on the environment of the model formula... Can't quite understand what is still going wrong... – landroni May 19 '16 at 16:27
  • Roland already answered this in the most global sense: stop using special evaluation. Pass the formula as a character and coerce to formula when you call `lm`. Note that this idea is already suggested in an existing answer. – joran May 19 '16 at 17:08
  • @joran I may be wrong, but my understanding is that Roland's suggestion was to avoid using the `subset` argument altogether. The "pass character and coerce to formula" approach still relies on the special evaluation going on in `lm()` for the `subset` argument. And this latter approach will indeed work in the `expand.model.frame()` case, but again I'm not sure why... In any case, thank you for all the suggestions. – landroni May 19 '16 at 17:48
  • Roland's very first contribution was: "The clean solution is to use standard evaluation". The difference is only a matter of degree. I'm not sure how many different ways all these people can explain that formulas have attached environments, that where the formula is created influences where things are looked for, and hence that formulas created at the top level will be attached to the global environment. And the global environment doesn't know about the arguments to `myfun`. – joran May 19 '16 at 17:52
  • @joran Thank you, I think I'm starting to get the hang of it. I've now edited my answer with what seems to me a more robust approach. – landroni May 19 '16 at 22:17

4 Answers


As suggested in the comments, another solution would be to avoid the subset argument altogether in non-interactive use, and use standard evaluation instead:

myfun <- function(form., data., subs., ...){
    lm(form., data.[ subs., ], ...)
}

Now this works as expected:

myfun(formula(mpg ~ cyl + hp), mtcars, TRUE)

However, this still won't be enough if your custom function subsequently calls expand.model.frame() or similar functions, which re-evaluate parts of the original lm() call in the formula's environment. To make the function robust and avoid surprises, you need to both (1) define the formula within the custom function (see also the reformulate approach) and (2) subset the data before the lm() call, deliberately avoiding the subset argument.

Like this:

myfun <- function(form., data., subs., ...){
    stopifnot(is.character(form.))
    data. <- data.[ subs., ]
    fit <- lm(as.formula(form.), data., ...)
    expand.model.frame(fit, ~ drat)
}

myfun("mpg ~ cyl + hp", mtcars, TRUE)

I tried using (1) or (2) alone, but still ran into strange errors from some functions; only with both (1) and (2) did the errors seem to go away.
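One way to see why subsetting the data alone is not enough is to look at what lm() stores in the fitted object. The following is only a sketch: make_fit is a hypothetical helper mirroring the failing wrapper from the question, and the explanation assumes (as the error message there suggests) that expand.model.frame() re-evaluates the stored data expression in the formula's environment.

```r
# Hypothetical helper mirroring the failing wrapper from the question.
make_fit <- function(form., data., subs., ...) {
  lm(form., data.[subs., ], ...)
}

fit <- make_fit(mpg ~ cyl + hp, mtcars, TRUE)

# lm() records the call as written inside the wrapper, not the data itself.
stored_data <- deparse(fit$call$data)
print(stored_data)

# The formula was created by the caller, so its environment is the caller's;
# the wrapper's local variable data. is not visible from there.
data_visible <- exists("data.", envir = environment(formula(fit)))
print(data_visible)

# Re-evaluating the stored expression therefore fails; capture the message.
msg <- tryCatch(
  {
    expand.model.frame(fit, ~ drat)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Creating the formula inside the function and storing the subsetted data under a local name means the stored expression and the formula's environment agree, which is probably why both steps are needed together.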

landroni

This function fails because of the way the subset argument is evaluated. From the documentation of lm:

All of ‘weights’, ‘subset’ and ‘offset’ are evaluated in the same way as variables in ‘formula’, that is first in ‘data’ and then in the environment of ‘formula’.

In other words, lm looks for a variable named subs. first in data and then in the environment of formula. The formula was created where myfun was called (here the global environment), not inside myfun, so neither place contains subs. and the call produces an error.
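The lookup order can be illustrated with a small sketch. The names keep and wrapper are hypothetical and only serve the illustration; the code assumes no object called subs. exists in the calling environment.

```r
# A variable defined where the formula is written is found through the
# formula's environment.
keep <- mtcars$cyl > 4
fit_env <- lm(mpg ~ hp, data = mtcars, subset = keep)

# A column of data is found in data itself.
fit_data <- lm(mpg ~ hp, data = mtcars, subset = cyl > 4)

# An argument of a wrapper function lives in neither place.
wrapper <- function(form., data., subs.) lm(form., data., subs.)
err <- tryCatch(
  {
    wrapper(mpg ~ hp, mtcars, TRUE)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(nobs(fit_env))
print(all.equal(coef(fit_env), coef(fit_data)))
print(err)
```

The first two fits use the same rows and give the same coefficients, while the wrapper fails because subs. exists only in the wrapper's own frame.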

Ernest A
  • What is the "correct" way to work around this special evaluation, besides using only `...` in the function definition? – landroni May 19 '16 at 15:19
  • There is no clean solution. The function `myfun` has to add `subs.` to `data` and/or to the environment of formula, but this will overwrite any variables with the same name already present in these environments. – Ernest A May 19 '16 at 15:26
  • @landroni See my new comment (and change of heart) on the other answer. – joran May 19 '16 at 15:30
  • The clean solution is to use standard evaluation, i.e., avoid `lm`'s `subset` argument in non-interactive use. – Roland May 19 '16 at 15:31
  • @Roland Do you mean that I should avoid the `subset` argument altogether, and subset the `data.frame` directly using `mtcars[ subs., ]`? – landroni May 19 '16 at 15:34
  • @landroni Since you write the wrapper function anyway: yes. – Roland May 19 '16 at 15:35
  • @Roland This solution doesn't seem to help if there is a subsequent call to e.g. `expand.model.frame()`, which relies on the environment of the model formula... Could you please check the edited question? – landroni May 19 '16 at 16:24

You can do something like this:

myfun <- function(form., data., subs., ...){
    lm(as.formula(form.), data., subs., ...)
}

Call it as myfun("mpg ~ cyl + hp", mtcars, TRUE). This forces the formula to be created in the environment of the function myfun, which contains subs. as a local variable.
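The difference between passing a string and passing a formula object can be checked directly. This is a minimal sketch; formula_env_is_local is a hypothetical helper that is not part of the answer.

```r
# Hypothetical helper: reports whether the formula's environment is the
# function's own frame.
formula_env_is_local <- function(form.) {
  f <- as.formula(form.)
  identical(environment(f), environment())
}

# A character string is turned into a formula inside the function.
from_string <- formula_env_is_local("mpg ~ cyl + hp")

# An existing formula keeps the environment in which it was created.
from_formula <- formula_env_is_local(mpg ~ cyl + hp)

print(from_string)
print(from_formula)
```

Only the string version produces a formula whose environment is the function's frame, which is what makes the function's arguments visible to lm().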

Psidom
  • @joran Maybe, maybe not. The proposed code does actually fix the problem that is popping up, but I agree it isn't ideal. – Dason May 19 '16 at 15:11
  • I've changed my mind about this answer. Passing the formula as a character and then coercing it when calling `lm` forces the environment of the formula to be the enclosing env of the `lm` call, which will contain the arguments `form.`, `data.` and `subs.`, so it will be evaluated correctly. This seems like a reasonable solution to me. – joran May 19 '16 at 15:29
  • ...also note that all of this is actually taking place in `model.frame.default`. – joran May 19 '16 at 15:32

Building on the answer of @ErnestA you can modify your function to ensure that subs. is present in the environment of formula form.:

myfun <- function(form., data., subs., ...){
    assign("subs.", subs., envir = environment(form.))
    lm(form., data., subs., ...)
}

ETA: to avoid contaminating the environment of form., you can give it a new child environment and create subs. there:

myfun <- function(form., data., subs., ...){
    environment(form.) <- new.env(parent = environment(form.))
    assign("subs.", subs., envir = environment(form.))
    lm(form., data., subs., ...)
}
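As a quick check that the child-environment version leaves the caller's environment alone, something like the following can be run. It is a sketch only: myfun_child is a hypothetical copy of the function above under a different name, and it assumes no object called subs. already exists where the formula is created.

```r
# Hypothetical copy of the child-environment version, renamed for the check.
myfun_child <- function(form., data., subs., ...) {
  environment(form.) <- new.env(parent = environment(form.))
  assign("subs.", subs., envir = environment(form.))
  lm(form., data., subs., ...)
}

f <- mpg ~ cyl + hp
fit_child <- myfun_child(f, mtcars, mtcars$am == 1)

# The caller's formula still points at its original environment, and no
# subs. variable was created there.
leaked <- exists("subs.", envir = environment(f), inherits = FALSE)

print(nobs(fit_child))
print(leaked)
```

The subset is applied, and subs. exists only in the throwaway child environment attached to the function's local copy of the formula.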

ETA: perhaps the neatest way of fixing the lm issue alone is to set the environment of form. to that of myfun:

myfun <- function(form., data., subs., ...){
    environment(form.) <- environment()
    lm(form., data., subs., ...)
}
myfun(mpg ~ cyl + hp, mtcars, TRUE)
## Call:
##   lm(formula = form., data = data., subset = subs.)
## 
## Coefficients:
##   (Intercept)          cyl           hp  
##      36.90833     -2.26469     -0.01912  

Turning to the expand.model.frame issue: subs. is not found even though it is in the environment which ?expand.model.frame says is used. Is this a bug in expand.model.frame, or at least a conflict with the documentation?

myfun <- function(form., data., subs., ...){
    environment(form.) <- environment()
    fit <- lm(form., data., subs., ...)
    print(ls(environment(formula(fit))))
    expand.model.frame(fit, ~ drat)
}
myfun(mpg ~ cyl + hp, mtcars, TRUE)
## [1] "data." "fit"   "form." "subs."
##  Error in eval(expr, envir, enclos) : object 'subs.' not found

Putting subs. into the parent environment seems to work.

myfun <- function(form., data., subs., ...){
    environment(form.) <- environment()
    fit <- lm(form., data., subs., ...)
    assign("subs.", subs., envir = parent.env(environment(formula(fit))))
    expand.model.frame(fit, ~ drat)
}
myfun(mpg ~ cyl + hp, mtcars, TRUE)
##                      mpg cyl  hp drat
## Mazda RX4           21.0   6 110 3.90
## Mazda RX4 Wag       21.0   6 110 3.90
## Datsun 710          22.8   4  93 3.85
## Hornet 4 Drive      21.4   6 110 3.08
## etc.

But this has the drawback of contaminating the parent environment, in this case R_GlobalEnv. I haven't been able to make it work using anything other than R_GlobalEnv as the parent.
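A possible way around the contamination, combining the environment trick above with the advice from the comments to avoid the subset argument, might look like the sketch below. myfun_std is a hypothetical name. The sketch assumes that expand.model.frame() only needs to re-evaluate the stored data expression in the formula's environment once no subset is recorded in the call.

```r
# Hypothetical variant: subset with standard evaluation first, then point
# the formula at this function's frame so data. can be found again later.
myfun_std <- function(form., data., subs., ...) {
  data. <- data.[subs., , drop = FALSE]
  environment(form.) <- environment()
  fit <- lm(form., data., ...)
  expand.model.frame(fit, ~ drat)
}

res <- myfun_std(mpg ~ cyl + hp, mtcars, mtcars$cyl > 4)

print(dim(res))
print(names(res))
```

Nothing is assigned outside the function here. One trade-off is that the formula's original environment is replaced, so variables that are neither columns of the data nor reachable from the function's own scope would no longer be found.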

user20637
  • I'm not sure I follow. What could be contaminated in the first case? – landroni May 19 '16 at 16:30
  • In a comment on his answer @Ernest A correctly said "The function myfun has to add subs. to data and/or to the environment of formula, but this will overwrite any variables with the same name already present in these environments". By default the environment of `form.` is that in which it is created; `R_GlobalEnv` if your example call to `myfun` is from the command prompt. To avoid creating/overwriting `subs.` in that environment I make the environment of `form.` a new, empty environment in which I can create `subs.`. This doesn't seem to fix your `expand.model.frame` issue :-( – user20637 May 20 '16 at 08:08