
I'm writing a function that requires a weighted regression. I've repeatedly been getting an error with the weights parameter, and I've created a minimal reproducible example you can find here:

wt_reg <- function(form, data, wts) {
  lm(formula = as.formula(form), data = data,
     weights = wts)
}

wt_reg(mpg ~ cyl, data = mtcars, wts = 1:nrow(mtcars))

This returns

Error in eval(extras, data, env) : object 'wts' not found

If you run this all separately, it works fine. I've dug into lm, and it appears the issue is a call to eval(mf, parent.frame()). Even though wts is in the parent.frame(), it doesn't appear to be evaluated correctly within the call. Here's a little more detail:

mf is assigned such that it's the same as

stats::model.frame(formula = as.formula(form), data = data, weights = wts, 
    drop.unused.levels = TRUE)

When I run

parent.frame()$wts

it does return a numeric vector. But when I run

eval(stats::model.frame(formula = as.formula(form), data = data, weights = wts, 
    drop.unused.levels = TRUE), parent.frame()) 

it doesn't.

I can run

stats::model.frame(formula = as.formula(parent.frame()$form), 
    data = parent.frame()$data, weights = parent.frame()$wts, 
    drop.unused.levels = TRUE)

and it works. You can test this yourself if you want using the example from the top.

Any thoughts? I really have no idea what's going on here...

be_green

1 Answer


Formulas are special in R in that they keep track not only of symbol/variable names but also of the environment where they were created. Check out

ff <- mpg ~ cyl
environment(ff)
# <environment: R_GlobalEnv>
foo <- function() {
  ff <- mpg ~ cyl
  environment(ff)
}
foo()
# <environment: 0x0000026172e505d8> private function environment (different each time)

The problem is that lm looks up variables first in the data.frame you pass in and then in the environment where the formula was created, not in the parent frame. Since you create the formula in the call to wt_reg (that is, at the top level), the formula holds on to the global environment. But wts exists only in the function's environment. You can alter your function to change the formula's environment to the local function environment, and then everything should work:

wt_reg <- function(form, data, wts) {
  ff <- as.formula(form)
  environment(ff) <- environment()
  lm(formula = ff, data = data,
     weights = wts)
}

wt_reg(mpg ~ cyl, data = mtcars, wts = 1:nrow(mtcars))

The eval(mf, parent.frame()) you are referring to in lm() is calling model.frame() with your formula. And from the description on the ?model.frame help page: "All the variables in formula, subset and in ... are looked for first in data and then in the environment of formula (see the help for formula() for further details) and collected into a data frame". So again, it is looking in the environment of the formula, not in the calling frame. The formula and data arguments are ordinary arguments, so they are found in the calling frame; weights is passed through ... and evaluated the non-standard way described in the quote.
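To see this lookup rule in isolation, here is a minimal sketch that calls model.frame() directly instead of going through lm(). The helper names make_frame and no_reset and the argument name w_local are hypothetical, chosen only for this illustration; the sketch assumes no object called w_local exists in your global environment.

```r
# Hypothetical helper: resets the formula's environment so that the local
# variable `w_local` is visible when model.frame() evaluates `weights`.
make_frame <- function(form, data, w_local) {
  environment(form) <- environment()
  stats::model.frame(form, data = data, weights = w_local)
}

# Hypothetical helper: same call, but the formula keeps the environment
# it was created in (the caller's), where `w_local` does not exist.
no_reset <- function(form, data, w_local) {
  stats::model.frame(form, data = data, weights = w_local)
}

mf <- make_frame(mpg ~ cyl, mtcars, seq_len(nrow(mtcars)))
head(stats::model.weights(mf))  # the weights were found and stored

# Capture the error message instead of stopping the script.
res <- tryCatch(
  no_reset(mpg ~ cyl, mtcars, seq_len(nrow(mtcars))),
  error = function(e) conditionMessage(e)
)
res  # an "object ... not found" message, as in the question
```

In both helpers form and data are resolved normally; only the expression given to weights depends on the formula's environment.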

Alternatively, you could move the weights into the data object you are passing to lm itself, so that they are found in data before any environment is consulted. This would work:

wt_reg <- function(form, data, wts) {
  lm(form, data = cbind(data, wts=wts),
     weights = wts)
}

wt_reg(mpg ~ cyl, data = mtcars, wts = 1:nrow(mtcars))
MrFlick
  • So then the call to `parent.frame()` in `eval(mf, parent.frame)` doesn't do what I think it's doing? – be_green Apr 11 '20 at 22:50
  • This still isn't working for me. See: ``` wt_reg <- function(form, data, wts) { form = as.formula(form) lm(formula = form, data = data, weights = wts) } wt_reg(mpg ~ cyl, data = mtcars, wts = 1:nrow(mtcars)) ``` – be_green Apr 11 '20 at 22:51
  • I'll also note that it's able to find the `data` and `form` arguments just fine! If I do this kind of thing with no weight argument it works. – be_green Apr 11 '20 at 22:53
  • What exactly do you mean by "still isn't working"? Your code in the comment doesn't reassign the environment, so it doesn't address the problem. `as.formula` will not change the environment for things that are already formulas. I added a bit about why `eval(mf, parent.frame())` doesn't do what you think it does. It works fine without the weights because all those objects are in the global environment. – MrFlick Apr 11 '20 at 22:57
  • Oh no, your version is working! I still don't understand why. – be_green Apr 11 '20 at 22:58
  • It struck me that I shouldn't need to explicitly call environment if I create the formula outside the function. Or that the data argument should break. – be_green Apr 11 '20 at 22:59
  • The thing that I just can't understand is why `ff <- as.formula(form , env = environment())` doesn't work. – Ian Campbell Apr 11 '20 at 23:14
  • Because `as.formula` is meant to coerce things that aren't already formulas. If you pass in something that is already a formula, it won't change it. That code would work if you passed in your formula as a string for example: `wt_reg("mpg ~ cyl", ...)`. Then the coercion to formula would use that environment. – MrFlick Apr 11 '20 at 23:16
  • Ah, I suppose I could have just looked at the source and figured that out. Thanks. – Ian Campbell Apr 11 '20 at 23:17
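
The `as.formula` behaviour discussed in the last few comments can be checked directly. This is a small sketch, not part of the original answer; the names f_existing, e, and wt_reg_chr are made up for the illustration.

```r
f_existing <- mpg ~ cyl   # already a formula, created in the current environment
e <- new.env()

# An existing formula is returned unchanged, so `env` is ignored.
f1 <- as.formula(f_existing, env = e)
same_env_formula <- identical(environment(f1), e)   # FALSE

# A string is coerced, and the new formula gets the requested environment.
f2 <- as.formula("mpg ~ cyl", env = e)
same_env_string <- identical(environment(f2), e)    # TRUE

# Hypothetical variant of wt_reg that takes the formula as a string, so the
# coercion attaches the function's own environment, where `wts` lives.
wt_reg_chr <- function(form, data, wts) {
  ff <- as.formula(form, env = environment())
  lm(formula = ff, data = data, weights = wts)
}

fit <- wt_reg_chr("mpg ~ cyl", data = mtcars, wts = 1:nrow(mtcars))
coef(fit)
```

If callers may pass either a string or a formula, assigning `environment(ff) <- environment()` as in the answer covers both cases.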